tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0

`Subprocess.wait_for_exit` never resolves if process terminated before it is called #3364

Open meffmadd opened 8 months ago

meffmadd commented 8 months ago

The wait_for_exit method eventually calls os.waitpid, which raises a ChildProcessError if no child process with the specified pid exists. This exception is caught and the function simply returns, so the Future never resolves.

if __name__ == '__main__':
    import asyncio
    from tornado.process import Subprocess

    async def f():
        p = Subprocess("ls")
        # Reap the child through Popen first, so no child with this pid remains.
        p.proc.wait()
        # Hangs: os.waitpid raises ChildProcessError and the Future never resolves.
        return await p.wait_for_exit()

    asyncio.run(f())

https://github.com/tornadoweb/tornado/blob/65a9e48f8ce645f104e3e0aa772222e70b0376d9/tornado/process.py#L348

Instead, the process could be retrieved from the _waiting dict and the return code could be accessed from the object directly.
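To illustrate the idea outside Tornado, here is a minimal standard-library sketch (POSIX only, Python 3.9+). The helper name reap_or_fallback is hypothetical and is not part of Tornado; it only shows that once Popen.wait() has reaped the child, a second os.waitpid fails with ChildProcessError while the exit code is still available on the Popen object.

```python
import os
import subprocess
import sys


def reap_or_fallback(proc: subprocess.Popen) -> int:
    """Hypothetical helper: try os.waitpid, fall back to Popen.returncode."""
    try:
        pid, status = os.waitpid(proc.pid, os.WNOHANG)
    except ChildProcessError:
        # The child was already reaped elsewhere (here: by proc.wait()),
        # so the exit code has to come from the Popen object instead.
        return proc.returncode
    if pid == 0:
        # Child is still running; let Popen wait for it and record the code.
        return proc.wait()
    # We reaped the child ourselves; decode the raw wait status.
    return os.waitstatus_to_exitcode(status)


# Child exits with code 3 and is reaped by Popen before the helper runs.
proc = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])
proc.wait()
exit_code = reap_or_fallback(proc)
print(exit_code)
```

Whether the real fix should look like this inside Tornado is a separate question; the sketch only demonstrates that the return code is recoverable after the ChildProcessError.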

bdarnell commented 8 months ago

As long as Tornado is the only thing that touches the SIGCHLD handler, the child process is guaranteed to exist (in a zombie state) until os.waitpid (or another wait function) is called on it once. Could something else be installing a SIGCHLD handler, ignoring SIGCHLD, or calling os.wait directly? (In this example it seems wrong to use both p.proc.wait() and p.wait_for_exit(), but if you've seen it in the wild I doubt it's that simple.)
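One of the scenarios mentioned above, ignoring SIGCHLD, can be reproduced with the standard library alone. This is a sketch under the assumption of Linux semantics, where explicitly setting SIGCHLD to SIG_IGN makes the kernel discard exited children instead of keeping them as zombies; the function name is made up for the example, and it must run in the main thread because it changes a signal handler.

```python
import os
import signal
import subprocess
import sys


def waitpid_with_sigchld_ignored() -> str:
    """Start a child while SIGCHLD is ignored and report what waitpid does."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    proc = subprocess.Popen([sys.executable, "-c", "pass"])
    try:
        # Blocks until the child exits. With SIGCHLD ignored there is no
        # zombie left to collect, so on Linux this raises ChildProcessError.
        os.waitpid(proc.pid, 0)
        return "reaped"
    except ChildProcessError:
        return "ChildProcessError"
    finally:
        # Let Popen finish its own bookkeeping, then restore the old handler.
        proc.wait()
        signal.signal(
            signal.SIGCHLD,
            previous if previous is not None else signal.SIG_DFL,
        )


result = waitpid_with_sigchld_ignored()
print(result)
```

In that situation any code path that treats ChildProcessError as "nothing to do yet" would wait forever, which matches the symptom described in this issue.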

meffmadd commented 8 months ago

Ok, interesting! As far as I understand, nothing else is happening. We were creating the subprocess (calling git init) and immediately called wait_for_exit on the following line.

However, we used the Subprocess class outside the tornado process (in a Celery worker instance). Could this have anything to do with it? We have now switched to Popen and wait, which works as intended. The docs don't mention anything about this.

bdarnell commented 5 months ago

> However, we used the Subprocess class outside the tornado process (in a Celery worker instance).

I'm confused (maybe because I've never used Celery) - there's still a Tornado IOLoop (or asyncio event loop) running in the worker process, right? If not, you wouldn't be able to await p.wait_for_exit().

Celery does touch the SIGCHLD handler, although I'm not sure whether that code can run in a way that would clobber Tornado's SIGCHLD handler. It doesn't appear to call asyncio's set_child_watcher function, but if something else in the stack does, that could also cause this problem.
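A quick way to check whether something else owns the handler is to inspect it from inside the worker process. The following is a diagnostic sketch using only the standard signal module (POSIX only); describe_sigchld_handler is a hypothetical name, not a Tornado or Celery API.

```python
import signal


def describe_sigchld_handler() -> str:
    """Return a short description of the current SIGCHLD disposition."""
    handler = signal.getsignal(signal.SIGCHLD)
    if handler is None:
        # A handler exists but was not installed from Python code.
        return "installed outside Python"
    if handler == signal.SIG_DFL:
        return "default"
    if handler == signal.SIG_IGN:
        return "ignored"
    # A Python callable: report where it was defined.
    module = getattr(handler, "__module__", "?")
    name = getattr(handler, "__qualname__", repr(handler))
    return f"{module}.{name}"


print(describe_sigchld_handler())
```

Calling this right before and right after p.wait_for_exit() is scheduled, and again once the hang is observed, would show whether the handler was replaced in between.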