meffmadd opened 8 months ago
As long as Tornado is the only thing that touches the SIGCHLD handler, the child process is guaranteed to exist (in a zombie state) until os.waitpid (or another wait function) is called on it once. Could something else be installing a SIGCHLD handler, ignoring SIGCHLD, or calling os.wait directly? (In this example it seems wrong to use both p.proc.wait() and p.wait_for_exit(), but if you've seen it in the wild I doubt it's that simple.)
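To make the "called on it once" point concrete, here is a small POSIX-only sketch that uses only the standard library, not Tornado. It reaps a child through Popen.wait() (the same kind of call as p.proc.wait()) and then calls os.waitpid on the same pid again. The second call raises ChildProcessError because the zombie has already been collected. The function name reap_twice is hypothetical, chosen for illustration.

```python
import os
import subprocess
import sys


def reap_twice():
    """Reap one short-lived child twice and report what each wait saw.

    Returns (exit_code_from_first_wait, exception_type_from_second_wait).
    """
    # The child exits with status 3 and stays a zombie until it is reaped.
    proc = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])

    # First wait: Popen.wait() calls os.waitpid() internally and collects
    # the exit status, like p.proc.wait() in the example above.
    first = proc.wait()

    # Second wait on the same pid: the zombie is gone, so the kernel
    # reports ECHILD, which Python raises as ChildProcessError.
    try:
        os.waitpid(proc.pid, os.WNOHANG)
    except ChildProcessError as exc:
        return first, type(exc)
    return first, None


if __name__ == "__main__":
    print(reap_twice())
```

Any code that expects to collect the status later (for example a SIGCHLD-driven cleanup) sees the same error once another wait has run first.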
OK, interesting! As far as I understand, nothing else is happening. We were creating the subprocess (calling git init) and immediately calling wait_for_exit on the following line.
However, we used the Subprocess class outside the Tornado process (in a Celery worker instance). Could this have anything to do with it? We have now switched to Popen and wait, which works as intended. The docs don't mention any restriction like this.
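A minimal sketch of the Popen-and-wait approach for a synchronous worker, standard library only; run_blocking and run_with_sigchld_ignored are hypothetical helper names. One caveat, based on my reading of CPython's subprocess.py: when waitpid() fails with ECHILD (for example because SIGCHLD is ignored), Popen.wait() does not raise but reports a return code of 0. Switching to Popen may therefore hide the underlying reaping problem rather than remove it. The second helper demonstrates this on POSIX and must run in the main thread.

```python
import signal
import subprocess
import sys


def run_blocking(args):
    """Start a child and block until it exits, as a synchronous worker would."""
    proc = subprocess.Popen(args)
    return proc.wait()


def run_with_sigchld_ignored(args):
    """Like run_blocking, but with SIGCHLD temporarily set to SIG_IGN.

    On POSIX systems an ignored SIGCHLD makes the kernel reap children
    automatically, so waitpid() fails with ECHILD and the real status is lost.
    """
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        return run_blocking(args)
    finally:
        # Restore whatever disposition was there before (default if unknown).
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "raise SystemExit(2)"]
    print(run_blocking(cmd))              # the child's real exit status
    print(run_with_sigchld_ignored(cmd))  # status lost; Popen reports 0
```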
> However, we used the Subprocess class outside the tornado process (in a Celery worker instance).
I'm confused (maybe because I've never used Celery): there's still a Tornado IOLoop (or asyncio event loop) running in the worker process, right? If not, you wouldn't be able to await p.wait_for_exit().
Celery does touch the SIGCHLD handler, although I'm not sure whether it can run in a way that would clobber Tornado's SIGCHLD handler. It doesn't appear to call asyncio's set_child_watcher function, but if something else in the stack does, that could also cause this problem.
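Each process has a single handler slot per signal, so whichever code calls signal.signal(SIGCHLD, ...) last wins and the earlier handler silently stops being called. The sketch below (POSIX, standard library only, must run in the main thread) shows that. first_handler and second_handler are hypothetical stand-ins for Tornado's handler and for one installed later by some other library.

```python
import signal
import subprocess
import sys
import time

calls = []


def first_handler(signum, frame):
    # Stand-in for the handler Tornado installs (hypothetical).
    calls.append("first")


def second_handler(signum, frame):
    # Stand-in for a handler installed later by another library (hypothetical).
    calls.append("second")


def demo(timeout=5.0):
    """Install two SIGCHLD handlers in a row and record which one runs."""
    calls.clear()
    original = signal.signal(signal.SIGCHLD, first_handler)
    try:
        # signal.signal() returns the handler it replaced; first_handler is
        # now dropped even though nothing told its owner.
        replaced = signal.signal(signal.SIGCHLD, second_handler)
        proc = subprocess.Popen([sys.executable, "-c", "pass"])
        # Python runs signal handlers in the main thread between bytecodes,
        # so poll briefly until the SIGCHLD for the exiting child is handled.
        deadline = time.monotonic() + timeout
        while not calls and time.monotonic() < deadline:
            time.sleep(0.01)
        proc.wait()
        return replaced is first_handler, list(calls)
    finally:
        signal.signal(signal.SIGCHLD,
                      original if original is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print(demo())
```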
The wait_for_exit method eventually calls os.waitpid, which raises a ChildProcessError if no child process with the specified pid exists. This exception is caught, the function just returns, and the Future never resolves: https://github.com/tornadoweb/tornado/blob/65a9e48f8ce645f104e3e0aa772222e70b0376d9/tornado/process.py#L348
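A simplified, self-contained model of that failure mode, not Tornado's actual code: the waiting dict and the try_cleanup_process function are hypothetical stand-ins for Tornado's bookkeeping. If the child has already been reaped elsewhere, waitpid raises ChildProcessError, the function returns early, and the Future stays pending. It uses os.waitstatus_to_exitcode, which needs Python 3.9 or later.

```python
import os
import subprocess
import sys
from concurrent.futures import Future

# Hypothetical registry mapping pid -> Future (simplified model).
waiting = {}


def try_cleanup_process(pid):
    """Reap `pid` if it has exited and resolve its future (simplified model)."""
    try:
        ret_pid, status = os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        # Someone else already reaped the child: there is no status to read,
        # so we return and the future is left pending forever.
        return
    if ret_pid == 0:
        return  # child still running; try again on the next SIGCHLD
    waiting.pop(pid).set_result(os.waitstatus_to_exitcode(status))


if __name__ == "__main__":
    proc = subprocess.Popen([sys.executable, "-c", "pass"])
    fut = Future()
    waiting[proc.pid] = fut
    proc.wait()                   # the child is reaped outside the registry
    try_cleanup_process(proc.pid)
    print(fut.done())             # the future was never resolved
```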
Instead, the process could be retrieved from the _waiting dict and the return code read directly from the object.
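One way that suggestion could look, sketched on the same kind of simplified model. The waiting registry holding (Popen, Future) pairs is hypothetical; Tornado's real _waiting dict stores its own objects. On ChildProcessError, the sketch falls back to Popen.returncode, which is only populated if the child was reaped through that same Popen object (for example via proc.wait()). If it is still None, the status is unrecoverable, and failing the future with an exception instead of leaving it pending is my own assumption, not part of the proposal.

```python
import os
import subprocess
import sys
from concurrent.futures import Future

# Hypothetical registry: pid -> (Popen object, Future).
waiting = {}


def try_cleanup_process(pid):
    """Reap `pid`, falling back to the Popen object's returncode on ECHILD."""
    try:
        ret_pid, status = os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        entry = waiting.pop(pid, None)
        if entry is None:
            return
        proc, fut = entry
        if proc.returncode is not None:
            # Reaped through the Popen object, which recorded the status.
            fut.set_result(proc.returncode)
        else:
            # Status is gone (e.g. os.wait() elsewhere); fail instead of hanging.
            fut.set_exception(ChildProcessError(f"child {pid} was reaped elsewhere"))
        return
    if ret_pid == 0:
        return  # still running
    _proc, fut = waiting.pop(pid)
    fut.set_result(os.waitstatus_to_exitcode(status))


if __name__ == "__main__":
    proc = subprocess.Popen([sys.executable, "-c", "raise SystemExit(5)"])
    fut = Future()
    waiting[proc.pid] = (proc, fut)
    proc.wait()                   # reaped outside the registry, as in the bug
    try_cleanup_process(proc.pid)
    print(fut.result())           # exit status recovered from the Popen object
```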