ochinchina / supervisord

a go-lang supervisor implementation
MIT License
zombie alert! #60

Open haraldschilly opened 6 years ago

haraldschilly commented 6 years ago

We run a Kubernetes/Docker container environment and use this Go supervisord as PID 1 to spawn processes. It works well, but when certain processes are terminated or killed, zombies are left behind. I'm not an expert on this, but my feeling is that this supervisord isn't handling certain system signals properly, so a final cleanup step never happens.

For example, running tmux and exiting via Ctrl-D gives this (truncated) process listing:

$ ps aux
user           1  0.0  0.0 362736  6052 ?        Ssl  22:56   0:00 /cocalc/bin/supervisord -c /cocalc/supervisor/supervisord.conf
[...]
user         114  0.2  0.0      0     0 ?        Zs   22:56   0:01 [tmux] <defunct>
user         115  0.0  0.0      0     0 ?        Zs   22:56   0:00 [bash] <defunct>

Has anyone else experienced this?

version: 1.0.006

williamstein commented 6 years ago

For our application (mentioned by @haraldschilly above), we have solved this problem by using dumb-init as the parent of go-supervisord:

~$ pstree
dumb-init─┬─supervisord─┬─node─┬─4*[{V8 WorkerThread}]
          │             │      └─5*[{node}]
          │             ├─sh───node─┬─node─┬─bash───tmux
          │             │           │      ├─4*[{V8 WorkerThread}]
          │             │           │      └─2*[{node}]
          │             │           ├─4*[{V8 WorkerThread}]
          │             │           └─5*[{node}]
          │             ├─sh───sshd
          │             └─10*[{supervisord}]
          └─tmux─┬─bash
                 └─bash───pstree

It would be nice if go-supervisord also handled the pid 1 problem, like the Python supervisord does, but it's not critical for us.
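For readers unfamiliar with the PID 1 problem: orphaned processes are re-parented to PID 1, and they stay in the process table as zombies until PID 1 calls wait on them. The following is only a minimal sketch of the usual reaping pattern in Go, not go-supervisord's actual code. The names `reapChildren` and `demo` are made up for illustration, and it assumes a Unix-like system with a `true` binary on the PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

// reapChildren (hypothetical helper) collects every child that has already
// exited, without blocking, and returns the PIDs it reaped.
func reapChildren() []int {
	var reaped []int
	for {
		var status syscall.WaitStatus
		// pid -1 means "any child"; WNOHANG makes the call non-blocking.
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue
		}
		if err != nil || pid <= 0 {
			// ECHILD: no children at all. pid == 0: children exist,
			// but none of them has exited yet.
			return reaped
		}
		reaped = append(reaped, pid)
	}
}

// demo (hypothetical) starts a short-lived child, deliberately never calls
// cmd.Wait on it, and reaps it when SIGCHLD arrives.
func demo() (childPID int, reaped []int, err error) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	defer signal.Stop(sigs)

	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		return 0, nil, err
	}
	deadline := time.After(5 * time.Second)
	for len(reaped) == 0 {
		select {
		case <-sigs:
			reaped = append(reaped, reapChildren()...)
		case <-deadline:
			return cmd.Process.Pid, reaped, fmt.Errorf("timed out waiting for SIGCHLD")
		}
	}
	return cmd.Process.Pid, reaped, nil
}

func main() {
	pid, reaped, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("started child", pid, "and reaped", reaped)
}
```

One caveat with this pattern: a process-wide `Wait4(-1, ...)` can also collect children that other code intends to wait for itself (for example via `cmd.Wait`), so a real supervisor would need to coordinate the two. That is one reason a separate init such as dumb-init is a simple workaround.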

ochinchina commented 6 years ago

@haraldschilly Can you give more detailed information on the zombie issue, such as your test supervisord configuration file?

Then I will run some tests and fix your issue.

rbeuque74 commented 6 years ago

I would say that launching tmux and bash, which require an interactive session, causes them to stop instantly; supervisord then collects and restarts them. Can you run ps auxf several times and check whether the PIDs of bash and tmux change each time?
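The check suggested here can be sketched as a small Go program that compares two `ps aux` snapshots. Zombies that keep the same PID across snapshots were never reaped. If the PIDs change, the processes are being restarted instead. This is a rough sketch: the type and function names are made up, the first snapshot is copied from the listing earlier in this thread, and the second snapshot is hypothetical data for illustration only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// proc is one row of `ps aux` output, reduced to the fields used here.
type proc struct {
	PID     int
	Stat    string
	Command string
}

// issueListing is the (truncated) listing from the original report.
const issueListing = `user           1  0.0  0.0 362736  6052 ?        Ssl  22:56   0:00 /cocalc/bin/supervisord -c /cocalc/supervisor/supervisord.conf
[...]
user         114  0.2  0.0      0     0 ?        Zs   22:56   0:01 [tmux] <defunct>
user         115  0.0  0.0      0     0 ?        Zs   22:56   0:00 [bash] <defunct>`

// restartedListing is HYPOTHETICAL: what a later snapshot might look like if
// the processes had been restarted under new PIDs.
const restartedListing = `user           1  0.0  0.0 362736  6052 ?        Ssl  22:56   0:00 /cocalc/bin/supervisord -c /cocalc/supervisor/supervisord.conf
user         214  0.2  0.0      0     0 ?        Zs   22:57   0:00 [tmux] <defunct>
user         215  0.0  0.0      0     0 ?        Zs   22:57   0:00 [bash] <defunct>`

// parsePS parses `ps aux` style rows:
// USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
// Rows that do not fit (the header, "[...]") are skipped.
func parsePS(out string) []proc {
	var procs []proc
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		if len(f) < 11 {
			continue
		}
		pid, err := strconv.Atoi(f[1])
		if err != nil {
			continue
		}
		procs = append(procs, proc{PID: pid, Stat: f[7], Command: strings.Join(f[10:], " ")})
	}
	return procs
}

// zombies keeps the rows whose STAT column starts with "Z" (defunct).
func zombies(procs []proc) []proc {
	var out []proc
	for _, p := range procs {
		if strings.HasPrefix(p.Stat, "Z") {
			out = append(out, p)
		}
	}
	return out
}

// persistentZombies returns zombies from `after` whose PID was already a
// zombie in `before`, i.e. processes that were never reaped in between.
func persistentZombies(before, after []proc) []proc {
	seen := make(map[int]bool)
	for _, p := range zombies(before) {
		seen[p.PID] = true
	}
	var out []proc
	for _, p := range zombies(after) {
		if seen[p.PID] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	first := parsePS(issueListing)
	fmt.Println("same snapshot twice:", persistentZombies(first, first))
	fmt.Println("after hypothetical restart:", persistentZombies(first, parsePS(restartedListing)))
}
```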