s6-svwait not reaping zombies

dtgriscom commented 3 years ago

Hello, all. I'm using s6 as the init process manager in a Docker container, via s6-overlay. Everything's working fine, except that when I send a SIGINT to the container, the managed processes exit but become zombies and aren't reaped, forcing the system to time out (twice, actually).

The container is based on ubuntu:20.04 with s6-overlay amd64 version 2.2.0.3, which I believe has the latest s6. It all runs on an Ubuntu 18.04 desktop host. It looks like s6-svscan sends SIGINT or SIGTERM to the processes and then uses s6-svwait to wait for them to exit, but the zombie processes are never reaped.

I found the following reference, which suggests the problem might be a kernel problem: https://github.com/just-containers/s6-overlay/issues/135 , although I'm not seeing the high zombie CPU usage it describes. I also found https://wiki.gentoo.org/wiki/S6 , which suggested sending a SIGCHLD to s6-svscan to make it re-scan for zombies; I tried that, and it didn't work.

Here are the processes once everything is started (viewed by "ps axl" after running bash in a separate connection to the container):

root@4fa66da81d02:/# ps axl
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0    196     4 poll_s Ss+  pts/0      0:00 s6-svscan -t0 /var/run/s6/services
4     0    35     1  20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise s6-fdholderd
4     0   228     1  20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise thttpd
4     0   229     1  20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise exrouter
4 65534   232   228  30  10 179052 165784 poll_s SNs ?          0:00 /opt/pdm/bin/thttpd -nip -nos -c **.html|**.sh|
4     0   233   229  30  10   6224  1568 poll_s SNs  ?          0:00 /opt/pdm/bin/exrouter-cpp
4     0   247     0  20   0   5996  3756 do_wai Ss   pts/1      0:00 bash
4     0   255   247  20   0   7568  3024 -      R+   pts/1      0:00 ps axl

And, once I issue a SIGINT to the container, but before any timeout:

root@4fa66da81d02:/# ps axl
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0    176     4 do_wai Ss+  pts/0      0:00 foreground  backtick  -D  3000  -n  S6_SERVICES
4 65534   232     1  30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
4     0   233     1  30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
4     0   247     0  20   0   5996  3860 do_wai Ss   pts/1      0:00 bash
0     0   271     1  20   0    176     4 do_wai S+   pts/0      0:00 foreground  s6-svwait  -D  -t  10000  /var/run/
4     0   278   271  20   0    204     8 poll_s S+   pts/0      0:00 s6-svwait -D -t 10000 /var/run/s6/services/thtt
4     0   279   278  20   0    452     4 poll_s S+   pts/0      0:00 s6-ftrigrd
4     0   280   247  20   0   7568  2976 -      R+   pts/1      0:00 ps axl

And, after the system times out and sends SIGTERM to all the processes:

root@4fa66da81d02:/# ps axl
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0    176     4 do_wai Ss+  pts/0      0:00 foreground  backtick  -D  3000  -n  S6_KILL_GRA
4 65534   232     1  30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
4     0   233     1  30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
4     0   279     1  20   0      0     0 -      Z+   pts/0      0:00 [s6-ftrigrd] <defunct>
0     0   285     1  20   0    168     4 poll_s S+   pts/0      0:00 s6-sleep -m -- 10000
4     0   292     0  20   0   5992  3760 do_wai Ss   pts/1      0:00 bash
4     0   300   292  20   0   7568  3080 -      R+   pts/1      0:00 ps axl
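To pick the zombies out of listings like the ones above, a small filter helps. This is a minimal sketch, assuming the `ps axl` column layout shown in this issue (PID in column 3, PPID in column 4, STAT in column 10, COMMAND starting at column 13); `list_zombies` is a name made up for illustration.

```shell
# list_zombies: read `ps axl`-style output on stdin and print
# "PID PPID COMMAND" for every process whose STAT starts with Z.
# Assumes the 13-column layout of the listings in this issue.
list_zombies() {
    awk 'NR > 1 && $10 ~ /^Z/ { print $3, $4, $13 }'
}

# Sample rows copied from the second listing above.
list_zombies <<'EOF'
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0    176     4 do_wai Ss+  pts/0      0:00 foreground
4 65534   232     1  30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
4     0   233     1  30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
EOF
```

Inside the container the same filter can be fed live data with `ps axl | list_zombies`. A PPID of 1 in the output means the zombie is waiting for process 1 to reap it.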

Notes:

The managed processes are "thttpd" and "exrouter"
I set S6_SERVICES_GRACETIME and S6_KILL_GRACETIME to 10000 for the above tests
When s6-svscan decides to exit, it sends signals to all the managed processes; the s6-supervise processes exit, but the two managed processes become zombies and aren't reaped
The first timeout still didn't kill thttpd or exrouter (although it did kill bash, so I had to reconnect to gather the third "ps axl")
I tried sending SIGCHLD to process 1 to prompt it to reap zombies, but nothing changed
It looks like process 1 is the foreground command; perhaps it needs to check for and reap zombies?

It would be easy to cut the timeouts to, say, 100ms each, but I'd much rather have a correct shutdown sequence, as that's why I switched to s6 and s6-overlay in the first place.
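For reference, the stopgap of shortening both timeouts could look like the sketch below. It only prints the `docker run` command rather than executing it; `s6_gracetime_cmd` is a helper name invented here and `my-image` is a placeholder. The two environment variables are the ones named in the notes above, with values in milliseconds.

```shell
# Print (do not run) a `docker run` command that sets both grace times.
# Values are in milliseconds; "my-image" below is a placeholder image name.
s6_gracetime_cmd() {
    ms=$1
    image=$2
    printf 'docker run -e S6_SERVICES_GRACETIME=%s -e S6_KILL_GRACETIME=%s %s\n' \
        "$ms" "$ms" "$image"
}

s6_gracetime_cmd 100 my-image
```

This only shortens the wait; the zombies are still left unreaped during shutdown.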

(FYI, I first posted this on the s6 mailing list, and Laurent suggested I post it here. He also gave some good information which I'll add to this issue as a comment.)

dtgriscom commented 3 years ago

Here is Laurent Bercot's response to my s6 mailing list query:

Hi Daniel,

I'm actually not the maintainer of s6-overlay: John is. I think the correct place to describe your issue is GitHub where s6-overlay is hosted.

I am aware that there is a race condition problem with zombies in the shutdown sequence of s6-overlay. This is not the first time it occurs (at some point broken kernels were also causing similar troubles, but this is probably not what is happening here).

For instance, I know that the line at https://github.com/just-containers/s6-overlay/blob/master/builder/overlay-rootfs/etc/s6/init/init-stage3#L53 is incorrect: s6-svwait cannot run correctly when the supervision tree has been torn down, which is the case in init-stage3. This is why the s6-svwait programs are waiting until they time out: even though the services they're waiting for are down, they're never triggered because the associated s6-supervise processes, which perform the triggers, are already dead.

Unfortunately, fixing this requires a significant rewrite of the s6-overlay shutdown sequence. I have started working on this, but it has been preempted by another project, and will likely not come out before

I'm sorry; I would like to provide the correct shutdown sequence you're looking for (and that is entirely possible to achieve with s6) but as is, we have to make do with the current sequence.

A tweak I would try is replacing the whole foreground block at lines 48-55 with the following (without a foreground block):

backtick -D 3000 -n S6_SERVICES_GRACETIME { printcontenv S6_SERVICES_GRACETIME }
importas -u S6_SERVICES_GRACETIME S6_SERVICES_GRACETIME
wait -t ${S6_SERVICES_GRACETIME} { }

This makes it so init-stage3 simply waits for all processes to die before continuing, instead of waiting for a trigger that will never come. It is not a long-term solution though, because having for instance a shell on your container will make the "wait" command block until it times out; but it may be helpful for your situation.

Please open a GitHub issue to discuss this.
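The point in the quoted reply, that s6-svwait can only be woken by a live s6-supervise, can be illustrated with a guard around the call. This is a rough sketch, not the fix s6-overlay adopted: it assumes `s6-svok` (which exits 0 when s6-supervise is running on a service directory) is on the PATH, and `svwait_if_supervised`, `SVOK`, and `SVWAIT` are names made up for this example.

```shell
# Wait for a service to go down only if s6-supervise is still running on it.
# SVOK and SVWAIT are hypothetical override variables, added so the function
# can be exercised without s6 installed; they default to the real programs.
svwait_if_supervised() {
    timeout_ms=$1
    svcdir=$2
    if "${SVOK:-s6-svok}" "$svcdir"; then
        # Supervisor alive: it can still fire the "down" event.
        "${SVWAIT:-s6-svwait}" -D -t "$timeout_ms" "$svcdir"
    else
        # Supervisor gone: no event will ever arrive, so do not block.
        echo "no s6-supervise on $svcdir, skipping s6-svwait" >&2
    fi
}
```

A call such as `svwait_if_supervised 10000 /var/run/s6/services/thttpd` would then return immediately once the supervision tree has been torn down. The check is racy (the supervisor can still die between the two commands), so it shows the cause of the timeout rather than a complete remedy.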

skarnet commented 2 years ago

Heads up: the next version of s6-overlay is almost ready and fixes this problem (among others).

skarnet commented 2 years ago

v3.0.0.0 is out (the built tarballs aren't there yet, but the source is available and it's easy to build yourself). It should solve any zombie-related issue. Please reopen an issue if you are still having trouble.

just-containers / s6-overlay

s6-svwait not reaping zombies #350