colinmollenhour closed this issue 2 years ago
The ^C / docker stop error is a known issue indeed; it will be fixed in the next version. (Already fixed in the git head, if you want to build s6-overlay from source.)
There is also the S6_KILL_GRACETIME variable, which is the number of milliseconds that s6-overlay should wait: set it to something like 100 or 200 if you're certain that all your services are well-behaved and don't leave behind children that might take time to die when sent a SIGTERM.
@skarnet regarding 3 - doesn't s6 check whether the process has finished before the grace time, and then issue a SIGKILL before that?
For supervised processes, yes, but that's not what S6_KILL_GRACETIME is about. (Yes, the name is confusing.) It's about the final kill of all the processes in the container right before the container exits. That includes unsupervised processes, and s6 has no way of knowing whether those finished on SIGTERM or not.
Thanks for the info @skarnet - so just to clarify: if, say, I have an Apache process that was already in progress when shutdown was issued, and it would take say 15 more seconds to complete before Apache shuts down gracefully, is S6_KILL_GRACETIME going to cause that to get killed?
Testing with S6_KILL_GRACETIME=1, it seems to exit faster, but the nginx process still says it stopped gracefully, so I think that is a safe configuration? Still not sure why there needs to be a kill gracetime at all if it waits for each service to exit gracefully, but I'm probably completely missing something.
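A sketch of how this kind of comparison can be made, assuming the s6demo image name from the build log later in this thread (the container names are illustrative, and docker must be available):

```shell
# Start a container with the default S6_KILL_GRACETIME (3000 ms)
# and time how long shutdown takes.
docker run -d --name s6demo-default s6demo
time docker stop s6demo-default

# Same image, but with a 1 ms grace time; the difference in wall time
# is roughly the gracetime spent waiting before the final kill sweep.
docker run -d --name s6demo-fast -e S6_KILL_GRACETIME=1 s6demo
time docker stop s6demo-fast
```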
The "fatal error" problem existed in 3.0.0.0, but has been fixed in 3.0.0.2. It's working for me with your exact Dockerfile, no fatal error message.
I'm definitely still getting that error message even with 3.0.0.2-2, using the release files from GitHub. How could that be? Could there be a difference between your local build and the release files?
[+] Building 61.3s (14/14) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.12kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:latest 0.3s
=> https://github.com/just-containers/s6-overlay/releases/download/v3.0.0.2-2/s6-overlay-noarch-3.0.0.2-2.tar.xz 0.0s
=> CACHED [1/8] FROM docker.io/library/ubuntu@sha256:669e010b58baf5beb2836b253c1fd5768333f0d1dbcb834f7c07a4dc93f474be 0.0s
=> https://github.com/just-containers/s6-overlay/releases/download/v3.0.0.2-2/s6-overlay-x86_64-3.0.0.2-2.tar.xz 0.0s
=> [2/8] RUN apt-get update && apt-get install -y nginx xz-utils 58.7s
=> [3/8] RUN echo "daemon off;" >> /etc/nginx/nginx.conf 0.4s
=> [4/8] ADD https://github.com/just-containers/s6-overlay/releases/download/v3.0.0.2-2/s6-overlay-noarch-3.0.0.2-2.tar.xz /tmp 0.0s
=> [5/8] RUN tar -C / -Jxpf /tmp/s6-overlay-noarch-3.0.0.2-2.tar.xz 0.5s
=> [6/8] ADD https://github.com/just-containers/s6-overlay/releases/download/v3.0.0.2-2/s6-overlay-x86_64-3.0.0.2-2.tar.xz /tmp 0.0s
=> [7/8] RUN tar -C / -Jxpf /tmp/s6-overlay-x86_64-3.0.0.2-2.tar.xz 0.5s
=> [8/8] RUN mkdir -p /etc/s6-overlay/s6-rc.d/nginx /etc/cont-init.d/ && echo '#!/command/with-contenv sh\necho "Starting $FOO..."\nexec /usr/sbin/nginx' 0.5s
=> exporting to image 0.4s
=> => exporting layers 0.4s
=> => writing image sha256:ce4db5a3a719911423f680552326af9ff0bd182fc9114d64cd47e674c97f69aa 0.0s
=> => naming to docker.io/library/s6demo 0.0s
s6-rc: info: service nginx: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service nginx successfully started
Starting bar...
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/00-hello
Hello bar
cont-init: info: /etc/cont-init.d/00-hello exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
^Cs6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service nginx: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service nginx successfully stopped
s6-rc: info: service s6rc-oneshot-runner successfully stopped
s6-linux-init-shutdown: fatal: unable to talk to shutdownd: Operation not permitted
S6_KILL_GRACETIME does not impact supervised services, only processes that are still alive at the end of the container's lifetime. If your nginx is being supervised, it is safe; it will be stopped cleanly. (There is also a timeout after which it gets a SIGKILL if it's not dead yet, but it's 5 seconds by default; you can change it by writing a timeout-kill file in the service directory if needed, and it has no influence on the global death time.)
If you are only using supervised services that you know don't leave behind unattended and misbehaved children, then you can safely have a very small S6_KILL_GRACETIME to minimize the waiting time at the end.
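In Dockerfile terms, the two knobs described above look roughly like this (the values and the nginx service name are illustrative):

```dockerfile
# Grace time before the final kill of any leftover processes, in
# milliseconds (default 3000). Safe to lower when every service is
# supervised and well-behaved.
ENV S6_KILL_GRACETIME=200

# Per-service timeout (ms) before a supervised longrun gets SIGKILL
# after SIGTERM; written into the service directory. Default is 5 s.
RUN echo 15000 > /etc/s6-overlay/s6-rc.d/nginx/timeout-kill
```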
The only way you're getting an "unable to talk to shutdownd" error is when your ^C sends a SIGINT to the supervision tree's process group and kills the whole tree, which should definitely not happen if your s6-overlay has been built with s6-linux-init version 1.0.7.1 or later. I will do more testing before the next release and make sure it works, but something seems definitely not right.
As a workaround in the meantime, if you kill your containers with docker stop instead, you should not be getting that error.
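For reference, docker stop has its own escalation logic, independent of s6-overlay: it sends SIGTERM to PID 1 and only sends SIGKILL after its timeout expires. A sketch (the container name is illustrative):

```shell
# Send SIGTERM and give the container up to 30 seconds to exit
# cleanly before Docker escalates to SIGKILL (the default is 10 s).
docker stop -t 30 mycontainer
```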
Testing with the latest version, which is on track to become 3.1.0.0 (you can check for yourself if you build s6-overlay from source):
- ^C no longer produces the s6-linux-init-shutdown: fatal: unable to talk to shutdownd: Operation not permitted error
- docker stop makes the container exit 0
Thanks for the updates, makes sense!
@skarnet
Seems like with the 3.1.6.1 release I hit this again:
docker run --rm --pull never \
-e PURE_PASSWDFILE=/tmp/pureftpd.passwd \
-e PURE_DBFILE=/pureftpd.pdb \
-v $(pwd)/tests/resources/pureftpd.passwd:/tmp/pureftpd.passwd:ro \
instrumentisto/pure-ftpd:1.0.51-r20 test -f /pureftpd.pdb
outputs:
s6-rc: info: service syslog: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service syslog successfully started
s6-rc: info: service s6rc-oneshot-runner successfully started
Nov 20 12:21:34 7922563421b8 syslog.info syslogd started: BusyBox v1.36.1
s6-rc: info: service fix-attrs: starting
s6-rc: info: service create-puredb: starting
s6-rc: info: service create-puredb successfully started
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service create-puredb: stopping
s6-rc: info: service syslog: stopping
Nov 20 12:21:34 7922563421b8 syslog.info syslogd exiting
s6-rc: info: service create-puredb successfully stopped
s6-rc: info: service syslog successfully stopped
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
s6-linux-init-shutdown: fatal: unable to talk to shutdownd: Operation not permitted
s6-linux-init-shutdown: fatal: unable to talk to shutdownd: Operation not permitted
On 3.1.6.0, though, everything is totally OK.
Confirmed. Working on it.
Create a Dockerfile with one or more longrun services:
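The Dockerfile itself did not survive the formatting here; the following is reconstructed from the build log earlier in this thread. The original echo that creates the service definition is truncated in the log, so the run script wiring (type file, contents.d registration, the 00-hello oneshot) is a best guess at a standard s6-overlay v3 layout:

```dockerfile
FROM ubuntu:latest

RUN apt-get update && apt-get install -y nginx xz-utils
RUN echo "daemon off;" >> /etc/nginx/nginx.conf

# Install s6-overlay from the release tarballs
ADD https://github.com/just-containers/s6-overlay/releases/download/v3.0.0.2-2/s6-overlay-noarch-3.0.0.2-2.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-noarch-3.0.0.2-2.tar.xz
ADD https://github.com/just-containers/s6-overlay/releases/download/v3.0.0.2-2/s6-overlay-x86_64-3.0.0.2-2.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-x86_64-3.0.0.2-2.tar.xz

# Declare an s6-rc longrun service for nginx; the run script contents
# match the truncated echo visible in the build log.
RUN mkdir -p /etc/s6-overlay/s6-rc.d/nginx \
             /etc/s6-overlay/s6-rc.d/user/contents.d \
             /etc/cont-init.d \
 && printf '#!/command/with-contenv sh\necho "Starting $FOO..."\nexec /usr/sbin/nginx\n' \
        > /etc/s6-overlay/s6-rc.d/nginx/run \
 && chmod +x /etc/s6-overlay/s6-rc.d/nginx/run \
 && echo longrun > /etc/s6-overlay/s6-rc.d/nginx/type \
 && touch /etc/s6-overlay/s6-rc.d/user/contents.d/nginx \
 && printf '#!/command/with-contenv sh\necho "Hello $FOO"\n' \
        > /etc/cont-init.d/00-hello \
 && chmod +x /etc/cont-init.d/00-hello

ENTRYPOINT ["/init"]
```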
Build and run:
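The build and run commands are likewise missing from the formatting. Based on the image name in the build log ("naming to docker.io/library/s6demo") and the "Starting bar..." output, they were roughly:

```shell
docker build -t s6demo .
# -it so that ^C in the terminal delivers SIGINT to the container;
# FOO is the run-time variable echoed by the run script and 00-hello.
docker run -it --rm -e FOO=bar s6demo
# After pressing ^C, check the container's exit code:
echo $?
```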
Press Ctrl+C to kill the container. Run "echo $?" to get the return code.
Expected result:
Quick and clean shutdown with "0" return code.
Actual result:
Shutdown displays a "fatal" error and takes about 3-4 seconds; the return code is "111".
Comments
Maybe I'm missing something, but this is not the result I desire/expect, given that my services shut down instantly and cleanly. The extra delay is just unnecessary downtime when recreating containers with docker-compose, for example, and the return code indicates some sort of error even though there was none.