Closed bryanlatten closed 5 years ago
@bryanlatten Unless I'm mistaken and Docker does something special with ^C, ^C generates a SIGINT, not a SIGHUP...
@skarnet would you expect the signals to pass through the first trap if they are explicitly "handled"? trap -x does not seem to change any behavior either.
Doing a docker stop (term) also has similar behavior.
@bryanlatten Okay, if you send a SIGTERM and trap forwards it, then there's a problem indeed. I'm not a Docker user, so I can't exactly try and reproduce, but can you make a strace -f of your programs (one with just a trap, another with the two nested traps) and paste the output somewhere? It will help me figure out exactly what's happening.
And just to compare, does a shell script using the trap builtin work as you expect in the same circumstances?
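For that comparison, here is a minimal sketch of a plain shell script using the trap builtin. It is not taken from the thread: the messages and the temporary log file are made up for illustration, and it only shows the baseline case of a SIGTERM sent to one specific pid.

```shell
#!/bin/sh
# Hypothetical comparison script: the shell's own trap builtin handling a
# SIGTERM that is delivered to exactly one pid.
log=$(mktemp)

# Child shell: install a TERM handler, report readiness, then idle.
sh -c '
  trap "echo trapped-term; exit 0" TERM
  echo ready
  while :; do sleep 1; done
' >"$log" 2>&1 &
child=$!

# Wait until the handler is installed before sending the signal.
while ! grep -q ready "$log"; do sleep 1; done

kill -TERM "$child"
wait "$child"
status=$?

result=$(cat "$log")
rm -f "$log"
echo "$result"
```

The handler runs once the foreground sleep returns, so the child exits through its own trap rather than being killed by the default SIGTERM action.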
Hi @skarnet and @bryanlatten,
I tried to collect a strace for this situation, on a CTRL+C action from inside a Docker container. Just in case the details matter, I used a Vagrant guest (Ubuntu Bionic).
vagrant@devtop-bionic:~/dev/s6-signal$ docker --version
Docker version 18.09.5, build e8ff056
I used Ubuntu for my Dockerfile since I'm more familiar with that than busybox.
FROM ubuntu:bionic
RUN apt-get update; apt-get install -y curl bash git strace
ADD https://github.com/just-containers/s6-overlay/releases/download/v1.21.8.0/s6-overlay-amd64.tar.gz /tmp/
RUN gunzip -c /tmp/s6-overlay-amd64.tar.gz | tar -xf - -C /
ENTRYPOINT ["/init"]
RUN mkdir -p /etc/services.d/myapp
COPY run /etc/services.d/myapp/run
docker build -t bossjones/s6-test:latest .
docker run -i -t --rm --privileged --cap-add=ALL --entrypoint "bash" -v $(pwd):/app:rw -w /app bossjones/s6-test:latest -l
strace -s 8192 -f -o /app/strace.out /init
strace output is attached.
Oh man, @bossjones, if you want a command that does nothing, please use s6-pause or a long sleep, not loopwhilex true! The latter will busyloop, heating your CPU and spewing lots of garbage to the strace. Sorry, but that trace is unreadable to me.
Sorry about that @skarnet! Didn't know about s6-pause. Hopefully this is much more readable! Only 225 lines this time.
Thanks @bossjones. So, what's happening, @bryanlatten, is:
s6-nuke -th, which sends a SIGHUP and a SIGTERM to all the processes at the same time. So, no matter how many traps you set, they will all get a SIGHUP and a SIGTERM, and print the message you set, and the end program under the traps will also get the signals.
s6-nuke signals too!
Conclusion: trap is working as intended, but the way the container is run is out-of-spec. You can't interrupt s6-svscan with ^C without throwing a serious wrench into everything. And s6-svscan really needs to run as pid 1 in the container.
@bossjones looks like you overrode the entrypoint during your strace to bash
@skarnet really appreciate your help on all of this. To summarize what we know so far:
^C is translated to SIGINT by Docker.
SIGINT received by s6-svscan will drag down the whole supervision tree via s6-nuke -th, which sends SIGHUP and SIGTERM to all processes, simultaneously. Amounts to a "known behavior" for SIGINT?
docker stop.
The whole supervision tree (the s6-svscan and s6-supervise processes) is killed by the SIGINT. That is a known behaviour when you run a supervision tree with a controlling terminal (typically from your command line). In a container, s6-svscan should not have a controlling terminal, and it should not be interruptible by SIGINT.
Also, when s6-svscan dies (because of the SIGINT), it doesn't really die (because it's supposed to run as process 1) but instead executes into a shutdown sequence. That shutdown sequence calls s6-nuke -th, which kills all your processes simultaneously. That is the intended behaviour on docker stop, but ^C triggers it too early.
I'd need to see a strace of what happens with a docker stop event instead of a ^C. I'm positive there's also an explanation for the behaviour you're seeing that doesn't involve trap not working. :-)
Apologies for the delay!
Two more strace options.
Ran strace directly from docker, then ran docker stop. I have a feeling this strace is too verbose, but wanted to include options.
strace -s 8192 -f -o ./strace2.out docker run -d -i -t --rm --privileged --cap-add=ALL -v $(pwd):/app:rw -w /app bossjones/s6-test:latest
Started docker with docker run -d -i -t --rm --privileged --cap-add=ALL -v $(pwd):/app:rw -w /app bossjones/s6-test:latest
Entered the container with docker exec, then ran strace -p 1 -s 8192 -f -o /app/strace3.out.
Thanks @bossjones.
strace2.txt file, since it straces the calls in the process subtree created by docker stop, and it doesn't show at all what s6 is doing.
The strace3.txt file, however, confirms what I said: everything is working as expected. @bryanlatten: trap is only meant to protect your program against a SIGTERM sent to the pid of one supervised process, i.e. if you perform a s6-svc -t /var/run/s6/services/services/myapp, the trap command will divert it, and your "myapp" process will not get the SIGTERM. However, trap cannot protect "myapp" against the container shutdown procedure, which sends a SIGTERM and a SIGHUP, and then a SIGKILL, to all processes in the container at once. It's literally a kill -TERM -1. There is no diverting that SIGTERM; this is the one you're seeing your "inner trap" receive.
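The distinction between a signal aimed at one pid and a signal sent to everything can be sketched in plain shell. This is an illustration under assumptions, not the s6 trap program: the wrapper is an ordinary sh with the trap builtin, "myapp" is stood in for by sleep, and the broadcast is simulated by signalling both pids explicitly instead of running kill -TERM -1.

```shell
#!/bin/sh
# Sketch: a wrapper that traps TERM shields its child only from signals
# aimed at the wrapper's own pid, not from signals sent to every process.
pidfile=$(mktemp)

# Wrapper: handle TERM itself, start a stand-in "myapp", record its pid,
# and keep waiting until that child is gone.
sh -c '
  trap "echo outer-trapped" TERM
  sleep 30 &
  echo $! > "$1"
  # wait returns early when a trapped signal arrives, so loop.
  while kill -0 $! 2>/dev/null; do wait $!; done
  exit 0
' sh "$pidfile" &
wrapper=$!

# The pid file is written after the trap is installed.
while [ ! -s "$pidfile" ]; do sleep 1; done
child=$(cat "$pidfile")

# 1) Targeted signal: only the wrapper receives TERM.
kill -TERM "$wrapper"
sleep 1
if kill -0 "$child" 2>/dev/null; then targeted=survived; else targeted=died; fi

# 2) Simulated broadcast: wrapper and child both receive TERM.
kill -TERM "$wrapper" "$child"
wait "$wrapper"
if kill -0 "$child" 2>/dev/null; then broadcast=survived; else broadcast=died; fi

rm -f "$pidfile"
echo "targeted: $targeted, broadcast: $broadcast"
```

In the first step the child never sees the signal, because it was addressed to the wrapper alone. In the second step the child receives its own copy directly, so no handler in the wrapper can intercept it.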
Normally, in the s6-overlay shutdown procedure, running services are stopped first via /var/run/s6/etc/cont-finish.d scripts, likely calling s6-svc -d on those services. That way, when the final kill arrives, services like "myapp" have already had time to shut down properly (they have at least $S6_SERVICES_GRACETIME milliseconds to do so). Here, the grace time expires because the services are still running and apparently have not been instructed to exit. So, naturally, the nuke at the end catches them pants down.
Stop your services properly in cont-finish.d, so they're already down when the container is about to close and slaughters everything at once.
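A cont-finish.d script along those lines might look like the sketch below. The file name, the service directory, and the S6_SVC and SVCDIR override variables are assumptions made for illustration (the overrides exist only so the script can be exercised outside a container); the only s6 command used is s6-svc -d, as mentioned above.

```shell
#!/bin/sh
# Hypothetical /etc/cont-finish.d/shutdown.sh
# Assumption: the supervised service directory is /var/run/s6/services/myapp.
svcdir="${SVCDIR:-/var/run/s6/services/myapp}"

stop_service() {
  # s6-svc -d asks the supervisor to bring the service down and
  # not restart it. S6_SVC is a made-up override used for dry runs.
  "${S6_SVC:-s6-svc}" -d "$1"
}

# Only act when s6-svc is actually available, so the script is a no-op
# when run outside an s6-overlay container.
if command -v "${S6_SVC:-s6-svc}" >/dev/null 2>&1; then
  stop_service "$svcdir"
fi
```

Whether the script should also wait for the service to be fully down before returning is a design choice; the s6-svc documentation for your installed version is the place to check for suitable options.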
@skarnet super helpful explanation - thank you! I've been trying to find a clearer explanation for the use of cont-finish.d (task finalization), but can't quite locate one, other than deep conversation in issues. Am I missing something?
Reference: https://github.com/just-containers/s6-overlay/issues/41
Back to the original question: can I intercept an administratively-generated SIGTERM (docker stop) to prep a supervised service for graceful shutdown? It does not appear that cont-finish.d runs before the distributed SIGTERM.
It should. The s6-nuke -th instruction, which sends the global SIGTERM (and also SIGHUP, because processes such as interactive shells may block or ignore SIGTERM) normally runs after the cont-finish.d scripts. It's been a long time since I've been directly involved with the s6-overlay structure itself (which is honestly quite old and would benefit from a serious makeover, but the current one works and should do the right thing); so at this point I prefer to defer to @glerchundi and @jprjr.
@skarnet maybe that is the disconnect. I am clearly seeing cont-finish.d execute after the supervised tasks get their SIGTERM. In this example, "service trapped term" is clearly visible before "cont-finish.d"; both are echoed from their respective places.
[services.d] starting services
[services.d] done.
<><><> service trapped term <><><>
[cont-finish.d] executing container finish scripts...
[cont-finish.d] abc.sh: executing...
<><><> cont-finish.d <><><>
[cont-finish.d] abc.sh: exited 0.
[cont-finish.d] done.
[s6-finish] syncing disks.
[s6-finish] sending all processes the TERM signal.
<><><> service trapped hup <><><>
<><><> service trapped term <><><>
[s6-finish] sending all processes the KILL signal and exiting.
@glerchundi @jprjr any ideas on what I'm doing wrong?
@bryanlatten When you print <><><> service trapped term <><><>, does it mean that the trap process (the "outer" one) is receiving a SIGTERM, or your application itself (or an "inner" trap), normally protected by a trap command, is still receiving a SIGTERM?
@skarnet ah, that's the key. Combining the two techniques (cont-finish.d + run-line trap) yields the desired behavior:
[services.d] done.
^C<><><> [outer] term <><><>
[cont-finish.d] executing container finish scripts...
[cont-finish.d] shutdown.sh: executing...
<><><> cont-finish.d <><><>
[cont-finish.d] shutdown.sh: exited 0.
[cont-finish.d] done.
[s6-finish] syncing disks.
[s6-finish] sending all processes the TERM signal.
<><><> [outer] term <><><>
<><><> [inner] term <><><>
Inner TERM is seen after the cont-finish.d line, meaning we were able to trap the initial TERM broadcast!
Again, thank you for all the help. On closing, can we add a line in the s6 docs around cont-finish.d usage?
@bryanlatten cont-finish.d is a policy specific to s6-overlay, a way to provide user-configurable container initialization and shutdown. It has nothing to do with s6 itself, which only provides mechanism.
If anything, you could ask @glerchundi to clarify the s6-overlay documentation about cont-finish.d. :-)
Hi @skarnet, trying to understand some strange shutdown behavior that is related to other topics (similar to https://github.com/just-containers/s6-overlay/issues/41, https://github.com/just-containers/s6-overlay/issues/141).
Goal:
docker stop, either by the orchestrator or through admin action. A developer running the container live would use CTRL+C to terminate a container running in the foreground.
Test case:
trap and term/hup, catch signals before they reach the associated program. Suppress by taking no action on them.
Result:
term and hup signals propagated into program, even though based on the trap documentation, it should no longer forward these signals.
Repro: To try to isolate the behavior, I created the following hack to "trap" a trap service:
Dockerfile
/etc/services.d/myapp/run
While running under Docker (with -it flags) and giving a CTRL+C (sighup):
Obviously the loop timed out, but you can see the double trap revealed our strange behavior -- the SIGHUP is being passed through to its final destination (both outer and inner logs are present).
Am I misunderstanding the behavior of trap?