Open missinglink opened 7 years ago
ref: https://www.ctl.io/developers/blog/post/gracefully-stopping-docker-containers/
explains it better than me :)
docker stop
When you issue a docker stop command Docker will first ask nicely for the process to stop and if it doesn't comply within 10 seconds it will forcibly kill it. If you've ever issued a docker stop and had to wait 10 seconds for the command to return you've seen this in action
The docker stop command attempts to stop a running container first by sending a SIGTERM signal to the root process (PID 1) in the container. If the process hasn't exited within the timeout period a SIGKILL signal will be sent.
Whereas a process can choose to ignore a SIGTERM, a SIGKILL goes straight to the kernel which will terminate the process. The process never even gets to see the signal.
When using docker stop the only thing you can control is the number of seconds that the Docker daemon will wait before sending the SIGKILL:
docker stop --time=30 foo
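To make the sequence quoted above concrete, here is a minimal C++ sketch of a "SIGTERM first, SIGKILL after a timeout" stop. It is not Docker's implementation: `stop_with_timeout` and `spawn_sleeper` are made-up names, and the helper only works on child processes of the caller.

```cpp
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Hypothetical helper: ask `pid` to stop with SIGTERM and fall back to SIGKILL
// if it is still alive after `timeout`. Returns true if SIGKILL was needed.
// `pid` must be a child of the calling process.
bool stop_with_timeout(pid_t pid, std::chrono::milliseconds timeout) {
  kill(pid, SIGTERM);  // the polite request; the process may handle or ignore it
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  int status = 0;
  while (std::chrono::steady_clock::now() < deadline) {
    // WNOHANG polls without blocking; a return value of `pid` means it exited
    if (waitpid(pid, &status, WNOHANG) == pid) return false;
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  kill(pid, SIGKILL);        // cannot be caught, blocked or ignored
  waitpid(pid, &status, 0);  // reap the child
  return true;
}

// Hypothetical helper for trying it out: fork a child that either ignores
// SIGTERM or keeps the default action, then sleeps until it is signalled.
pid_t spawn_sleeper(bool ignore_sigterm) {
  // block SIGTERM around the fork so the child cannot be hit by it
  // before it has set up its disposition
  sigset_t block, old;
  sigemptyset(&block);
  sigaddset(&block, SIGTERM);
  sigprocmask(SIG_BLOCK, &block, &old);
  pid_t pid = fork();
  if (pid == 0) {
    signal(SIGTERM, ignore_sigterm ? SIG_IGN : SIG_DFL);
    sigprocmask(SIG_SETMASK, &old, nullptr);
    for (;;) pause();
  }
  sigprocmask(SIG_SETMASK, &old, nullptr);
  return pid;
}
```

With a child that ignores SIGTERM the helper should only return after the full timeout and report that SIGKILL was needed, which is the same wait you see on a slow docker stop.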
@missinglink yeah we've done this for test programs (so that they get killed by SIGALRM) but sadly haven't yet made it a habit for the main programs. as a first step we can add this to the main all-in-one service and essentially copy-paste this bit of code wherever it's needed. all we need to do is something like:
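#include <csignal>
#include <cstdlib>

int main(int argc, char** argv) {
  // exit as soon as SIGTERM arrives
  // std::_Exit is async-signal-safe, plain exit() is not
  std::signal(SIGTERM, [](int sig_num) -> void { std::_Exit(sig_num); });
  // ... the rest of the service goes here ...
  return 0;
}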
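If exiting straight from the handler is too abrupt, a common alternative is to only set a flag in the handler and let the main loop wind down by itself. A minimal sketch of that pattern, assuming a service that processes work in a loop; `serve_until_sigterm` and `handle_one_request` are hypothetical names, not valhalla code:

```cpp
#include <csignal>
#include <functional>

namespace {
// set from the signal handler, polled by the serving loop
volatile std::sig_atomic_t stop_requested = 0;

// keep the handler trivial: record the request and return
void on_sigterm(int) { stop_requested = 1; }
}  // namespace

// Hypothetical serving loop: calls `handle_one_request` until SIGTERM has been
// seen, then returns the number of requests handled so the caller can clean up.
int serve_until_sigterm(const std::function<void()>& handle_one_request) {
  stop_requested = 0;
  std::signal(SIGTERM, on_sigterm);
  int handled = 0;
  while (!stop_requested) {
    handle_one_request();
    ++handled;
  }
  return handled;
}
```

The request that is in flight when the signal arrives gets finished before the loop exits, so the trade-off versus exiting in the handler is a slightly slower but cleaner shutdown.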
something strange seems to be going on in docker land... so without any change to the code i just wanted to see that indeed our process sticks around if you send SIGTERM to it...
kkreiser@HP-ZBook-15:~/sandbox/valhalla/valhalla/.libs$ ./valhalla_route_service ../conf.json 1
2017/03/30 16:46:16.921048 [INFO] Tile extract successfully loaded
^C
kkreiser@HP-ZBook-15:~/sandbox/valhalla/valhalla/.libs$ ./valhalla_route_service ../conf.json 1 &
[1] 11599
kkreiser@HP-ZBook-15:~/sandbox/valhalla/valhalla/.libs$ 2017/03/30 16:46:30.327750 [INFO] Tile extract successfully loaded
kkreiser@HP-ZBook-15:~/sandbox/valhalla/valhalla/.libs$ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX
kkreiser@HP-ZBook-15:~/sandbox/valhalla/valhalla/.libs$ kill -15 11599
kkreiser@HP-ZBook-15:~/sandbox/valhalla/valhalla/.libs$
[1]+ Terminated ./valhalla_route_service ../conf.json 1
kkreiser@HP-ZBook-15:~/sandbox/valhalla/valhalla/.libs$
the above would suggest that the program quits when it gets SIGTERM
so i'm at a loss here as to what's going on in docker :frowning_face: any ideas?
hmm... I'm not sure what's going on then..
I can confirm that the docker container appears to hang around until the timeout is reached (10s) when docker-compose down is executed.
few ideas:
- … SIGTERM, or should that be instant?
- … ./valhalla_route_service in the foreground and giving it a CTRL+C?

I will investigate further tomorrow and see what's going on. It's possible that I made an error in my configuration which is causing the problem, so I'll double-check that.
this is the relevant section of the Dockerfile:
CMD valhalla_route_service valhalla.json 1
what does the 1 do there? I copied that off a readme
edit: never mind, I RTFM
//number of workers to use at each stage
auto worker_concurrency = std::thread::hardware_concurrency();
if(argc > 2)
  worker_concurrency = std::stoul(argv[2]);
@missinglink it shouldn't take time to wind down, when i ctrl-c it, it goes down instantly. i think your third bullet is the most likely culprit here, of course it's also the hardest to test. maybe worth writing a small dockerized program just to see what signals are sent when. actually, we could do that with just some bash...
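For the "small dockerized program" idea, here is a rough C++ sketch (instead of bash) that just logs whichever signals arrive. The names `log_signal` and `install_signal_logger` are invented for illustration, and the list of signals is a guess at what is worth watching:

```cpp
#include <cerrno>
#include <initializer_list>
#include <signal.h>
#include <unistd.h>

namespace {
volatile sig_atomic_t last_signal = 0;
volatile sig_atomic_t signal_count = 0;

// Handler: print "got signal NN" using only async-signal-safe calls
// (write(2) is safe here, printf and std::cout are not)
void log_signal(int sig_num) {
  const int saved_errno = errno;  // write() may clobber errno
  char line[] = "got signal 00\n";
  line[11] = char('0' + sig_num / 10);
  line[12] = char('0' + sig_num % 10);
  ssize_t written = write(STDERR_FILENO, line, sizeof(line) - 1);
  (void)written;
  last_signal = sig_num;
  signal_count = signal_count + 1;
  errno = saved_errno;
}
}  // namespace

// Install the logger for a handful of catchable signals. SIGKILL cannot be
// caught, so its arrival would show up as the log simply stopping.
void install_signal_logger() {
  for (int sig_num : {SIGHUP, SIGINT, SIGQUIT, SIGTERM}) {
    struct sigaction action = {};
    action.sa_handler = log_signal;
    sigemptyset(&action.sa_mask);
    sigaction(sig_num, &action, nullptr);
  }
}
```

Calling `install_signal_logger()` at the top of `main` and then looping on `pause()` would give a container whose stderr shows what docker sends and when.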
so.. I have officially dived down the rabbit-hole which is docker and came up with only more questions... :)
I set up a container running the server with the CMD as such:
CMD valhalla_route_service valhalla.json 1
I then created an interactive bash shell inside the running container and ran ps:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 26 0.0 0.0 19880 3644 ? Ss 16:06 0:00 bash
root 39 0.0 0.0 36088 3204 ? R+ 16:07 0:00 \_ ps auxf
root 1 0.0 0.0 4512 804 ? Ss 16:02 0:00 /bin/sh -c valhalla_route_service /data/valhalla.json 1
root 6 0.0 0.1 1276788 32476 ? Sl 16:02 0:00 valhalla_route_service /data/valhalla.json 1
it seems that when the CMD is in 'shell form', it is executed with /bin/sh -c.
sending kill -15 1 had no effect but sending kill -15 6 killed the process and the container exited.
I then changed the CMD definition to exec form (ie. array form) as such:
CMD ["valhalla_route_service", "/data/valhalla.json", "1"]
now the binary is executed directly and becomes PID 1
root@18bf075d3d11:/data# ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 25 0.3 0.0 19880 3560 ? Ss 16:15 0:00 bash
root 34 0.0 0.0 36088 3184 ? R+ 16:15 0:00 \_ ps auxf
root 1 0.5 0.1 1276788 28496 ? Ssl 16:15 0:00 valhalla_route_service /data/valhalla.json 1
I again opened a bash shell in the running container and ran kill -15 1, nothing happened!
In either form I find that the container takes >10s to come 'down':
$ time docker-compose down
Stopping valhalla ... done
Removing valhalla ... done
Removing network valhallaissue634_default
real 0m11.280s
user 0m0.340s
sys 0m0.036s
¯\_(ツ)_/¯
docker files: https://github.com/missinglink/valhalla-issue-634
right! so I read this https://www.fpcomplete.com/blog/2016/10/docker-demons-pid1-orphans-zombies-signals.
The reason for this is some Linux kernel magic: the kernel treats a process with PID 1 specially, and does not, by default, kill the process when receiving the SIGTERM or SIGINT signals. This can be very surprising behavior.
the tl;dr is that an explicit signal handler must be defined in any process which could be run as PID1
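As I understand the kernel behaviour (it is documented in pid_namespaces(7)), a signal sent to PID 1 from inside its own namespace is only delivered if the process has installed a handler for it. A small sketch of that rule as a check; `signal_would_be_dropped` is a hypothetical helper, and the pid is passed in explicitly so it can be tried outside a container:

```cpp
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: would `sig_num` have no effect on this process if it
// were running with the given pid? True when the pid is 1 and no handler is
// installed (default or ignored disposition). SIGKILL and SIGSTOP sent from
// outside the container are not covered by this check.
bool signal_would_be_dropped(pid_t pid, int sig_num) {
  struct sigaction current;
  if (sigaction(sig_num, nullptr, &current) != 0) return false;  // invalid signal number
  const bool has_handler =
      current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN;
  return pid == 1 && !has_handler;
}
```

A service could call `signal_would_be_dropped(getpid(), SIGTERM)` at startup and log a warning, which would have flagged the exec form case above where the binary runs as PID 1 without a handler.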
interesting read: https://github.com/phusion/baseimage-docker/blob/next/README.md
heya, I've been running valhalla using Docker and docker-compose.
when running the docker-compose down command a SIGTERM signal is first sent to the process and then it waits an interval (default 10s) before sending it a SIGKILL.
it seems like valhalla_route_service is ignoring the SIGTERM signal.
because of how docker runs the process as PID 1 it behaves a little differently regarding trapping signals, it might require that the signal is explicitly trapped with code.
eg: (in nodejs):
the benefit is that valhalla docker containers would restart almost instantly, rather than waiting ~10s for the SIGKILL.