vladgh / docker_base_images

Vlad's Base Images for Docker
Apache License 2.0

Graceful shutdown for s3sync sync #62

Closed by aisensiy 6 years ago

aisensiy commented 6 years ago

I am using vladgh/s3sync as a sidecar in a Kubernetes pod. But when I send it a TERM signal, it returns the non-zero exit code 143, which makes the whole pod show as failed.

So is there some way to make it exit 0 on SIGTERM? I tried using trap in entrypoint.sh, but it does not work.

vladgh commented 6 years ago

Hi @aisensiy, glad to hear you find my image useful. I am not very familiar with exit codes, but after some research this is what I discovered.

TL;DR: This is expected behavior; see below for alternate solutions.

Programs that exit in response to a signal must not simply exit, but instead kill themselves using the signal they trapped. This allows a calling program to determine the reason for the termination.

It is a Unix convention for a process to have an exit code of 128 + the signal number when it exits due to a signal. The exit codes you are likely to see from Docker are:

0: Success
125: Docker run itself fails
126: Contained command cannot be invoked
127: Contained command cannot be found
128 + n: Fatal error signal n
130: (128+2) Container terminated by Control-C
137: (128+9) Container received a SIGKILL
143: (128+15) Container received a SIGTERM
255: Exit status out of range (-1)
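To make the 128 + n convention concrete, here is a minimal sketch in plain shell, independent of Docker and of this image. The child script and the on_term function are made up for the demonstration: the child traps SIGTERM, cleans up, restores the default action and re-sends the signal to itself, and the parent then decodes the resulting status.

```shell
#!/bin/sh
# Hypothetical child script: trap SIGTERM, clean up, then restore the default
# action and re-send the signal so the parent sees a real signal death.
child_script='
on_term() {
    echo "child: cleaning up"
    trap - TERM        # restore the default SIGTERM action
    kill -TERM "$$"    # re-raise the signal against ourselves
}
trap on_term TERM
kill -TERM "$$"        # simulate an external SIGTERM
exit 1                 # not reached: the handler terminates the shell first
'

status=0
sh -c "$child_script" || status=$?
echo "exit status: $status"

# Statuses above 128 encode the signal number as status - 128.
if [ "$status" -gt 128 ]; then
    signum=$((status - 128))
    signame=$(kill -l "$status")   # POSIX kill -l maps a status to a signal name
    echo "terminated by signal $signum ($signame)"
fi
```

The parent should report 143 here, which is the same status the s3sync container returns when it receives a TERM signal.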

This image uses Tini, which does not make any assumptions about the meaning of the signal it receives and simply forwards it to its child.

In order for your traps to work you need to add the -g flag to Tini in the Dockerfile (https://github.com/krallin/tini#process-group-killing):

ENTRYPOINT ["/sbin/tini", "-g", "--", "/entrypoint.sh"]

And only then can you set a trap at the top of entrypoint.sh:

trap "exit 0" INT TERM EXIT
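As a rough illustration of how such a trap behaves, the sketch below runs outside Docker and without Tini. Everything in it is an assumption made for the demo: entrypoint_demo.sh is a stand-in and not this image's real entrypoint.sh, sleep 30 plays the role of the long-running sync loop, and the READY_FILE variable only exists so the demo knows when the trap is installed. Because Tini's -g flag is not involved, the handler stops the background worker itself.

```shell
#!/bin/sh
# Demo only: write a hypothetical stand-in for entrypoint.sh to a temp dir.
workdir=$(mktemp -d)
ready="$workdir/ready"
cat > "$workdir/entrypoint_demo.sh" <<'EOF'
#!/bin/sh
on_term() {
    kill "$worker" 2>/dev/null   # stop the worker (tini -g would signal it too)
    exit 0                       # report success instead of 143
}
trap on_term INT TERM

sleep 30 &                       # stand-in for the long-running sync loop
worker=$!
: > "$READY_FILE"                # tell the demo that the trap is installed
wait "$worker"                   # wait returns early when a trapped signal arrives
EOF

READY_FILE="$ready" sh "$workdir/entrypoint_demo.sh" &
pid=$!

# Wait (up to about 10 seconds) until the script has installed its trap.
tries=0
while [ ! -e "$ready" ] && [ "$tries" -lt 10 ]; do
    sleep 1
    tries=$((tries + 1))
done

kill -TERM "$pid"
rc=0
wait "$pid" || rc=$?
echo "entrypoint exit status: $rc"
rm -rf "$workdir"
```

Running the worker in the background and blocking in wait matters: a shell only runs its trap once the current foreground command has finished, so a foreground sleep would delay the handler.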

Another elegant solution would be Tini's new feature for remapping exit codes (https://github.com/krallin/tini#remapping-exit-codes). However, it has only just been released in the latest version of Tini and is not yet present in the Alpine stable packages. It should arrive soon, when Alpine releases 3.8.
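The idea behind remapping can be sketched in plain shell. This is not Tini's implementation: run_remapped is a hypothetical helper written only for this example. Tini's README describes an -e flag for the real feature, so with a recent enough Tini the entrypoint could presumably pass -e 143 instead of relying on a trap.

```shell
#!/bin/sh
# run_remapped is a hypothetical helper, not part of Tini: it runs a command
# and reports 0 when the command's exit status equals the given code.
run_remapped() {
    remap_code=$1
    shift
    cmd_status=0
    "$@" || cmd_status=$?
    if [ "$cmd_status" -eq "$remap_code" ]; then
        return 0
    fi
    return "$cmd_status"
}

# A child that dies from SIGTERM would normally be reported as 143.
remapped=0
run_remapped 143 sh -c 'kill -TERM $$' || remapped=$?
echo "remapped status: $remapped"

# Any other failure still passes through unchanged.
other=0
run_remapped 143 sh -c 'exit 3' || other=$?
echo "other status: $other"
```

Only the listed code is turned into success, so genuine failures of the sync process would still be visible to Kubernetes.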

As for this image, for now, I don't think I will change the intended behavior.

Let me know if this answers your question.

aisensiy commented 6 years ago

Thanks for your reply, it is very helpful. Great to learn about Tini from you. I missed the Tini part and tried a lot of stuff that did not work.

By the way, I know the default 128 + 15 is reasonable, but as a long-running daemon, inotifywait itself can receive a TERM signal and return code 0. So maybe exit 0 does make sense.

In my scenario I use s3sync as a sidecar to upload checkpoints during a long-running job. When the job is done, I shut the s3sync container down by sending TERM. I think this is nice behavior.