anentropic closed this issue 9 years ago.
Good question! This is going to be a bit long, so bear with me (I know you asked for brief, sorry about that :x).
First, let's talk a little bit about Docker. When you run a Docker container, Docker proceeds to isolate it from the rest of the system. That isolation happens at different levels (e.g. network, filesystem, processes).
Tini isn't really concerned with the network or the filesystem, so let's focus on what matters in the context of Tini: processes.
Each Docker container is a PID namespace, which means that the processes in your container are isolated from other processes on your host. A PID namespace is a tree, which starts at PID 1, commonly called `init`.
Note: when you run a Docker container, PID 1 is whatever you set as your ENTRYPOINT (or if you don't have one, then it's either your shell or another program, depending on the format of your CMD).
Now, unlike other processes, PID 1 has a unique responsibility, which is to reap zombie processes.
Zombie processes are processes that:

- Have exited (i.e. they're dead)
- Were not `wait`ed on by their parent process (`wait` is the syscall parent processes use to retrieve the exit code of their children)
- Have lost their parent (i.e. their parent exited as well), which means they'll never be `wait`ed on by their parent

When a zombie is created (which happens when its parent exits, and therefore all chances of it ever being `wait`ed on by it are gone), it is reparented to `init`, which is expected to reap it (which means calling `wait` on it).

In other words, someone has to clean up after "irresponsible" parents that leave their children un-`wait`ed, and that's PID 1's job.
That's what Tini does, and it's something the JVM (which is what runs when you do `exec java ...`) does not do, which is why you don't want to run Jenkins as PID 1.
Note that creating zombies is usually frowned upon in the first place (i.e. ideally you should be fixing your code so it doesn't create zombies), but for something like Jenkins, they're unavoidable: since Jenkins usually runs code that isn't written by the Jenkins maintainers (i.e. your build scripts), they can't "fix the code".
This is why Jenkins uses Tini: to clean up after build scripts that create zombies.
Now, Bash actually does the same thing (reaping zombies), so you're probably wondering: why not use Bash as PID 1?
One problem is, if you run Bash as PID 1, then all signals you send to your Docker container (e.g. using `docker stop` or `docker kill`) end up sent to Bash, which does not forward them anywhere (unless you code it yourself). In other words, if you use Bash to run Jenkins, and then run `docker stop`, then Jenkins will never see the stop command!
Tini fixes this by "forwarding signals": if you send a signal to Tini, then it sends that same signal to your child process (Jenkins in your case).
A second problem is that once your process has exited, Bash will proceed to exit as well. If you're not being careful, Bash might exit with exit code 0, whereas your process actually crashed (0 means "all fine"; this would cause Docker restart policies to not do what you expect). What you actually want is for Bash to return the same exit code your process had.
Note that you can address this by creating signal handlers in Bash to actually do the forwarding, and returning a proper exit code. On the other hand that's more work, whereas adding Tini is a few lines in your Dockerfile.
Now, there would be another solution, which would be to add e.g. another thread in Jenkins to reap zombies, and run Jenkins as PID 1.
This isn't ideal either, for two reasons:
First, if Jenkins runs as PID 1, then it's difficult to differentiate between processes that were re-parented to Jenkins (which should be reaped) and processes that were spawned by Jenkins (which shouldn't be, because there's other code that's already expecting to `wait` on them). I'm sure you could solve that in code, but again: why write it when you can just drop Tini in?
Second, if Jenkins runs as PID 1, then it may not receive the signals you send it!
That's a subtlety in PID 1. Unlike other processes, PID 1 does not have default signal handlers, which means that if Jenkins hasn't explicitly installed a signal handler for SIGTERM, then that signal is going to be discarded when it's sent (whereas the default behavior would have been to terminate the process).
Tini does install explicit signal handlers (to forward them, incidentally), so those signals no longer get dropped. Instead, they're sent to Jenkins, which is not running as PID 1 (Tini is) and therefore has default signal handlers (note: this is not why Jenkins uses Tini; they use it for zombie reaping, but it was used in the RabbitMQ image for that reason).
Note that there are also a few extras in Tini, which would be harder to reproduce in Bash or Java (e.g. Tini can register as a subreaper so it doesn't actually need to run as PID 1 to do its zombie-reaping job), but those are mostly useful for specialist use cases.
Hope this helps!
Here are some references you might be interested in to learn more about that topic:
Finally, do note that there are alternatives to Tini (like Phusion's base image).
Tini differentiates with:
Cheers,
As to whether you should be using Tini: obviously, it's not always needed (e.g. I run http://apt-browse.org/ in a dozen Docker containers, and only one of them uses Tini), but here are a few heuristics:

- Does the process you `exec` in your entrypoint register signal handlers? A good way to figure this out might be to check whether your process responds properly to e.g. `docker stop` (or whether it waits for 10 seconds before exiting).

Now, Tini is transparent, so if you're unsure, adding it shouldn't hurt.
Cheers,
thanks for the detailed info!
One thing I need to clarify... I understood that when I `exec "$@"` it 'replaces' the current bash script with the executed file. In that situation, do I have to worry about bash not forwarding signals? Or is bash out of the picture then, so I just have to ensure that the exec'ed script responds to signals?
When you run `exec`, bash is out of the picture, so forwarding signals isn't needed. However, you need to ensure that what you `exec`'ed (which is presumably now running as PID 1) handles signals and reaps zombies.
Cheers,
(Just closing this since it's not really an issue, but just let me know if you have follow up questions)
no prob, thanks!
maybe put the two links on the readme?
https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ https://github.com/docker-library/official-images#init
The long explanation/answer might be a good extension to the short chapter in the main README.
You need to put this explanation in the README file. People shouldn't have to search the issues to figure out what the tool does.
Thanks @cmeury @waleedka! I'm adding a short summary in the README and a link to this issue in there: https://github.com/krallin/tini/pull/70
Hi
Can I start supervisord with Tini? Does it make sense to do that?
What's the difference with s6? http://skarnet.org/software/s6/
Hi @krallin, I'm currently trying to reproduce what you said about bash.
You wrote:
One problem is, if you run Bash as PID 1, then all signals you send to your Docker container (e.g. using docker stop or docker kill) end up sent to Bash, which does not forward them anywhere (unless you code it yourself).
To verify that, I wrote a small C program:
```c
#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Note: printf/exit are not async-signal-safe; fine for this
   quick experiment only. */
static void sighandler(int signo) {
    printf("C Signal handler received signal %d!\n", signo);
    printf("Terminating now =)\n");
    exit(EXIT_SUCCESS);
}

int main(void) {
    signal(SIGTERM, sighandler);
    signal(SIGINT, sighandler);
    printf("Hello World!\n");
    for (;;)
        pause(); /* sleep until a signal arrives instead of busy-looping */
    return 0;
}
```
I tried the following startup commands:

```shell
docker run -ti -v $(pwd):/mnt/host --entrypoint /mnt/host/a.out ubuntu
docker run -ti --init -v $(pwd):/mnt/host ubuntu bash -c /mnt/host/a.out
docker run -ti --init -v $(pwd):/mnt/host ubuntu /mnt/host/a.out
docker run -ti -v $(pwd):/mnt/host ubuntu bash -c /mnt/host/a.out
docker run -d -v $(pwd):/mnt/host ubuntu bash -c /mnt/host/a.out
```
For each run, with `--init` and without `--init`, the signal handler is executed correctly when using `docker stop container`. Does the current bash version forward all signals by default when running a command with `-c`, or am I doing something wrong here?
Thanks =)
Additional Info:
Running on gentoo stable, docker version 17.03.2-ce
@fbe,
When you run `bash -c` with a single command, bash actually `exec`'s the command, so your command ends up running as PID 1 (not bash).
Compare:
```
thomas@mocha4 ~ % docker run --rm -it ubuntu bash -c 'ps auxww'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  25976  1480 pts/0    Rs+  13:37   0:00 ps auxww
thomas@mocha4 ~ % docker run --rm -it ubuntu bash -c 'true && ps auxww'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  2.0  0.0  18028  2708 pts/0    Ss+  13:37   0:00 bash -c true && ps auxww
root         7  0.0  0.0  34424  2816 pts/0    R+   13:37   0:00 ps auxww
```
(this means the zombie reaping from bash is gone, obviously, since you are no longer using bash as PID 1).
Lesson learned, thank you =)
@krallin, if we run multiple services inside a container, does it make sense to combine `tini` with `runit` or `s6`?
https://github.com/phusion/baseimage-docker/issues/164#issuecomment-62361316
BTW, between `runit` and `s6`, which one would you choose? 😄
@dio You should include Tini if your PID 1 does not handle zombie reaping. I believe `runit` does, but I'm not certain about `s6`. That said, usually, running Tini in front of anything shouldn't break things (but note some edge cases where the process you are calling checks whether it is running as PID 1 or not: #102).

Unfortunately, I don't have enough direct experience with `runit` or `s6` myself to recommend either of them (I usually use Supervisor when I need an ad-hoc init system in Docker images; note that Supervisor does not do zombie reaping, so you would indeed need to use Tini with it).
Hi, Thomas @krallin!
Does Tini pass signals to "properly daemonized" processes? To those that double-fork to detach from their parent process and get re-parented to an init process, and that also become leaders of their own process groups?
I'm trying to run php-fpm and nginx daemonized in a single container (I know it doesn't sound good, but I have reasons for that), and when I run the container in the foreground, it doesn't stop on ^C.
I tried with the `--init` docker option first, then I tried the `-g` option:

```dockerfile
ENV TINI_VERSION v0.16.1
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "-g", "--"]
```
Still it doesn't stop daemons and doesn't exit.
I'm not sure how nginx and php-fpm are daemonized, that "double forking" idea above is just a guess.
@mvasin
No - assuming a process is not the direct descendant of Tini and that it uses a different process group, Tini will not deliver signals to it.
Generally speaking, running daemons in your Docker container is a bit of an anti-pattern. Assuming you absolutely need to run multiple processes in a Docker container, I recommend using Tini to start a supervisor (supervisord is a good choice), then using said supervisor to run all other processes in the foreground.
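As a sketch of that pattern (Tini version, paths, and the supervisord config location are assumptions, not from this thread), a Dockerfile could look like:

```dockerfile
ENV TINI_VERSION v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini

# Tini runs as PID 1, reaps zombies, and forwards signals to supervisord,
# which in turn runs each service in the foreground.
ENTRYPOINT ["/tini", "--"]
CMD ["supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]
```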
Cheers,
@krallin Thanks for the answer!
BTW, do zombie processes disappear (release PIDs and resources) after stopping a zombie-producing container (in case I'm not using a reaper)?
Regarding packing two services in a container:
I want to bake my code in a container, but it has to be accessible to both nginx and php-fpm. So I'm facing the choice of either moving the code to a volume and sharing access to it between containers, or packing both nginx and php-fpm in a single container.
In other words, I have to choose between "single container per service" and "bake immutable code into containers and use volumes for state" ideas. I think separating code and state is more important, especially as scaling is not (yet) a concern. Would you agree?
@mvasin, yes, zombies will be reaped when a container is destroyed (not sure whether they're all re-parented to init or cleaned up by PID namespace destruction, but it doesn't really matter).
Why not just run the same image twice, once with nginx as the command and once with php-fpm. Both would be installed in the image along with your code. As long as you run them together, they would both contain the same set of your php code (and for developers working locally, just mount the local working code into both containers and run them as the same user).
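yosifkit's suggestion could be sketched as a compose file (the service names, image name, and flags here are made up for illustration):

```yaml
# One image with the code baked in, run twice with different commands.
services:
  php:
    image: myapp              # hypothetical image: code + php-fpm + nginx installed
    command: php-fpm -F       # -F keeps php-fpm in the foreground
  web:
    image: myapp              # same image, same baked-in code
    command: nginx -g 'daemon off;'
    ports:
      - "8080:80"
    depends_on:
      - php
```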
I tried baking my php code into both php-fpm and nginx images, but that's a messy duplication.
Baking the code once into a single "php-fpm + nginx" image and running it twice with the different commands is a very promising idea that I didn't have on my mind. Thanks a lot, @yosifkit !
I recommend using Tini to start a supervisor (supervisord is a good choice)
What is the advantage of this compared with running supervisord as PID 1?
I recommend using Tini to start a supervisor (supervisord is a good choice)
What is the advantage of this compared with running supervisord as PID 1?
Supervisord doesn't reap zombie processes, so if any of the processes you start with Supervisord generates zombie processes, they'll stick around forever. If you add Tini as PID 1 and have Tini start Supervisord, that's no longer a problem.
I cannot test this myself but: On http://blog.dscpl.com.au/2015/12/issues-with-running-as-pid-1-in-docker.html, Graham Dumpleton speaks of supervisord as a viable solution for the reaping problem. https://news.ycombinator.com/item?id=8917584 and https://www.reddit.com/r/Python/comments/5k1875/supervisord_project_python_2_to_3_porting_help/dblrv8s/ also say that supervisord reaps. Supervisord's changelog speaks of two bugs related to reaping of dead children.
I'm not certain I understand what you're trying to achieve here, but anyway, first, it's worth noting that some of the links you posted conflate two unrelated concepts here: Supervisord certainly reaps its own children, but Tini is relevant in the context of processes started by Supervisor generating their own children (grandchildren, from the perspective of Supervisord).
My recollection is that Supervisord didn't reap the latter. That being said, I could certainly be wrong (and this could have changed). Perhaps the truth even lies somewhere in the middle.
Taking a step back, Tini is designed specifically to be non-intrusive, and to ensure you don't leave zombies in your container no matter what (as long as Tini is running as PID 1). In other words, it's designed such that you don't need to figure out for certain whether you need it: if you're unsure, just use it, and the worst-case scenario is that nothing happens.
With regard to Supervisor, I'm unsure (although, as I mentioned, my intuition was that it didn't reap grandchildren, but I could be wrong), which is why I'm recommending using Tini in front of it: if Supervisord doesn't reap, then you gained something, and if it does, you lost nothing.
That said, if you have the time to read through the code to make sure, or if a Supervisord maintainer comes in to clarify that I am in fact wrong, then by all means, use Supervisord without Tini.
I came to this thread because I looked for material on the net that tells me whether I actually need tini. I do need supervisord in my containers. While Tini certainly is a cheap addition to an image, plan A is to keep the image and the resulting container as simple as possible.
It was surprising for me that it was impossible to find a clear answer to my question – supervisord is a frequently used program after all. Given that I don't have time to test it myself, I'll go with Tini + supervisord.
@krallin you said (and not only you):
Have lost their parent (i.e. their parent exited as well), which means they'll never be waited on by their parent.
But wikipedia says:
Zombie processes should not be confused with orphan processes: an orphan process is a process that is still executing, but whose parent has died. When the parent dies, the orphaned child process is adopted by init (process ID 1). When orphan processes die, they do not remain as zombie processes; instead, they are waited on by init. The result is that a process that is both a zombie and an orphan will be reaped automatically.
What's true? Thanks.
The Wikipedia answer mentions:
When orphan processes die, they do not remain as zombie processes; instead, they are waited on by init.
That is only true if you do have an init process that will `wait` on (i.e. reap) orphaned zombie processes. If you don't, they'll just stick around as orphan zombies forever.

Reaping orphaned zombie processes is precisely what Tini does.
@krallin
Why tini? To reap zombies. Why supervisord? To start and supervise multiple processes.

So tini -> supervisord -> foreground processes is the right pattern, yeah?

AFAIK, s6-overlay should handle both of these.
Hmm, OK? I'm really not sure what your point is.
I am trying to use tini in my kubernetes app where I see a lot of defunct processes getting created. My StatefulSet spec file looks as follows:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hadoop-rm
  labels:
    app: hadoop-node
spec:
  serviceName: "hadoop-cluster"
  replicas: 1
  selector:
    matchLabels:
      app: hadoop-node
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: hadoop-node
    spec:
      containers:
        - name: hadoop-node
          image: hadoop:5.0.0-REL304
          command: ["/usr/bin/tini"]
          args: ["-vvv", "--", "/test.sh"]
```
My test.sh looks like:

```
$ cat test.sh
#!/bin/bash -x
sleep 77777 &
ps -aef
exit 0
```
When I deploy my app, the container starts as expected but exits immediately after the main child test.sh exits.

```
$ kubectl.sh apply -f test4.yaml
statefulset.apps/hadoop-rm created

$ kubectl.sh get pods
NAME          READY   STATUS             RESTARTS   AGE
hadoop-rm-0   0/1     CrashLoopBackOff   1          3s

$ kubectl.sh logs hadoop-rm-0
```
I was expecting that tini would wait until the background process (sleep) finished and reap it before exiting. But that is not happening. Why does tini say "No child to reap"? Shouldn't sleep now be parented by tini (PID 1)?
How do I make tini wait for grand-children and great-grand-children to exit ?
Tini cannot wait on grandchildren; there's no API to do that in POSIX / Linux (and if there were, it'd be a problem, because Tini cannot know a priori whether its child intends to wait on its grandchildren).

Tini can only wait on processes that have been orphaned: if its child is alive, then its grandchildren aren't orphans.
@krallin If I'm running my containers in a k8s cluster, do I still need tini/dumb-init in my containers, since the pause container already does zombie reaping?
@Abhishek627 I'm not certain I understand what you mean by "pause container" ?
@krallin Pause container refers to this: https://www.ianlewis.org/en/almighty-pause-container https://stackoverflow.com/questions/48651269/what-are-the-pause-containers
In case of high-throughput piping to a container (in my case, gigabytes of data, as fast as possible), is tini still 100% transparent?
@krallin I keep coming back to this description of tini every few years... Well written!
Very well written; thanks for the author's patience. I got a lot out of this.
Hi @krallin. Very nice utility thanks.
There is another use case that might be worth documenting in a Docker context: Docker's logging system reads from STDOUT and STDERR of the PID 1 process. Hence any other process running in the container can log to /proc/1/fd/1 or /proc/1/fd/2, so long as these still link to the original Docker-created pipes. However, if the PID 1 process has reopened either as a file (for example if `httpd` has configured the access or error log to a file), then this link is broken. Using a Tini wrapper in this case retains access to these FDs for negligible overhead in memory or process time (since signals are rarely issued to the PID 1 process anyway).
good
```
thomas@mocha4 ~ % docker run --rm -it ubuntu bash -c 'true && ps auxww'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  2.0  0.0  18028  2708 pts/0    Ss+  13:37   0:00 bash -c true && ps auxww
root         7  0.0  0.0  34424  2816 pts/0    R+   13:37   0:00 ps auxww
```
https://github.com/krallin/tini/issues/8#issuecomment-341705587
Adding another command to the list (`bash -c 'ps auxww'` -> `bash -c 'true && ps auxww'`) doesn't suppress the `fork` since bash 4.4, but this does: `bash -c 'true; ps auxww'`. The supposedly relevant lines from the changelog:

> execute_cmd.c
> - execute_command_internal: AND_AND, OR_OR: call should_suppress_fork for the RHS of && and ||, make `make' invocations marginally more efficient

And that (`bash -c 'true; ps auxww'`) no longer suppresses the `fork` since 5.1. The supposedly relevant lines from the changelog:

> b. Bash attempts to optimize the number of times it forks when executing commands in subshells and from `bash -c'.
--
> PID 1 does not have default signal handlers

https://github.com/krallin/tini/issues/8#issuecomment-146135930
I can confirm that.
I guess I've figured out how to determine if a program needs `--init`. To quote the gist:

> With ruby and go programs you probably won't have problems. If a program doesn't set a SIGTERM handler, then the cleanup is apparently not needed; the language runtime will set a handler and it'll terminate on SIGTERM. Otherwise the program's SIGTERM handler will terminate it.
>
> In languages where it's not the case, to determine if you need `--init`, run the program in a container, make sure it's running under PID 1, and send SIGTERM to it. If it terminates, then `--init` apparently is not needed.
I'm leaving aside the issue with zombie processes here.
Hi,
I noticed the official Jenkins image was using Tini, so I was curious what it is.
It looks like it must be useful and probably solves some issues I don't know about. Could you explain very briefly, in a 'Linux for dummies' kind of way, what the advantage of Tini is vs just running a shell script directly as the CMD?
I have a few containers with a `docker-entrypoint.sh` type of script that basically does an `exec "$@"` at the end. Should I be using Tini instead?