krallin / tini

A tiny but valid `init` for containers
MIT License
9.73k stars 506 forks

What is advantage of Tini? #8

Closed anentropic closed 8 years ago

anentropic commented 8 years ago

Hi,

I noticed the official Jenkins image was using Tini, so I was curious what it is.

It looks like it must be useful and probably solves some issues I don't know about. Could you explain very briefly in a 'Linux for dummies' kind of way what is the advantage of Tini vs just running a shell script directly as the CMD?

I have a few containers with a docker-entrypoint.sh type of script that basically do an exec "$@" at the end - should I be using Tini instead?

krallin commented 8 years ago

Good question! This is going to be a bit long, so bear with me (I know you asked for brief, sorry about that :x).

First, let's talk a little bit about Docker. When you run a Docker container, Docker proceeds to isolate it from the rest of the system. That isolation happens at different levels (e.g. network, filesystem, processes).

Tini isn't really concerned with the network or the filesystem, so let's focus on what matters in the context of Tini: processes.

Each Docker container runs in its own PID namespace, which means that the processes in your container are isolated from other processes on your host. A PID namespace is a tree, which starts at PID 1, commonly called init.

Note: when you run a Docker container, PID 1 is whatever you set as your ENTRYPOINT (or if you don't have one, then it's either your shell or another program, depending on the format of your CMD).

Now, unlike other processes, PID 1 has a unique responsibility, which is to reap zombie processes.

Zombie processes are processes that:

- Have exited (i.e. they're dead);
- Were not waited on by their parent process (wait is the syscall parents use to retrieve the exit code of their children);
- Have lost their parent (i.e. their parent exited as well), which means they'll never be waited on by their parent.

When a zombie is created (which happens when its parent exits, and therefore all chances of it ever being waited on by it are gone), it is reparented to init, which is expected to reap it (which means calling wait on it).

In other words, someone has to clean up after "irresponsible" parents that leave their children un-wait'ed, and that's PID 1's job.

That's what Tini does, and it is something the JVM (which is what runs when you do exec java ...) does not do, which is why you don't want to run Jenkins as PID 1.

Note that creating zombies is usually frowned upon in the first place (i.e. ideally you should be fixing your code so it doesn't create zombies), but for something like Jenkins, they're unavoidable: since Jenkins usually runs code that isn't written by the Jenkins maintainers (i.e. your build scripts), they can't "fix the code".

This is why Jenkins uses Tini: to clean up after build scripts that create zombies.


Now, Bash actually does the same thing (reaping zombies), so you're probably wondering: why not use Bash as PID 1?

One problem is, if you run Bash as PID 1, then all signals you send to your Docker container (e.g. using docker stop or docker kill) end up sent to Bash, which does not forward them anywhere (unless you code it yourself). In other words, if you use Bash to run Jenkins, and then run docker stop, then Jenkins will never see the stop command!

Tini fixes this by "forwarding signals": if you send a signal to Tini, then it sends that same signal to your child process (Jenkins in your case).

A second problem is that once your process has exited, Bash will proceed to exit as well. If you're not being careful, Bash might exit with exit code 0, whereas your process actually crashed (0 means "all fine"; this would cause Docker restart policies to not do what you expect). What you actually want is for Bash to return the same exit code your process had.

Note that you can address this by creating signal handlers in Bash to actually do the forwarding, and returning a proper exit code. On the other hand that's more work, whereas adding Tini is a few lines in your Dockerfile.


Now, there would be another solution, which would be to add e.g. another thread in Jenkins to reap zombies, and run Jenkins as PID 1.

This isn't ideal either, for two reasons:

First, if Jenkins runs as PID 1, then it's difficult to differentiate between processes that were re-parented to Jenkins (which should be reaped), and processes that were spawned by Jenkins (which shouldn't, because there's other code that's already expecting to wait on them). I'm sure you could solve that in code, but again: why write it when you can just drop Tini in?

Second, if Jenkins runs as PID 1, then it may not receive the signals you send it!

That's a subtlety of PID 1. Unlike other processes, PID 1 does not have default signal handlers, which means that if Jenkins hasn't explicitly installed a signal handler for SIGTERM, then that signal is going to be discarded when it's sent (whereas the default behavior would have been to terminate the process).

Tini does install explicit signal handlers (to forward them, incidentally), so those signals no longer get dropped. Instead, they're sent to Jenkins, which is not running as PID 1 (Tini is), and therefore has default signal handlers (note: this is not the reason why Jenkins uses Tini, they use it for zombie reaping, but it was used in the RabbitMQ image for that reason).


Note that there are also a few extras in Tini, which would be harder to reproduce in Bash or Java (e.g. Tini can register as a subreaper so it doesn't actually need to run as PID 1 to do its zombie-reaping job), but those are mostly useful for specialist use cases.

Hope this helps!

Here are some references you might be interested in to learn more about that topic:

Finally, do note that there are alternatives to Tini (like Phusion's base image).

Tini differentiates itself with:

Cheers,

krallin commented 8 years ago

As to whether you should be using Tini.

Obviously, it's not always needed (e.g. I run http://apt-browse.org/ in a dozen Docker containers, and only one of them uses Tini), but here are a few heuristics:

Now, Tini is transparent, so if you're unsure, adding it shouldn't hurt.

Cheers,

anentropic commented 8 years ago

thanks for the detailed info!

one thing I need to clarify... I understood that when I exec "$@" it 'replaces' the current bash script with the executed file... in that situation do I have to worry about bash not forwarding signals? or bash is out of the picture then, and I just have to ensure that the exec'ed script responds to signals?

krallin commented 8 years ago

When you run exec, then bash is out of the picture, so forwarding signals isn't needed; however, you need to ensure that what you exec'ed (which is presumably now running as PID 1):

- Reaps zombies, if anything it runs can create them;
- Installs handlers for the signals you care about (since, as PID 1, it has no default signal handlers).

Cheers,

krallin commented 8 years ago

(Just closing this since it's not really an issue, but just let me know if you have follow up questions)

anentropic commented 8 years ago

no prob, thanks!

maybe put the two links on the readme?

https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ https://github.com/docker-library/official-images#init

cmeury commented 8 years ago

The long explanation/answer might be a good extension to the short chapter in the main README.

waleedka commented 7 years ago

You need to put this explanation in the README file. People shouldn't have to search the issues to figure out what the tool does.

krallin commented 7 years ago

Thanks @cmeury @waleedka! I'm adding a short summary in the README and a link to this issue in there: https://github.com/krallin/tini/pull/70

oroc95 commented 7 years ago

Hi

Can I start supervisord with tini? Does it make sense to do it?

What's the difference with s6? http://skarnet.org/software/s6/

ghost commented 6 years ago

Hi @krallin, I'm currently trying to reproduce what you said about bash.

You wrote:

One problem is, if you run Bash as PID 1, then all signals you send to your Docker container (e.g. using docker stop or docker kill) end up sent to Bash, which does not forward them anywhere (unless you code it yourself).

To verify that, I wrote a small C program:

#include <stdio.h>
#include <signal.h>
#include <stdlib.h>

static void sighandler(int signo){
  printf("C Signal handler received signal %d!\n", signo);
  printf("Terminating now =)\n");
  exit(EXIT_SUCCESS);
}

int main(int argc, char** argv){
  signal(SIGTERM, sighandler);
  signal(SIGINT, sighandler);
  printf("Hello World!\n");
  while(1){ ;; }
  return 0;
}

I tried the following startup commands:

docker run -ti -v $(pwd):/mnt/host --entrypoint /mnt/host/a.out ubuntu
docker run -ti --init -v $(pwd):/mnt/host ubuntu bash -c /mnt/host/a.out
docker run -ti --init -v $(pwd):/mnt/host ubuntu /mnt/host/a.out
docker run -ti --init -v $(pwd):/mnt/host ubuntu bash -c /mnt/host/a.out
docker run -ti -v $(pwd):/mnt/host ubuntu bash -c /mnt/host/a.out
docker run -d -v $(pwd):/mnt/host ubuntu bash -c /mnt/host/a.out

For each run, with --init and without --init, the signal handler is executed correctly when using docker stop on the container. Does the current bash version forward all signals by default when running a command with -c, or am I doing something wrong here?

Thanks =)

Additional Info:

Running on gentoo stable, docker version 17.03.2-ce

krallin commented 6 years ago

@fbe,

When you run bash -c with a single command, bash actually exec's the command, so your command ends up running as PID 1 (not bash).

Compare:

thomas@mocha4 ~ % docker run --rm -it ubuntu bash -c 'ps auxww'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  25976  1480 pts/0    Rs+  13:37   0:00 ps auxww
thomas@mocha4 ~ % docker run --rm -it ubuntu bash -c 'true && ps auxww'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  2.0  0.0  18028  2708 pts/0    Ss+  13:37   0:00 bash -c true && ps auxww
root         7  0.0  0.0  34424  2816 pts/0    R+   13:37   0:00 ps auxww

(this means the zombie reaping from bash is gone, obviously, since you are no longer using bash as PID 1).

ghost commented 6 years ago

Lesson learned, thank you =)

dio commented 6 years ago

@krallin, if we run multiple services inside a container, does it make sense to combine tini with runit or s6?

https://github.com/phusion/baseimage-docker/issues/164#issuecomment-62361316

BTW, between runit and s6 which one will you choose? 😄

krallin commented 6 years ago

@dio You should include Tini if your PID 1 does not handle zombie reaping. I believe runit does, but I'm not certain about s6. That said, usually, running Tini in front of anything shouldn't break things (but note some edge case where the process you are calling checks whether it is running as PID 1 or not: #102).

Unfortunately, I don't have enough direct experience with runit or s6 myself to recommend either of them (I usually use Supervisor when I need an ad-hoc init system in Docker images — note that Supervisor does not do zombie reaping so you would indeed need to use Tini with it).

mvasin commented 6 years ago

Hi, Thomas @krallin!

Does Tini pass signals to "properly daemonized" processes? To those that do double forking to detach from parent process and become subprocesses of an init process, and also become leaders of their own process groups?

I try to run php-fpm and nginx daemonized in a single container (I know it doesn't sound good, but I have reasons for that), and when I run the container in the foreground, it doesn't stop on ^C.

I tried with --init docker option first, then I tried -g option

ENV TINI_VERSION v0.16.1
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static /tini
RUN chmod +x /tini

ENTRYPOINT ["/tini", "-g", "--"]

Still it doesn't stop daemons and doesn't exit.

I'm not sure how nginx and php-fpm are daemonized, that "double forking" idea above is just a guess.

krallin commented 6 years ago

@mvasin

No - assuming a process is not the direct descendant of Tini and that it uses a different process group, Tini will not deliver signals to it.

Generally speaking, running daemons in your Docker container is a bit of an anti-pattern. Assuming you absolutely need to run multiple processes in a Docker container, I recommend using Tini to start a supervisor (supervisord is a good choice), then using said supervisor to run all other processes in the foreground.

Cheers,

mvasin commented 6 years ago

@krallin Thanks for the answer!

BTW, do zombie processes disappear (release PIDs and resources) after stopping a zombie-producing container (in case I'm not using a reaper)?

Regarding packing two services in a container:

I want to bake my code in a container, but it has to be accessible to both nginx and php-fpm. So I'm facing the choice of either moving the code to a volume and sharing access to it between containers, or packing both nginx and php-fpm in a single container.

In other words, I have to choose between "single container per service" and "bake immutable code into containers and use volumes for state" ideas. I think separating code and state is more important, especially as scaling is not (yet) a concern. Would you agree?

yosifkit commented 6 years ago

@mvasin, yes, zombies will be reaped when a container is destroyed (not sure whether they are all re-parented to init or cleaned up by PID namespace destruction, but it doesn't really matter).

Why not just run the same image twice, once with nginx as the command and once with php-fpm. Both would be installed in the image along with your code. As long as you run them together, they would both contain the same set of your php code (and for developers working locally, just mount the local working code into both containers and run them as the same user).

mvasin commented 6 years ago

I tried baking my php code into both php-fpm and nginx images, but that's a messy duplication.

Baking the code once into a single "php-fpm + nginx" image and running it twice with the different commands is a very promising idea that I didn't have on my mind. Thanks a lot, @yosifkit !

bronger commented 6 years ago

I recommend using Tini to start a supervisor (supervisord is a good choice)

What is the advantage of this compared with running supervisord as PID 1?

krallin commented 6 years ago

I recommend using Tini to start a supervisor (supervisord is a good choice)

What is the advantage of this compared with running supervisord as PID 1?

Supervisord doesn't reap zombie processes, so if any of the processes you start with Supervisord generates zombie processes, they'll stick around forever. If you add Tini as PID 1 and have Tini start Supervisord, that's no longer a problem.

bronger commented 6 years ago

I cannot test this myself but: On http://blog.dscpl.com.au/2015/12/issues-with-running-as-pid-1-in-docker.html, Graham Dumpleton speaks of supervisord as a viable solution for the reaping problem. https://news.ycombinator.com/item?id=8917584 and https://www.reddit.com/r/Python/comments/5k1875/supervisord_project_python_2_to_3_porting_help/dblrv8s/ also say that supervisord reaps. Supervisord's changelog speaks of two bugs related to reaping of dead children.

krallin commented 6 years ago

I'm not certain I understand what you're trying to achieve here, but anyway, first, it's worth noting that some of the links you posted conflate two unrelated concepts here: Supervisord certainly reaps its own children, but Tini is relevant in the context of processes started by Supervisor generating their own children (grandchildren, from the perspective of Supervisord).

My recollection is that Supervisord didn't reap the latter. That being said, I could certainly be wrong (and this could have changed). Perhaps the truth even lies somewhere in the middle.

Taking a step back, Tini is designed specifically to be non-intrusive, and to ensure you don't leave zombies in your container no matter what (as long as Tini is running as PID 1). In other words, it's designed such that you don't need to figure out for certain whether you need it: if you're unsure, just use it, and the worst-case scenario is that nothing happens.

With regard to Supervisor, I'm unsure (although, as I mentioned, my intuition was that it didn't reap grandchildren, but I could be wrong), which is why I'm recommending using Tini in front of it: if Supervisord doesn't reap, then you gained something, and if it does, you lost nothing.

That said, if you have the time to read through the code to make sure, or if a Supervisord maintainer comes in to clarify that I am in fact wrong, then by all means, use Supervisord without Tini.

bronger commented 6 years ago

I came to this thread because I looked for material on the net that tells me whether I actually need tini. I do need supervisord in my containers. While Tini certainly is a cheap addition to an image, plan A is to keep the image and the resulting container as simple as possible.

It was surprising for me that it was impossible to find a clear answer to my question – supervisord is a frequently used program after all. Given that I don't have time to test it myself, I'll go with Tini + supervisord.

ipeacocks commented 5 years ago

@krallin you said (and not only you):

Have lost their parent (i.e. their parent exited as well), which means they'll never be waited on by their parent.

But wikipedia says:

Zombie processes should not be confused with orphan processes: an orphan process is a process that is still executing, but whose parent has died. When the parent dies, the orphaned child process is adopted by init (process ID 1). When orphan processes die, they do not remain as zombie processes; instead, they are waited on by init. The result is that a process that is both a zombie and an orphan will be reaped automatically.

What's true? Thanks.

krallin commented 5 years ago

The Wikipedia answer mentions:

When orphan processes die, they do not remain as zombie processes; instead, they are waited on by init.

That is only true if you do have an init process that will wait on (i.e. reap) orphaned zombie processes. If you don't, they'll just stick around as orphan zombies forever.

Reaping orphaned zombie processes is precisely what Tini does.

lukasmrtvy commented 4 years ago

@krallin

Why tini? To reap zombies .. Why supervisord? To start and supervise multiple processes ..

So tini->supervisord->foreground processes is right pattern, yea?

AFAIK, s6-overlay should handle both of this

krallin commented 4 years ago

Hmm, OK? I'm really not sure what your point is.

rjorapur commented 4 years ago

I am trying to use tini in my kubernetes app where I see a lot of defunct processes getting created. My StatefulSet spec file looks as follows:

$ cat test4.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hadoop-rm
  labels:
    app: hadoop-node
spec:
  serviceName: "hadoop-cluster"
  replicas: 1
  selector:
    matchLabels:
      app: hadoop-node
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: hadoop-node
    spec:
      containers:
      - name: hadoop-node
        image: hadoop:5.0.0-REL304
        command: ["/usr/bin/tini"]
        args: ["-vvv", "--", "/test.sh"]

My test.sh looks like:

$ cat test.sh
#!/bin/bash -x
sleep 77777 &
ps -aef
exit 0

When I deploy my app, the container starts as expected but exits immediately after the main child test.sh exits.

$ kubectl.sh apply -f test4.yaml
statefulset.apps/hadoop-rm created

$ kubectl.sh get pods
NAME          READY   STATUS             RESTARTS   AGE
hadoop-rm-0   0/1     CrashLoopBackOff   1          3s

$ kubectl.sh logs  hadoop-rm-0

I was expecting that tini would wait until the background process (sleep) exits and reap it before exiting. But that is not happening. Why does tini say "No child to reap"? Shouldn't sleep now be parented to tini (PID 1)?

How do I make tini wait for grand-children and great-grand-children to exit?

krallin commented 4 years ago

Tini cannot wait on grandchildren, there's no API to do that in POSIX / Linux (and if there were, it'd be a problem, because Tini cannot know a priori whether its child intends to wait on its grandchildren).

Tini can only wait on processes that have been orphaned — if its child is alive, then its grandchildren aren't orphans.

Abhishek627 commented 4 years ago

@krallin If I'm running my docker containers in a k8s cluster, do I still need tini/dumb-init in my containers, since the pause container already does zombie reaping?

krallin commented 4 years ago

@Abhishek627 I'm not certain I understand what you mean by "pause container" ?

Abhishek627 commented 4 years ago

@krallin Pause container refers to this: https://www.ianlewis.org/en/almighty-pause-container https://stackoverflow.com/questions/48651269/what-are-the-pause-containers

bronger commented 3 years ago

In case of high-throughput piping to a container (in my case, gigabytes of data, as fast as possible), is tini still 100% transparent?

memark commented 2 years ago

@krallin I keep coming back to this description of tini every few years... Well written!

jaychang9 commented 2 years ago

Very well written; thanks for the author's patience. I benefited a lot.

TerryE commented 2 years ago

Hi @krallin. Very nice utility thanks.

There is another use case that might be worth documenting in a Docker context: Docker's logging system works through the STDOUT and STDERR of the PID 1 process. Hence any other process running in the container can log to /proc/1/fd/1 or /proc/1/fd/2 -- so long as these both still link to the original Docker-created pipes. However, if the PID 1 process has reopened either as a file (for example, if httpd has been configured to write its access or error log to a file), then this link is broken. Using a tini wrapper in this case retains access to these FDs for negligible memory or CPU overhead (since signals are rarely issued to the PID 1 process anyway).

JiaoDingjun commented 2 years ago

good

x-yuri commented 1 month ago
thomas@mocha4 ~ % docker run --rm -it ubuntu bash -c 'true && ps auxww'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  2.0  0.0  18028  2708 pts/0    Ss+  13:37   0:00 bash -c true && ps auxww
root         7  0.0  0.0  34424  2816 pts/0    R+   13:37   0:00 ps auxww

https://github.com/krallin/tini/issues/8#issuecomment-341705587

Since bash 4.4, adding another command to the list (bash -c 'ps auxww' -> bash -c 'true && ps auxww') no longer prevents the exec optimization (the fork is still suppressed, so the last command still runs as PID 1), but this does prevent it: bash -c 'true; ps auxww'. The supposedly relevant lines from the changelog:

execute_cmd.c

  • execute_command_internal: AND_AND, OR_OR: call should_suppress_fork for the RHS of && and ||, make `make' invocations marginally more efficient

And since bash 5.1, even bash -c 'true; ps auxww' no longer prevents it. The supposedly relevant lines from the changelog:

b. Bash attempts to optimize the number of times it forks when executing commands in subshells and from `bash -c'.

--

PID 1 does not have default signal handlers

https://github.com/krallin/tini/issues/8#issuecomment-146135930

I can confirm that.