
Disabling /etc/hosts management #10316

Closed · ibukanov closed this issue 8 years ago

ibukanov commented 9 years ago

By experimenting I discovered that in Docker 1.3 I can replace Docker's /etc/hosts management in a container by adding -v .../custom_hosts_for_container:/etc/hosts. This supplies a custom /etc/hosts to the container, hiding the one provided by default.

Is this a supported solution that will continue to work in the future? Or is it something that works just by accident and could break in later versions of Docker?
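
For reference, a minimal sketch of the hack in question (the host path and the busybox image are just placeholders):

    # Shadow the container's /etc/hosts with a file from the host.
    # Whether this bind mount wins over Docker's own /etc/hosts handling
    # is exactly what this issue is about; do not rely on it.
    docker run --rm -v /srv/custom_hosts_for_container:/etc/hosts busybox cat /etc/hosts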

cyphar commented 9 years ago

This happens because we currently do the volume magic after the /etc/hosts magic. I'm fairly sure you would be wise not to rely on this as a solution to your problem.

icecrime commented 9 years ago

Can you tell us more about your use case? Some things are guaranteed to be supported in the future, such as adding entries to /etc/hosts using the --add-host command-line flag. The ability to "shadow" the file by bind mounting over it is undocumented, so I wouldn't rely on it.
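
For example (the hostname and IP below are placeholders):

    # --add-host appends a static host:ip entry to the container's /etc/hosts.
    docker run --rm --add-host db.internal:10.0.0.5 busybox ping -c 1 db.internal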

ibukanov commented 9 years ago

My use case is a container running sshd that provides sftp access to a volume and, in addition, can forward ports into other containers, as in ssh -L localport:container-name:port. Those containers may not be running, or may not even exist, when the sshd container starts; for example, their images could be rebuilding to pick up new updates. As such I cannot use --link.

Also, --add-host does not work, as it requires knowing the IP address in advance. Instead I have a host volume with the hosts file for the sshd container. Other containers simply patch that file on startup, and then that file is exposed into sshd using the -v ../hosts:/etc/hosts hack. Before I discovered this hack, I used an extra shell script running in the sshd container that monitored the hosts file in a shared volume and copied it over /etc/hosts in the container. As an alternative I considered running a DNS server for sshd, but that would just have complicated the whole setup for no gain. Overriding /etc/hosts simplified everything nicely.

I suppose that if Docker supported reserving/assigning static addresses to containers, then I could clearly use --add-host.

temoto commented 9 years ago

I run 3 containers: application, database, reverse proxy. Some of them need to know the IP of another to connect. Each of them is updated/restarted independently, so --link doesn't work, because it doesn't update /etc/hosts when containers change their IPs. Any of the following would solve my issue:

ibukanov commented 9 years ago

@temoto I discovered that by not using --link I exposed my containers to arbitrary connections from unrelated containers, as I needed to run the Docker daemon with --icc=true (the default). This means that if any of those other containers is compromised, my containers will be compromised as well, since they do not use any kind of passwords for inter-container communication.

So I stopped using the /etc/hosts trick. Instead I implemented the notion of a local pod, similar to the Kubernetes approach. For that I use a special network-only container whose main process just sleeps forever. Other containers use --net=container:name to join its networking stack. That way the internal services can bind their sockets to the loopback interface, and all communication happens over the internal localhost address with no influence from the outside world. The application containers can also be restarted independently with no need to update any files.
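
A sketch of that setup, with placeholder container and image names:

    # The network-only container; its sole job is to own the network namespace.
    docker run -d --name pod-net busybox sleep 1000000000

    # Application containers join its networking stack. Services can bind to
    # 127.0.0.1 and reach each other over loopback, invisible to other containers.
    docker run -d --name app --net=container:pod-net my-app-image
    docker run -d --name db  --net=container:pod-net my-db-image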

temoto commented 9 years ago

@ibukanov that's brilliant! I was going to check whether it's possible to make a new bridge for each group of containers, like in systemd-nspawn; it turns out you can't. You can still do it with --net=none + pipework [1].
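
Roughly like this (names and addresses are placeholders; the pipework invocation follows its README [1], so verify it before use):

    # Start the container with no networking, then attach it to a custom bridge.
    docker run -d --net=none --name app my-app-image
    pipework br1 app 192.168.7.2/24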

It works until the network-sleep container is restarted. So we can track the evolution:

[1] https://github.com/jpetazzo/pipework
[2] https://github.com/temoto/docker-must-populate-hosts

ibukanov commented 9 years ago

@temoto why would you ever need to restart the net-only container that executes sleep forever? The only danger is that it could be accidentally killed by, say, the out-of-memory (OOM) killer. But if you use a custom image whose only file is a statically linked executable that sleeps forever while ignoring all signals, your container will take just a few kilobytes of memory and the OOM killer will spare it. And if you really, really want to protect against that container restarting, then use systemd to manage container dependencies.
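
A rough sketch of such an image, where sleep-forever stands for a hypothetical statically linked binary that ignores signals and sleeps indefinitely:

    # Dockerfile (illustrative): a scratch image holding only the sleeper.
    FROM scratch
    COPY sleep-forever /sleep-forever
    ENTRYPOINT ["/sleep-forever"]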

temoto commented 9 years ago

@ibukanov oh, I'm not saying it's a deal breaker. Your idea is brilliant and it works well enough. The OOM killer is one of many things that can kill a container; a system operator is another. The point being, the design has a weak spot for no good value; the correct solution is to not have the weak spot.

The bad-analogy police is on its way, but... suppose you have an exposed live wire at home. Touch it and you get a shock and the whole house loses power. The argument: so don't touch it. Yes, that works. But there is no benefit in keeping it around and avoiding it.

ibukanov commented 9 years ago

@temoto What if the operator accidentally kills the docker daemon, or systemd core dumps, or the kernel crashes ;)

In any case, if it is really necessary to ensure that a dependent container restarts after a restart of the net container, use systemd unit files etc. to manage the lifetime of the containers and their dependencies.
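
For instance, a rough sketch of such a unit (the unit, container, and image names are made up):

    # app.service (illustrative): tie the app container to the net container.
    [Unit]
    BindsTo=net-pod.service
    After=net-pod.service

    [Service]
    ExecStart=/usr/bin/docker run --rm --name app --net=container:pod-net my-app-image
    ExecStop=/usr/bin/docker stop app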

temoto commented 9 years ago

@ibukanov I sincerely hope you got the difference and are just mocking. In case I wasn't clear enough: sometimes you need to stop/kill some process, so kill is dangerous but useful. The possibility of the network-sleep container being stopped brings only danger and no positive value. Systemd is great and capable of making this design stable, but then you have the additional cost of configuring it, and the additional penalty of restarting innocent containers just because their dependency restarted.

My point: the network-sleep approach is great, and for some tasks it's the best option you could have today. Let's just not stop here; let's make the next solution strictly better.

ibukanov commented 9 years ago

@temoto If you really need an absolutely reliable and secure link between containers, I think the best option is to use Unix domain sockets bound to a path in a host volume. Too bad there is a big chunk of server software that can listen only on TCP sockets, like anything written in Java.
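
A sketch, with placeholder paths and image names:

    # Both containers share a host directory in which the server creates a
    # Unix domain socket, e.g. /sockets/app.sock, and the client connects to it.
    docker run -d --name server -v /srv/sockets:/sockets my-server-image
    docker run -d --name client -v /srv/sockets:/sockets my-client-image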

Also, perhaps Docker will implement #10462 at some point...

poelzi commented 9 years ago

We use Docker to run kitchen tests for our Chef cookbooks. This may sound like abuse, but it actually works quite well, is quite fast, and nicely cleans up after itself. We currently run a special recipe at the beginning that unmounts all the /etc/hosts, /etc/resolv.conf, ... mountpoints and sets up 2 dummy network interfaces that reflect our physical boxes, and then runs the recipes. This has major drawbacks, as it requires us to run in privileged mode just to have umount available in the container to unmount everything we don't need. Otherwise the NET_ADMIN capability would be enough for the container to set up the dummy interfaces. This option would be very handy for getting rid of privileged mode.
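
Roughly what the recipe does at startup (illustrative; the exact set of mountpoints varies):

    # Drop Docker's managed files; unmounting is why privileged mode is needed.
    umount /etc/hosts
    umount /etc/resolv.conf
    umount /etc/hostname
    # Create the dummy interfaces mirroring our physical boxes;
    # this part would only need CAP_NET_ADMIN.
    ip link add dummy0 type dummy
    ip link add dummy1 type dummy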

mitar commented 9 years ago

This was the feature used by docker-hosts, and it seems that with 1.6.2 this stopped working because the order is now different: Docker's hosts file is now mounted after the volume-provided hosts file. Is it possible to disable Docker's hosts mounting?

thaJeztah commented 8 years ago

@mavenugo @mrjana could you have a look here? Will the use-cases here be covered by the new networking features and discovery mechanisms?

mitar commented 8 years ago

See my comment here on how the current discovery mechanisms relate to docker-hosts. The main difference is that docker-hosts allows specifying a top-level domain as well.

thaJeztah commented 8 years ago

Thanks @mitar! Not (yet) covered by the plans for the new networking then; good to know.

mitar commented 8 years ago

Yes, but great job otherwise! I am glad for these additions to Docker!

mitar commented 8 years ago

It seems that docker-hosts works again with Docker 1.8 and that you can mount a volume over /etc/hosts.

ibukanov commented 8 years ago

I see that in Docker 1.8 the trick of disabling /etc/hosts management through a bind mount became more complicated. If I just mount a host file over /etc/hosts using -v host-path:/etc/hosts, then in the container /etc/hosts is mounted read-only. To work around that, I bind-mounted the same host file twice, as in:

docker run ... -v host-path:/etc/hosts -v host-path:/etc/hosts.custom

Then in the container I can write to /etc/hosts.custom and the same content appears in /etc/hosts.

This is my second use case for a custom /etc/hosts. This time it is for a container that runs a verification script against a test server holding a copy of production. The script needs to override /etc/hosts so that DNS names for production point to the test server. I cannot use --add-host, as the script gets the IP as part of a potentially dynamic configuration. Also, since I run the script as a non-root user for better isolation, it cannot write to the /etc/hosts that Docker bind-mounts, as that file is owned by root and that cannot be changed.

All this complexity just shows that Docker really should provide an option so that it never touches /etc/hosts in the container.

cpuguy83 commented 8 years ago

@ibukanov In Docker 1.10, Docker does not touch /etc/hosts on user-defined networks.
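
That is, with a user-defined network (names below are placeholders):

    # Containers on a user-defined network get name resolution without Docker
    # rewriting /etc/hosts on container changes.
    docker network create mynet
    docker run -d --name web --net mynet my-app-image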

ibukanov commented 8 years ago

@cpuguy83 That is nice to hear. Any plans to extend that behavior to the default Docker bridge? For my latest use case, user-defined networks just unnecessarily complicate the setup.

cpuguy83 commented 8 years ago

@ibukanov no, I do not think we will be changing behavior here. Keep in mind that in 1.8 there was a bug that caused all containers to be listed in the /etc/hosts of every container, which may have caused unnecessary writes and potential corruption. This does not happen anymore.

ibukanov commented 8 years ago

@cpuguy83 Thanks for the update. In any case, it is nice to know that when this hack, which strongly depends on Docker internals, stops working, I can put my container with the test code into its own network.

calavera commented 8 years ago

no, I do not think we will be changing behavior here.

Sorry, but as @cpuguy83 said, we don't have plans to change that behavior. Closing this as "won't fix".