intel / cc-oci-runtime

OCI (Open Containers Initiative) compatible runtime for Intel® Architecture
GNU General Public License v2.0

docker swarm: dns resolution fails #854

Open mcastelino opened 7 years ago

mcastelino commented 7 years ago

When Clear Containers based containers run in a docker swarm, DNS resolution performed from within the Clear Container fails for both internal and external names.

This is due to the way the DNS resolution is implemented within docker swarm.

DNS Resolution in Swarm

All docker swarm containers have the DNS resolver set to 127.0.0.11:53

Docker swarm has an internal DNS-based load balancer that round-robins DNS responses to spread load. The resolver listens on a loopback address, bound to a port that is specific to each container. https://github.com/docker/libnetwork/blob/5ac04367ae7b0b12c33bed5f5b395bd4c104fff9/sandbox.go#L815

A set of iptables rules is injected into the container namespace to implement the docker DNS load balancer/resolver. These rules map 127.0.0.11:53 to the specific port on which the corresponding resolver is listening.

    -A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:41343
    -A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:43411
    -A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 41343 -j SNAT --to-source :53
    -A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 43411 -j SNAT --to-source :53

Here the DNS request is NATed to a container-specific TCP port and UDP port, and the reply is NATed back so that it appears to come from port 53.
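To make the mapping concrete, the following is a minimal sketch that extracts the per-container resolver ports from rules of the form shown above. The function name `resolver_ports` and the regular expression are illustrative assumptions, not part of docker or Clear Containers, and the input is assumed to be `iptables-save` style text.

```python
import re

# Matches DNAT rules of the form shown above, e.g.
# -A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:43411
DNAT_RULE = re.compile(
    r"-A DOCKER_OUTPUT -d (?P<addr>[\d.]+)/32 -p (?P<proto>tcp|udp) .*"
    r"--dport 53 -j DNAT --to-destination [\d.]+:(?P<port>\d+)"
)


def resolver_ports(rules_text):
    """Return a dict mapping 'tcp'/'udp' to the port the resolver listens on.

    Hypothetical helper: it only understands rules shaped like the ones
    quoted in this issue.
    """
    ports = {}
    for line in rules_text.splitlines():
        match = DNAT_RULE.search(line)
        if match:
            ports[match.group("proto")] = int(match.group("port"))
    return ports


RULES = """
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:41343
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:43411
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 41343 -j SNAT --to-source :53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 43411 -j SNAT --to-source :53
"""

print(resolver_ports(RULES))  # {'tcp': 41343, 'udp': 43411}
```

The SNAT rules are ignored on purpose: they carry the same port numbers and only rewrite the source port of the reply.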

The resolver in this case is dockerd:

    netstat -plunt
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 127.0.0.11:41343        0.0.0.0:*               LISTEN      14447/dockerd
    udp        0      0 127.0.0.11:43411        0.0.0.0:*                           14447/dockerd

In the case of Clear Containers, there is currently no way for a DNS request issued inside the VM to reach the dockerd resolver on the host side. The only host connectivity the VM has is via the docker_gwbridge, and the DNS resolver listening inside the container network namespace is not reachable from the VM.
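One way to observe the failure from inside a container is to send a query straight to the embedded resolver address and see whether anything answers. The sketch below is illustrative only: `build_query` and `resolver_answers` are hypothetical helper names, the query name `example.com` and the 2 second timeout are arbitrary choices, and the code only checks that a reply with the matching ID arrives, not what the reply contains.

```python
import socket
import struct


def build_query(name, query_id):
    """Build a minimal DNS A-record query in RFC 1035 wire format."""
    # Flags 0x0100 sets only "recursion desired"; one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def resolver_answers(server="127.0.0.11", port=53, name="example.com", timeout=2.0):
    """Return True if a reply with the matching query ID arrives in time."""
    query_id = 0x1234
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, query_id), (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable/refused errors.
            return False
    return len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == query_id
```

Calling `resolver_answers()` inside a regular swarm container would be expected to return `True`; given the behaviour described in this issue, the same call from inside a Clear Container would be expected to return `False`.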

Network setup with Clear Containers


+---------------------------------+                +--------------------------------+
|   ingress sbox                  |                |                                |
|                    +            |                |                       +        |
|                    +-----------------------------------------------------+        |
|            I IP    |            |                |                       +--------------+
|                    +----+       |                |           +-----------+        |
|                    +    |       |                |       over|ay box     |        |
|                         |       |                |           |                    |
+---------------------------------+                +--------------------------------+
                          |                                    |
                          |                                    |
                          |                                    |            host container ns
                          |                   +--------------------------------------------+
                          |                   |        +-+     |                           |
                          |                   |        | +-----+     +-----------------+   |
                          |                   |        | |           |    IP           |   |
                          |                   |        | +--------------+ VIP          |   |
                          |         Resolver-----+     +-+           |                 |   |
     docker_gw_bridge     |       127.0.0.11  |                      |                 |   |
               +          |                   |       +-+R IP        |                 |   |
               +----------+                   |       | +---------------+ HIP          |   |
       H GW IP +--------------------------------------+ |            |                 |   |
               |                              |       +-+            +-----------------+   |
               +       default gw             |                    /etc/resolv.conf (127..)|
                                              +--------------------------------------------+

Internal DNS Resolution

Internal DNS resolution is handled completely by dockerd. So dockerd directly responds to the DNS request from the container process for any cluster local resource.

External DNS Resolution

External names are not resolved by dockerd itself. When dockerd cannot resolve a name to a cluster-local resource, it performs the DNS resolution using the resolvers listed in the host's resolv.conf.

Hence the DNS resolution process for an external name is:

process -> dockerd:43411  -> host (via the namespace) -> external DNS -> host -> dockerd -> container.

Note that dockerd sends these packets from within the namespace to the host via the interface bound to the docker_gwbridge.

In the case of Clear Containers, because there is no network connectivity between the container network namespace and the host, this request can never be fulfilled.

Workaround for External DNS

For external DNS resolution, the container's resolv.conf can be updated to point to an external DNS resolver. This ensures that external DNS resolution works.
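As a rough illustration of this workaround, the sketch below rewrites the contents of a resolv.conf so that it lists a single external resolver instead of 127.0.0.11. The helper name `point_at_external_resolver` is hypothetical, the `search` and `options` lines in the sample are made up, and `192.0.2.53` is a documentation placeholder address that must be replaced with a resolver reachable from the container.

```python
def point_at_external_resolver(resolv_conf_text, resolver_ip):
    """Return resolv.conf text with every nameserver line replaced.

    All existing 'nameserver' entries (including 127.0.0.11) are dropped
    and a single entry for resolver_ip is put first; other lines are kept.
    """
    kept = [
        line
        for line in resolv_conf_text.splitlines()
        if not line.strip().startswith("nameserver")
    ]
    return "\n".join([f"nameserver {resolver_ip}"] + kept) + "\n"


# Sample content; the search and options lines are invented for the example.
original = "search example.internal\nnameserver 127.0.0.11\noptions ndots:0\n"

# 192.0.2.53 is a placeholder, not a real resolver.
print(point_at_external_resolver(original, "192.0.2.53"), end="")
```

With the embedded resolver removed from resolv.conf, swarm-internal names are still not resolvable, so this only addresses the external half of the problem.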

Fixing this issue in Clear Containers

The long-term plan is to proxy internal DNS requests from within the VM to dockerd. If dockerd fails to resolve a name, the resolution then has to be performed from within the VM against the host resolver. However, assuming that the host resolver is the right one to use when dockerd resolution fails may not be correct. This approach also results in longer resolution times, because dockerd takes a significant amount of time to fail an external DNS request.
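The intended control flow can be sketched as follows. This is not the Clear Containers implementation: `resolve_with_fallback`, the two lookup callables, and the sample names and addresses are all hypothetical stand-ins used only to show the order in which the resolvers would be tried.

```python
def resolve_with_fallback(name, internal_lookup, external_lookup):
    """Try the swarm-internal resolver first, then an external one.

    Both lookups are hypothetical callables that return a list of
    addresses, or an empty list when the name is unknown to them.
    """
    addresses = internal_lookup(name)
    if addresses:
        return ("internal", addresses)
    # Only reached after the internal lookup has failed, which is where
    # the extra latency mentioned above would come from.
    addresses = external_lookup(name)
    if addresses:
        return ("external", addresses)
    return ("unresolved", [])


# Stand-in data: a proxied dockerd resolver and a host/external resolver.
SWARM_SERVICES = {"web": ["10.0.0.5"]}
EXTERNAL_NAMES = {"example.com": ["192.0.2.10"]}


def internal(name):
    return SWARM_SERVICES.get(name, [])


def external(name):
    return EXTERNAL_NAMES.get(name, [])


print(resolve_with_fallback("web", internal, external))          # ('internal', ['10.0.0.5'])
print(resolve_with_fallback("example.com", internal, external))  # ('external', ['192.0.2.10'])
```

Passing the resolvers in as callables keeps the choice of fallback resolver open, which matters given the doubt above about whether the host resolver is always the right one.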

mcastelino commented 7 years ago

@devimc We need to add this to our release notes.

devimc commented 7 years ago

great! thanks @mcastelino and nice description

sameo commented 7 years ago

This issue was moved to clearcontainers/runtime#121

jcvenegas commented 7 years ago

Reopen to have a documented limitation of this issue