hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

inconsistent behavior for /etc/resolv.conf with stub resolvers #11033

Open · tgross opened this issue 3 years ago

tgross commented 3 years ago

While chatting with @mattrobenolt about exposing Consul DNS to a Nomad task, we ran into something unexpected where the behavior I described in https://github.com/hashicorp/nomad/issues/8343#issuecomment-863325421 doesn't hold in the case where the client is not using systemd-resolved. It turns out that dockerd (or more properly libnetwork) special-cases how it creates a /etc/resolv.conf file for containers when it thinks systemd-resolved is in use. See moby/libnetwork/resolvconf/resolvconf.go#L18-L21
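For reference, the special case is easy to observe on a host that uses systemd-resolved (the paths are systemd's defaults; the nameserver value is illustrative):

```
# /etc/resolv.conf normally points at the stub file, whose nameserver
# is the local stub listener on 127.0.0.53:
$ readlink -f /etc/resolv.conf
/run/systemd/resolve/stub-resolv.conf

# when libnetwork detects this, it hands containers the "real"
# upstream file instead:
$ cat /run/systemd/resolve/resolv.conf
nameserver 10.0.2.3
```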

While the inconsistent behavior is the "fault" of the task driver engine (dockerd/libnetwork) rather than Nomad itself, it makes configuring Consul DNS for tasks in a sensible way challenging and awfully host-specific.

Some proposals:


Reproduction for the stub resolver behavior.

Use the Vagrant machine found at the root of this repo and run the following jobspec, which has both a docker and an exec task sharing a network namespace:

Docker jobspec:

```hcl
job "example" {
  datacenters = ["dc1"]

  group "web" {

    network {
      mode = "bridge"

      port "web1" {
        to = 8001
      }

      port "web2" {
        to = 8002
      }
    }

    task "web1" {
      driver = "docker"

      config {
        image   = "busybox:1"
        command = "httpd"
        args    = ["-f", "-h", "/tmp", "-p", "8001"]
        ports   = ["web1"]
      }

      resources {
        cpu    = 256
        memory = 128
      }
    }

    task "web2" {
      driver = "exec"

      config {
        command = "busybox"
        args    = ["httpd", "-f", "-h", "/tmp", "-p", "8002"]
      }

      resources {
        cpu    = 256
        memory = 128
      }
    }
  }
}
```
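To follow along with the commands below, run the job and grab an allocation short ID; the `d2f` and `208` prefixes in the `nomad alloc exec` calls are allocation short IDs, and yours will differ. A minimal sketch, assuming the jobspec is saved as `example.nomad`:

```
$ nomad job run example.nomad
$ nomad job status example   # the Allocations section lists the short IDs
```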

The docker driver gets the "real" resolver used by systemd-resolved and found at /run/systemd/resolve/resolv.conf:

```
$ nomad alloc exec -task web1 d2f cat /etc/resolv.conf
...
nameserver 10.0.2.3
search fios-router.home
```

But the exec driver gets the stub resolver from the host's /etc/resolv.conf:

```
$ nomad alloc exec -task web2 d2f cat /etc/resolv.conf
...
nameserver 127.0.0.53
options edns0
search fios-router.home
```
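That stub address is useless inside the task's network namespace, since 127.0.0.53 is bound only on the host's loopback; for example (a sketch, output omitted):

```
# fails: nothing listens on 127.0.0.53 inside the task's network namespace
$ nomad alloc exec -task web2 d2f nslookup example.com
```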

Now replace systemd-resolved's stub resolver with unbound:

```
$ sudo apt-get install -y unbound
# the setting must live under the [Resolve] section or systemd-resolved ignores it
$ printf '[Resolve]\nDNSStubListener=no\n' | sudo tee /etc/systemd/resolved.conf
```

Restart the VM and make sure the stub listener isn't what's linked to /etc/resolv.conf anymore:

```
sudo systemctl stop systemd-resolved
sudo rm /etc/resolv.conf
echo 'nameserver 8.8.8.8' | sudo tee /etc/resolv.conf
sudo systemctl start systemd-resolved
```
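A quick sanity check after the restart, assuming `dig` from dnsutils is available (output omitted; it varies by host):

```
# unbound, not systemd-resolved, should now own port 53 on localhost
$ sudo ss -lunp 'sport = :53'
# and queries against it should resolve
$ dig @127.0.0.1 example.com +short
```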

Now both of them get the /etc/resolv.conf file from the host:

```
$ nomad alloc exec -task web1 208 cat /etc/resolv.conf
nameserver 8.8.8.8

$ nomad alloc exec -task web2 208 cat /etc/resolv.conf
nameserver 8.8.8.8
```
To restore your Vagrant VM back to the previous state:

```
sudo rm /etc/resolv.conf
sudo nano /etc/systemd/resolved.conf   # remove the DNSStubListener=no line
sudo apt-get remove -y unbound
sudo ln -fs /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
```

Then reboot the VM.
mattrobenolt commented 3 years ago

For a little bit more context on my setup specifically: I disabled systemd-resolved entirely and installed CoreDNS as the resolver on the host, then put `nameserver 127.0.0.1` in the host's /etc/resolv.conf. To my surprise, docker's resolvconf handling also looks at this file and removes any 127.0.0.* nameserver lines (which makes sense, since they wouldn't work at all from inside a container), substituting the defaults of 8.8.8.8 and 8.8.4.4.

So, this was fine. In the pure-Docker world, I set the default DNS nameserver to the docker bridge IP, so a container can reach out over the bridge IP to talk to the host resolver.

That depends on me binding the CoreDNS listener to both 127.0.0.1 and the docker0 bridge IP so it can respond. Where this falls apart is that Nomad's bridge mode uses a different IP. That would be sorta fine, but Nomad seems to create its bridge network lazily, when it's first needed, so I can't reliably bind CoreDNS to the Nomad bridge network since it doesn't exist yet.

All in all, I ended up working around this by not using 127.0.0.1 in my host /etc/resolv.conf, and instead using the server's private IP in the 10.0.0.0/8 range. Doing this allows the file to be brought into the container untouched, and it doesn't need to use the bridge IPs at all.
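Concretely, the host's /etc/resolv.conf ends up as something like this (the 10.0.0.1 address is illustrative, matching the `bind` lines in the CoreDNS config below):

```
# not a 127.0.0.* address, so docker copies it into containers untouched
nameserver 10.0.0.1
```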

I don't know exactly what Nomad can really do here, but having some sensible way to say "I want to resolve DNS with Consul on the host" would be good. What really tripped me up was the very undocumented and unexpected behavior of libnetwork and what Docker was doing here.

And just for context, here's the trivial CoreDNS config I'm using:

```
.:53 {
  bind 127.0.0.1 10.0.0.1
  forward . 147.75.207.207 147.75.207.208
  cache
}

consul.:53 {
  bind 127.0.0.1 10.0.0.1
  forward . 127.0.0.1:8600
}
```
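With this config, queries for the `consul.` zone are forwarded to Consul's DNS on port 8600 and everything else goes to the upstream forwarders; a quick way to exercise both paths (the service name is hypothetical):

```
$ dig @10.0.0.1 redis.service.consul +short   # answered via 127.0.0.1:8600
$ dig @10.0.0.1 example.com +short            # answered via the upstreams
```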
apollo13 commented 3 years ago

I can offer yet another option (docker daemon.json):

```json
{
    "dns": [
        "172.22.3.201"
    ],
    "dns-search": [
        "consul"
    ]
}
```
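daemon.json changes only apply to containers started after the daemon restarts; a quick way to verify is to start a throwaway container and check the resolv.conf Docker generates for it (the output shown is what this daemon.json should produce):

```
$ sudo systemctl restart docker
$ docker run --rm busybox cat /etc/resolv.conf
nameserver 172.22.3.201
search consul
```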

where the IP is the private IP of the server. My CoreDNS configuration looks like this:

```
. {
  forward . /etc/resolv.conf
}

consul {
  forward . dns://127.0.0.1:8600
}
```
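Both configs ultimately forward the `consul` zone to Consul's own DNS endpoint, which listens on port 8600 by default and can be queried directly to rule out CoreDNS as a variable:

```
$ dig @127.0.0.1 -p 8600 consul.service.consul +short
```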

Imo that is kinda the best of both worlds. Docker gets a fixed DNS server and CoreDNS uses whatever was configured on the host.

lgfa29 commented 3 years ago

Thanks for the detailed report @tgross!

Also thanks @mattrobenolt and @apollo13 for the additional context. We will investigate this further.