moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Container using network mode host does not get its resolv.conf updated when the host's resolv.conf is updated (using systemd-resolved) #46199

Open sidevesh opened 1 year ago

sidevesh commented 1 year ago

Description

Because the resolv.conf is not updated on the container, it stops having access to the internet when the host / device changes networks. I saw https://github.com/docker/for-linux/issues/889 which mentions that it is supposed to be updated automatically but I actually can't find where this is mentioned in https://docs.docker.com/v17.09/engine/userguide/networking/default_network/configure-dns/.

Is this behavior of the resolv.conf not updating with host a bug or is this something not implemented or intended behavior ?

Reproduce

Start a long running container on your laptop (which is using systemd-resolved), then move to a different network with different DNS servers. Notice that the resolv.conf inside the container is now wrong.

Expected behavior

resolv.conf on container should match the updated host's resolv.conf
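One way to check the mismatch described here is to compare only the `nameserver` entries of the host file and the container's copy. The following is a minimal sketch; the function names `nameservers` and `same_nameservers` are made up for illustration and are not part of Docker.

```shell
# Print the nameserver addresses found in resolv.conf-style text on stdin,
# one per line, in the order they appear.
nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# Succeed (exit 0) when two resolv.conf files list the same nameservers in
# the same order; comments, "search" and "options" lines are ignored.
same_nameservers() {
    [ "$(nameservers < "$1")" = "$(nameservers < "$2")" ]
}
```

For a running container, something like `docker exec <container> cat /etc/resolv.conf | nameservers` can be compared by eye with `nameservers < /etc/resolv.conf` on the host.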

docker version

Client:
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996600
 Built:             Wed Jul 26 21:44:58 2023
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4c9c
  Built:            Wed Jul 26 21:44:58 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.2
  GitCommit:        0cae528dd6cb557f7201036e9f43420650207b58.m
 runc:
  Version:          1.1.8
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false

Server:
 Containers: 7
  Running: 1
  Paused: 0
  Stopped: 6
 Images: 14
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 0cae528dd6cb557f7201036e9f43420650207b58.m
 runc version: 
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.4.8-zen1-1-zen
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 13.5GiB
 Name: Lenovo-Yoga-7
 ID: 3PMN:VRXJ:C3R6:RFC2:ZLXJ:OJJU:OFKE:DQLW:YBC6:YYWQ:EHPI:WDWG
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

No response

polarathene commented 1 year ago

I think I've experienced this myself when using VMware to run a VM guest running Linux as the Docker host.

When I suspend the VM guest and later restore it, the network works fine in the guest, but not within the containers (they could talk to each other but no longer reach external networks, such as for installing packages). I'd resolve it by running systemctl restart docker.

Would be great if that wasn't required.

thaJeztah commented 1 year ago

From the description, I think this is for the "default" bridge network.

When using the default bridge, the "legacy" networking stack (pre "custom networks") is used;

This flow originated from the very early beginnings of Docker, and was designed with the assumption that the daemon would run in a server environment (no dynamic IP and/or networks), and before systemd-resolved existed (having a localhost / 127.0.0.x resolver was an "exception", not the "rule");

It may be clear that a lot of complexity is involved here, and there are quite a few places where things can go wrong (such as looking up systemd-resolved's upstreams). Dynamically updating the resolv.conf for each container could be an option, but I guess the challenge would be deciding what should trigger this; alternatively, maybe we can do this on a reload (systemctl reload docker.service to trigger re-generating resolv.conf).

The better solution would probably be to remove the legacy code-path, and always use the embedded DNS; I opened a ticket for that once;

The reason the legacy code-path still exists was (IIRC) for a few reasons, but I think most of those should no longer be a concern (and I'd love to get rid of the two distinct implementations);

thaJeztah commented 1 year ago

LOL, and for some reason, I completely ignored the "network host" part; I think still related though 😂

polarathene commented 1 year ago

I completely ignored the "network host" part

Oh I did too. My experience was with the default bridge, the VM guest network changes when resumed, which requires restarting the daemon AFAIK. I'm not bothered by that much personally.


Slightly off-topic (still DNS + systemd-resolved)

I wasn't familiar with what your response was covering until today, where prior to seeing your response I had strung together some idea of why BuildKit was not behaving like users expected.

I tried following the PRs and related code between the projects to see what was different/missing, and you spotted the concern in Oct 2022 😎

I am done looking further into it, but I assume Buildx did not get around to passing a DNSConfig override to BuildKit (assuming Buildx has more context with network config on what is more appropriate). There's a BuildKit issue with PR (going stale) which was about resolv.conf handling with --network=host / networkMode (buildkitd.toml).

thaJeztah commented 1 year ago

I tried following the PRs and related code between the projects to see what was different/missing, and you spotted the concern in Oct 2022 😎

Heh, I knew which PR you linked to without clicking the link. I opened that PR when I was somewhat cleaning up the resolvconf package, which had become over-engineered and complex over the years (still more to do there!).

Host --network=host handles DNS

So for the --network=host case, the situation is somewhat similar to the "default bridge" case, but for different reasons;

But: here's where the "fun" starts, because while the "networking" namespace is the same, the filesystem (mount namespace) is still separate, and we still need to configure the container so that processes inside the container know what resolver to use;

A logical approach would be to bind-mount the host's /etc/resolv.conf into the container, but that had some challenges;

So, for these reasons, we (again) need a COPY of the host's /etc/resolv.conf (or whatever that's symlinked to) for each container, and make sure that

Which brings us back to "square one" (described in my "bridge" comment from earlier) :joy:

Reconfiguring the "embedded DNS"

So this is something I need to look into, and what came up when I discussed this with @akerouanton

While writing my earlier comment, my assumption was that the embedded DNS itself has no real configuration

However, this MAY not be the case (this is something I need to look into / verify), and it's possible that the embedded DNS is also using more than that, and may be reading systemd-resolved's UPSTREAM DNS resolvers to configure what it should use. This would mean that dynamically switching networks would also prevent the embedded DNS from using the correct DNS. And if that's the case, that's probably something that should be fixed.

polarathene commented 1 year ago
  • we don't want the container to be able to modify the file on the host (which would be the case if we'd bind-mount the file from the host's /etc/resolv.conf).

Does it need to be writeable? You could have a :ro bind mount? (not that it makes much difference as you noted with the inode & symlink concerns)

thaJeztah commented 1 year ago

Does it need to be writeable?

For docker itself, no. Customisations can be made through the --dns, --dns-opt, --add-host, --hostname etc options, and those are made when the container is created (so would not require the file to be writable).

But having these files (/etc/hosts, /etc/resolv.conf, /etc/hostname) writable is a feature that was added at some point, so 🤷‍♂️ ; see

Admittedly, I think most of the requests were for /etc/hosts to be writable, but there may have been some cases where either the user, or software they were running, required (expected) those files to be writable.

sidevesh commented 1 year ago

I am thinking of trying to solve this for my setup by adding 127.0.0.53 to the resolv.conf within the container. If I do that, as I understand from @thaJeztah's first comment, the file will stop getting updated. How would I then revert it back to being managed by Docker, i.e. undo my changes?

thaJeztah commented 1 year ago

Don't think there's an easy way for that, although maybe it works if you change it back to the original version (and the checksum matches).

But the alternative would be to just explicitly set the container to use --dns=127.0.0.53 because that's the fixed address for systemd-resolved (any changes in external resolvers should be handled by / abstracted by systemd-resolved itself?)
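The "checksum matches" idea mentioned in this comment can be illustrated with a generic sketch: record a hash of the file, then test later whether the file still has the recorded content. How Docker itself tracks this is not shown in the thread, so the use of `sha256sum` and the function names below are assumptions for illustration only.

```shell
# Print the SHA-256 of a file so that later edits can be detected.
file_checksum() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Succeed when the file still matches a previously recorded checksum.
# Restoring the original content makes the check pass again.
unchanged_since() {
    [ "$(file_checksum "$1")" = "$2" ]
}
```

A typical use would be `sum=$(file_checksum /etc/resolv.conf)` at one point in time and `unchanged_since /etc/resolv.conf "$sum"` later.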

sidevesh commented 1 year ago

yeah but will that work for network mode host ? and in that case will the resolv.conf in container then not be a copy of the host's but rather contain just 127.0.0.53 ?

thaJeztah commented 1 year ago

In network mode "host", there is no container from a networking perspective; does your resolv.conf on the host contain any other options?

sidevesh commented 1 year ago

the resolv.conf in host for me contains the upstream dns servers, these would not get updated when my laptop would change network, I just put the --dns=127.0.0.53 option on the docker container and I see now that 127.0.0.53 gets set as the only entry in the resolv.conf in container, and that actually fixes my issue!

I guess this issue can remain open though since the dns settings in container not updating with host, especially for network mode host containers is sort of problematic
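The workaround reported in this comment can be wrapped in a small helper. This is only a sketch: `run_with_stub_dns` is a hypothetical name, and it assumes systemd-resolved's stub listener is reachable at 127.0.0.53 on the host, as it was for the reporter.

```shell
# Hypothetical helper: start a host-network container whose resolv.conf
# contains only the systemd-resolved stub address instead of a copy of the
# host's current upstream servers. All arguments are passed to "docker run".
run_with_stub_dns() {
    docker run --network host --dns 127.0.0.53 "$@"
}
```

For example, `run_with_stub_dns --rm alpine cat /etc/resolv.conf` should print a file whose only nameserver is 127.0.0.53, matching what is described above.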

thaJeztah commented 1 year ago

the resolv.conf in host for me contains the upstream dns servers, these would not get updated when my laptop would change network

Interesting. I wonder why that is; to my understanding, normally it would only contain systemd-resolved's address, because systemd-resolved acts as forwarder for the "upstream" DNS servers.

So, in practice /etc/resolv.conf would never have to change.

Here's from a test machine I have running on DigitalOcean, which has systemd-resolved;

ls -la /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Nov  1  2018 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad

thaJeztah commented 1 year ago

Wondering if something like NetworkManager is not aware of systemd-resolved, and is replacing the /etc/resolv.conf symlink with an actual file.

I must admit that I don't run Linux locally, but I have seen reports where multiple systems (network manager, other tools) were stepping on each-other's toes, and all trying to manage the same things.

sidevesh commented 1 year ago

systemd-resolved can actually run in different modes based on the contents of resolv.conf; see https://man.archlinux.org/man/systemd-resolved.8#/ETC/RESOLV.CONF

The recommended way is for resolv.conf to be a symlink to /run/systemd/resolve/stub-resolv.conf, but resolv.conf can be maintained by something else (like NetworkManager), and then systemd-resolved acts as a consumer of that file rather than managing it. That is how I think it's set up on my system. I don't remember how or why I configured it that way, but it's been working perfectly for me for a while, and it is not really wrong, just not the recommended way.

thaJeztah commented 1 year ago

Ah, thanks for that link! Now that you mention it, I think I ran into that part at some point 🤔

Definitely something to take into account, and that may actually be part of the reason why this may all work "depending on the situation".

I'm wondering though how such a setup is "meant" to work, because if the recommendation (see my comment) is "systems should always look at /etc/resolv.conf (and don't look any further)" ... how would they ever use systemd-resolved ? Because in such a setup, /etc/resolv.conf doesn't even include 127.0.0.53, so "nothing" would actually be using it? 🤔 🤔 🤔

Guess I need to do more reading how that's expected to work.

The "easy" approach is to (somehow) detect that systemd-resolved is active, and then just hardcode to 127.0.0.53, but that's making a lot of assumptions 🤔
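To make the detection question concrete, here is a rough sketch that classifies a resolv.conf path using the mode names that `resolvectl status` prints as "resolv.conf mode". It is an approximation, not how Docker or systemd-resolved decide: it only looks at the symlink target and the nameserver lines, and `resolv_conf_mode` is a made-up name.

```shell
# Classify a resolv.conf path (default: /etc/resolv.conf).
# Approximation only: inspects the symlink target and nameserver lines.
resolv_conf_mode() {
    path=${1:-/etc/resolv.conf}
    if [ -L "$path" ]; then
        # Plain readlink keeps relative targets such as "../run/systemd/...".
        case "$(readlink "$path")" in
            */run/systemd/resolve/stub-resolv.conf) echo stub;   return ;;
            */run/systemd/resolve/resolv.conf)      echo uplink; return ;;
            */usr/lib/systemd/resolv.conf)          echo static; return ;;
        esac
    fi
    if [ ! -e "$path" ]; then
        echo missing
    elif grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.0\.0\.53([[:space:]]|$)' "$path"; then
        # Assumption: a regular file that lists the stub address is treated
        # here like stub mode.
        echo stub
    else
        echo foreign
    fi
}
```

On the reporter's laptop, where NetworkManager writes the upstream servers into /etc/resolv.conf, this sketch would print `foreign`, which is the case where hardcoding 127.0.0.53 would be an assumption rather than something read from the file.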

polarathene commented 1 year ago

I'm wondering though how such a setup is "meant" to work, because if the recommendation (see my comment) is "systems should always look at /etc/resolv.conf (and don't look any further)" ... how would they ever use systemd-resolved ? Because in such a setup, /etc/resolv.conf doesn't even include 127.0.0.53, so "nothing" would actually be using it? 🤔 🤔 🤔

They can use systemd-resolved via a D-BUS API according to the docs. Some software will use glibc via NSS (/etc/nsswitch.conf) which can also use the nss-resolve module to query systemd-resolved.

I think nslookup goes that route, whereas a tool like dig reads /etc/resolv.conf directly. If /etc/resolv.conf doesn't point to anything related to systemd-resolved, I still found nslookup was using the DNS nameserver in /etc/resolv.conf (which nss-resolve I think intentionally does in that scenario).

It may vary with other networking needs such as split-dns (aka conditional-forwarding), mDNS, perhaps some VPN software, but I'd focus on /etc/resolv.conf and ensuring that's supported well, then let any other scenario reveal itself via user reports 😅
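Whether glibc-based software can reach systemd-resolved through NSS, as described above, depends on the `hosts:` line in /etc/nsswitch.conf. A minimal sketch for checking that line follows; `nss_uses_resolve` is a hypothetical helper name.

```shell
# Succeed when the "hosts:" line of an nsswitch.conf-style file lists the
# nss-resolve module (spelled "resolve" in that file).
nss_uses_resolve() {
    awk '$1 == "hosts:" { for (i = 2; i <= NF; i++) if ($i == "resolve") found = 1 }
         END { exit !found }' "${1:-/etc/nsswitch.conf}"
}
```

Calling `nss_uses_resolve && echo "NSS can query systemd-resolved"` on the host gives a quick hint; containers based on musl (Alpine) do not use NSS, so the check says nothing about them.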

Guess I need to do more reading how that's expected to work.

I had a shot at that if it helps. Bit verbose, so I collapsed the bulk of my response below.

Only surprise I really noticed was with the default bridge behaviour, freshly started containers were not recognizing that /etc/resolv.conf had changed (no longer a symlink, nor 127.0.0.53 used). To get back non-systemd-resolved behaviour I had to restart the Docker daemon. Seems like a bug? (v24.0.5).


Notes

In my VM guest, NetworkManager is used, systemd-resolved is not:

# Default config from VM install:
$ cat /etc/resolv.conf
# Generated by NetworkManager
search localdomain
nameserver 192.168.12.2

$ resolvectl status
Failed to get global data: Unit dbus-org.freedesktop.resolve1.service not found.

$ systemctl is-active systemd-resolved
inactive

Enabling (initial setup)

```console
# Enable `systemd-resolved`:
$ systemctl start systemd-resolved.service

$ systemctl is-active systemd-resolved
active

$ resolvectl status
Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: foreign
  Current DNS Server: 192.168.12.2
         DNS Servers: 192.168.12.2
Fallback DNS Servers: 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net 8.8.8.8#dns.google
                      2606:4700:4700::1111#cloudflare-dns.com 2620:fe::9#dns.quad9.net
                      2001:4860:4860::8888#dns.google
          DNS Domain: localdomain

Link 2 (ens33)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6 mDNS/IPv4 mDNS/IPv6
         Protocols: +DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
       DNS Servers: 192.168.12.2
        DNS Domain: localdomain

Link 3 (docker0)
    Current Scopes: none
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 99 (veth733e157)
    Current Scopes: LLMNR/IPv6 mDNS/IPv6
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

$ resolvectl dns
Global: 192.168.12.2
Link 2 (ens33): 192.168.12.2
Link 3 (docker0):
Link 99 (veth733e157):

$ resolvectl default-route
Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: foreign
  Current DNS Server: 192.168.12.2
         DNS Servers: 192.168.12.2
Fallback DNS Servers: 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net 8.8.8.8#dns.google
                      2606:4700:4700::1111#cloudflare-dns.com 2620:fe::9#dns.quad9.net
                      2001:4860:4860::8888#dns.google
          DNS Domain: localdomain
Link 2 (ens33): yes
Link 3 (docker0): no
Link 99 (veth733e157): no

# No change here yet of course (even if restarting NetworkManager service):
$ cat /etc/resolv.conf
# Generated by NetworkManager
search localdomain
nameserver 192.168.12.2
```

systemd-resolved - Foreign mode (no symlink)

The above only enabled `systemd-resolved` to run in **`foreign` mode** (_no symlink used_), which is [described in docs](https://man.archlinux.org/man/systemd-resolved.8#/ETC/RESOLV.CONF):

> _Alternatively, `/etc/resolv.conf` may be managed by other packages, in which case `systemd-resolved` will read it for DNS configuration data. In this mode of operation `systemd-resolved` is consumer rather than provider of this configuration file._
>
> _Note that the selected mode of operation for this file is detected fully automatically, depending on whether `/etc/resolv.conf` is a symlink to `/run/systemd/resolve/resolv.conf` or lists `127.0.0.53` as DNS server._

So naturally no difference for most software (_NSS / glibc will use `nss-resolve` via `/etc/nsswitch.conf`, and that'll look at `/etc/resolv.conf` AFAIK_) - I think with Alpine / musl that's a bit different as I don't recall that going through NSS.

Querying with DNS lookups we get the expected `192.168.12.2` nameserver used:

```console
# Reads `/etc/resolv.conf` directly:
$ dig docker.com | grep SERVER
;; SERVER: 192.168.12.2#53(192.168.12.2) (UDP)

# Copy `/etc/resolv.conf` into container:
$ docker run --rm -it ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER
docker.com.     A       IN      5s      141.193.213.20  192.168.12.2:53
docker.com.     A       IN      5s      141.193.213.21  192.168.12.2:53
```

You can change which DNS servers to query in this case for `systemd-resolved` (_which will recognize these, but most software AFAIK is unaffected unless using D-BUS API `systemd-resolved` has?_). `resolvectl query docker.com` would respect the configuration update (_although it's not necessarily obvious in this case, but pointing to a local DNS resolver with a different response would help verify_).

But since software is using `/etc/resolv.conf` like `dig` (_or Docker, since it's not a symlink, nor any `127.0.0.53` entry present_), this won't make much difference and queries would still go through `192.168.12.2` (_VMware is managing the VM guest DNS_). It may work a bit differently with more specialized networking setups 🤷‍♂️

```bash
# Changing DNS servers to use:

# Global DNS via override in `/etc/systemd/resolved.conf.d/`:
sudo mkdir /etc/systemd/resolved.conf.d/
echo -e '[Resolve]\nDNS = 1.1.1.1 1.0.0.1' | sudo tee /etc/systemd/resolved.conf.d/global-dns.conf
# Restart required:
sudo systemctl restart systemd-resolved

# Temporary DNS override per interface (eg: ens33):
sudo resolvectl dns ens33 1.1.1.1
```

systemd-resolved - Stub mode (symlink)

This is the most common way to use systemd-resolved out of the 4 supported modes?

# Switch to symlink with stub:
$ sudo ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
$ sudo systemctl restart systemd-resolved
$ sudo systemctl restart NetworkManager

# Testing:
$ dig docker.com | grep SERVER
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)

# Wrong?:
$ docker run --rm -it ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER 
docker.com.     A       IN      175s    141.193.213.21  8.8.8.8:53
docker.com.     A       IN      175s    141.193.213.20  8.8.8.8:53
docker.com.     A       IN      223s    141.193.213.21  8.8.4.4:53
docker.com.     A       IN      223s    141.193.213.20  8.8.4.4:53

# Correct:
$ docker run --rm -it --network host ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER    
docker.com.     A       IN      5s      141.193.213.20  127.0.0.53:53
docker.com.     A       IN      5s      141.193.213.21  127.0.0.53:53

# Correct:
$ docker network create custom
$ docker run --rm -it --network custom ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER    
docker.com.     A       IN      172s    141.193.213.21  127.0.0.11:53
docker.com.     A       IN      172s    141.193.213.20  127.0.0.11:53

Default bridge /etc/resolv.conf bug

Notably the default bridge docker0 seemed odd. The 8.8.8.8 and 8.8.4.4 servers it used are the default fallback DNS servers of the Docker daemon, but they are not the content of the symlinked /etc/resolv.conf (127.0.0.53), nor of /run/systemd/resolve/resolv.conf (which would have been expected? 1 => 2):

# NOTE: Large comment blocks removed from outputs to reduce noise

$ cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
nameserver 127.0.0.53
options edns0 trust-ad
search localdomain

$ cat /run/systemd/resolve/stub-resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
nameserver 127.0.0.53
options edns0 trust-ad
search localdomain

# Default upstream DNS + `global-dns.conf` override:
$ cat /run/systemd/resolve/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
nameserver 1.1.1.1
nameserver 1.0.0.1
nameserver 192.168.12.2
search localdomain

# Docker content copied the host `/etc/resolv.conf`, but modified the `nameserver` entries:
$ docker run --rm -it alpine cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
options edns0 trust-ad
search localdomain
nameserver 8.8.8.8
nameserver 8.8.4.4
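The container file shown above looks like the host file with the loopback nameserver removed and two fallback servers appended. The following sketch reproduces that observed transformation on resolv.conf text; it is an illustration of the output only, not Docker's actual implementation, and `filter_loopback_nameservers` is a made-up name.

```shell
# Filter resolv.conf text on stdin: drop nameserver lines that point at a
# loopback address (127.x.x.x or ::1) and, if no nameserver remains, append
# the two fallback servers seen in the container output above.
filter_loopback_nameservers() {
    awk '
        $1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { next }
        $1 == "nameserver" { kept++ }
        { print }
        END {
            if (!kept) {
                print "nameserver 8.8.8.8"
                print "nameserver 8.8.4.4"
            }
        }'
}
```

Running `filter_loopback_nameservers < /run/systemd/resolve/stub-resolv.conf` on a host in stub mode should give a result with the same shape as the container's file in the listing above.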

Workaround

This wasn't difficult to resolve, but I didn't realize that the daemon needed to be restarted to ensure containers source the /run/systemd/resolve/resolv.conf content.

```console
# The fix:
$ sudo systemctl restart docker

# Correct:
$ docker run --rm -it ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER
docker.com.     A       IN      250s    141.193.213.20  1.1.1.1:53
docker.com.     A       IN      250s    141.193.213.21  1.1.1.1:53
docker.com.     A       IN      251s    141.193.213.20  1.0.0.1:53
docker.com.     A       IN      251s    141.193.213.21  1.0.0.1:53
docker.com.     A       IN      5s      141.193.213.20  192.168.12.2:53
docker.com.     A       IN      5s      141.193.213.21  192.168.12.2:53

# Correct (Comment now the expected source, nameservers now correct):
$ docker run --rm -it alpine cat /etc/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
nameserver 1.1.1.1
nameserver 1.0.0.1
nameserver 192.168.12.2
search localdomain
```

Reproduction:

```console
# Back to static /etc/resolv.conf:
$ sudo rm /etc/resolv.conf
# NetworkManager will now recreate `/etc/resolv.conf`:
$ sudo systemctl restart NetworkManager
$ cat /etc/resolv.conf
# Generated by NetworkManager
search localdomain
nameserver 192.168.12.2

# Correct:
$ docker run --rm -it --network host ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER
docker.com.     A       IN      5s      141.193.213.21  192.168.12.2:53

# Correct:
$ docker run --rm -it --network custom ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER
docker.com.     A       IN      5s      141.193.213.21  127.0.0.11:53

# Outdated:
$ docker run --rm -it ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER
docker.com.     A       IN      73s     141.193.213.21  1.1.1.1:53
docker.com.     A       IN      73s     141.193.213.20  1.1.1.1:53
docker.com.     A       IN      72s     141.193.213.21  1.0.0.1:53
docker.com.     A       IN      72s     141.193.213.20  1.0.0.1:53
docker.com.     A       IN      5s      141.193.213.21  192.168.12.2:53
docker.com.     A       IN      5s      141.193.213.20  192.168.12.2:53

# The Cloudflare entries from `global-dns.conf` override should not be there, nor the comment.
# Container is sourcing the systemd-resolved config, but /etc/resolv.conf no longer has 127.0.0.53:
$ docker run --rm -it alpine cat /etc/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
nameserver 1.1.1.1
nameserver 1.0.0.1
nameserver 192.168.12.2
search localdomain
```

Now with updates:

```console
# Back to systemd-resolved with stub symlink:
$ sudo ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
# Removing custom config:
$ sudo rm /etc/systemd/resolved.conf.d/global-dns.conf
$ sudo systemctl restart systemd-resolved
# Restart NM so it doesn't think it manages /etc/resolv.conf anymore:
# No need to restart Docker daemon, it's still going to think we're using systemd-resolved anyway:
$ sudo systemctl restart NetworkManager

$ resolvectl dns
Global:
Link 2 (ens33): 192.168.12.2
Link 3 (docker0):
Link 134 (br-3f6e4658ec31):

$ resolvectl default-route
Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: stub
Fallback DNS Servers: 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net 8.8.8.8#dns.google
                      2606:4700:4700::1111#cloudflare-dns.com 2620:fe::9#dns.quad9.net
                      2001:4860:4860::8888#dns.google
Link 2 (ens33): yes
Link 3 (docker0): no
Link 134 (br-3f6e4658ec31): no

$ resolvectl status
Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: stub
Fallback DNS Servers: 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net 8.8.8.8#dns.google
                      2606:4700:4700::1111#cloudflare-dns.com 2620:fe::9#dns.quad9.net
                      2001:4860:4860::8888#dns.google

Link 2 (ens33)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6 mDNS/IPv4 mDNS/IPv6
         Protocols: +DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.12.2
       DNS Servers: 192.168.12.2
        DNS Domain: localdomain

Link 3 (docker0)
    Current Scopes: none
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 134 (br-3f6e4658ec31)
    Current Scopes: none
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

# Correct (_at least Docker is copying a freshly updated symlink file_):
$ docker run --rm -it ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER
docker.com.     A       IN      5s      141.193.213.20  192.168.12.2:53
docker.com.     A       IN      5s      141.193.213.21  192.168.12.2:53

# Temporarily modify the default-route (ens33) to use a different DNS service,
# Docker default bridge should use this one instead:
$ resolvectl dns ens33 1.1.1.1
# Config is updated as we requested, that should reflect in a Docker container:
$ cat /run/systemd/resolve/resolv.conf
nameserver 1.1.1.1
search localdomain

# Correct:
$ docker run --rm -it ghcr.io/mr-karan/doggo docker.com
NAME            TYPE    CLASS   TTL     ADDRESS         NAMESERVER
docker.com.     A       IN      33s     141.193.213.21  1.1.1.1:53
docker.com.     A       IN      33s     141.193.213.20  1.1.1.1:53
```

Obviously, this won't cover containers already running, and that isn't feasible for reasons already given in prior comments.

Helpful resources

ArchWiki is usually a pretty good resource on system configuration. This is just a quick link + image dump if helpful for reference.

### DNS (`systemd-resolved`)

https://wiki.archlinux.org/title/Systemd-resolved#DNS

![image](https://github.com/moby/moby/assets/5098581/6da04c36-859e-4d79-95b0-c171f172812d)
![image](https://github.com/moby/moby/assets/5098581/e31caa86-37d6-4e85-942b-da7d43abec55)

### NetworkManager

This is common to have involved in the mix (_on a linux desktop at least_): https://wiki.archlinux.org/title/NetworkManager#DNS_caching_and_conditional_forwarding

![image](https://github.com/moby/moby/assets/5098581/dac856cc-fdca-4d21-9ed8-07495937ddfa)
![image](https://github.com/moby/moby/assets/5098581/fc63d3ac-ae76-4ffc-9cda-3a5466a80800)
![image](https://github.com/moby/moby/assets/5098581/a579000b-40c0-4dde-94d4-ffdc2440ff73)

### NSS and glibc (_shouldn't be a concern I think?_)

https://wiki.archlinux.org/title/Domain_name_resolution

![image](https://github.com/moby/moby/assets/5098581/44a24a6b-8fae-4e57-8cfb-bcb724b1c15b)
![image](https://github.com/moby/moby/assets/5098581/fe619064-b50f-4c83-a3d1-ff7b0633c3ed)

polarathene commented 1 year ago

resolv.conf can be maintained by something else (like NetworkManager) and then systemd-resolved can act as a consumer of that file rather than managing it, and that is how I think its setup on my system.

I dont remember how or why I configured it that way but its been working perfectly for me for a while and it is not really something wrong, just not the recommended way.

My system was using NetworkManager, but systemd-resolved was not active. I've since enabled it, and this doesn't seem to influence /etc/resolv.conf in any way that Docker cares about. I get the impression you'd have this problem without systemd-resolved involved too.

When the DNS changes and NetworkManager updates that in /etc/resolv.conf, Docker with --network=host still keeps the old version that the container was started with. I don't think Docker presently supports carrying that update over (I misunderstood and thought that was supported if the container copy was known not to be modified, that it'd be detected and synced).

The fix for you is to symlink /etc/resolv.conf to the systemd-resolved stub resolver config:

# Configure /etc/resolv.conf to symlink to the systemd-resolved stub config:
$ sudo ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
# Might be required:
$ sudo systemctl restart systemd-resolved

# Restart NM so it doesn't think it manages /etc/resolv.conf anymore:
$ sudo systemctl restart NetworkManager
# Restart the Docker daemon, it presumably optimizes by caching the /etc/resolv.conf check
$ sudo systemctl restart docker

Perhaps this could be covered better in the Docker docs Networking DNS section?

diogenxs commented 1 year ago

I get the impression you'd have this problem without systemd-resolved involved too.

I'm getting the same behavior testing on a host with nothing of the fancy stuff like systemd-resolved or NetworkManager enabled, just updating /etc/resolv.conf by hand...

In my tests, if I spin up a container on the bridge or host network and then change the host's /etc/resolv.conf to something else, the container's copy only updates when I restart the container; restarting the daemon doesn't have any effect. We can easily verify this with the following commands:

test with `host` network - `bridge` has the same effect

$ echo 'nameserver 8.8.8.8' | sudo tee /etc/resolv.conf
nameserver 8.8.8.8
$ docker run -d --name before-change-host-ns --network host wbitt/network-multitool
a67ca8cfc2482e49c7040a0b9621c5cd27ed253336892509db83ace9ef63562e
$ docker exec before-change-host-ns cat /etc/resolv.conf
nameserver 8.8.8.8
$ echo 'nameserver 8.8.4.4' | sudo tee /etc/resolv.conf
nameserver 8.8.4.4
# DNS changed at host level
$ docker exec before-change-host-ns cat /etc/resolv.conf
nameserver 8.8.8.8
# DNS still the same on container
$ sudo systemctl restart docker
$ docker exec before-change-host-ns cat /etc/resolv.conf
nameserver 8.8.8.8
# don't have any effect if restarting only
$ docker restart before-change-host-ns
before-change-host-ns
$ docker exec before-change-host-ns cat /etc/resolv.conf
nameserver 8.8.4.4
$ docker stop before-change-host-ns
before-change-host-ns
$ docker rm before-change-host-ns
before-change-host-ns

As the Docker docs Networking DNS section states for the bridge network:

By default, containers inherit the DNS settings of the host, as defined in the /etc/resolv.conf configuration file. Containers that attach to the default bridge network receive a copy of this file.

and a previous comment regarding host network

So, for these reasons, we (again) need a COPY of the host's /etc/resolv.conf (or whatever that's symlinked to) for each container, and make sure that

So, should we assume that this is the expected behavior, and that after changes to /etc/resolv.conf on the host a container restart is needed for containers using the host or bridge networks, since both need to COPY the file again? :thinking:
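
The manual check above can be sketched as a small script that compares nameserver lines and restarts a container only when its copy is stale. This is a minimal sketch, not Docker's own mechanism: the function names (`nameservers`, `same_nameservers`, `refresh_if_stale`) are hypothetical, and which host file Docker actually copies may differ per setup, so the host file is a parameter that merely defaults to /etc/resolv.conf.

```shell
#!/bin/sh
# Print the nameserver addresses of a resolv.conf-style file, in order.
# Comments and other directives (search, options, ...) are ignored.
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Return 0 when both files list the same nameservers in the same order.
same_nameservers() {
    [ "$(nameservers "$1")" = "$(nameservers "$2")" ]
}

# Hypothetical helper: restart a container if its resolv.conf is stale.
# $1 = container name, $2 = host file to compare against (an assumption;
# defaults to /etc/resolv.conf).
refresh_if_stale() {
    container=$1
    host_file=${2:-/etc/resolv.conf}
    tmp=$(mktemp) || return 1
    if ! docker exec "$container" cat /etc/resolv.conf > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    if same_nameservers "$host_file" "$tmp"; then
        echo "$container: up to date"
    else
        echo "$container: stale, restarting"
        docker restart "$container" > /dev/null
    fi
    rm -f "$tmp"
}
```

With the container from the test above, this would be called as `refresh_if_stale before-change-host-ns`. It only automates the restart; it does not avoid it.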

colinmollenhour commented 8 months ago

I've found that reliable DNS resolution is pretty hard to get in practice over the long term. Case in point: I recently had Quad9 set up as one of my resolvers, and it exhibited many 5-second resolution times, which caused many failures that were only really obvious inside a Docker container. I think the best solution to unstable DNS is stale DNS caching, such as that provided by systemd-resolved's StaleRetentionSec option.

When Ubuntu 24.04 LTS releases soon it will include this feature but based on reading this thread Docker won't actually be using systemd-resolved so that probably won't help.

So this is just a datapoint... It would be great if Docker could make use of systemd-resolved or else implement a DNS resolver with stale record retention and other advanced features. Also, the Networking guide should be clear on this - it was a surprise to me that Docker was not using systemd-resolved as I expected an out of the box config to use the system's default resolver through whatever "magic" Docker uses. I see now this is not the case but it is a nuance that is easily overlooked especially with info like this (emphasis mine):

The embedded DNS server forwards external DNS lookups to the DNS servers configured on the host.

To me that read as my local resolver, which was systemd-resolved, but now I see why that is not the case: Docker really just copies resolv.conf from the host and bypasses the systemd-resolved server entirely.
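
One way to check whether a given resolv.conf still goes through systemd-resolved is to look for its stub listener address, 127.0.0.53, among the nameserver lines. The sketch below assumes that address is the only marker you care about; the function name `uses_resolved_stub` is hypothetical.

```shell
#!/bin/sh
# Return 0 if the resolv.conf-style file given as $1 lists the
# systemd-resolved stub listener (127.0.0.53) as a nameserver,
# and 1 otherwise.
uses_resolved_stub() {
    awk '$1 == "nameserver" && $2 == "127.0.0.53" { found = 1 }
         END { exit !found }' "$1"
}
```

To inspect a running container, its file can be dumped first, for example `docker exec <container> cat /etc/resolv.conf > /tmp/container-resolv.conf` followed by `uses_resolved_stub /tmp/container-resolv.conf && echo "uses stub"`. The container name and temporary path here are placeholders.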

akerouanton commented 8 months ago

@colinmollenhour 5s is quite a precise delay. Is every delay about 5s long (or a multiple of 5)? I'm wondering whether you're experiencing another issue.

Could you paste the output of the following command? As it's unrelated to the original issue, I'll tell you whether we should move this discussion to a new ticket.

docker run --rm --net host nicolaka/netshoot /bin/bash -c "conntrack -S 2>/dev/null"

colinmollenhour commented 8 months ago

Is every delay about 5s long (or a multiple of 5)?

Yes, they are; some are 10 seconds. I thought it was perhaps an issue with Quad9, as I cannot reproduce the same issue with Google and Cloudflare DNS. I am using this script to test, and it will produce the error with 149.112.112.10 (Quad9 secondary) within a minute or two.

❯ ./dns_resolver_test.sh 149.112.112.10 ehub-prod.s3.amazonaws.com xxx_production_1
Starting test resolving ehub-prod.s3.amazonaws.com with 149.112.112.10 at Tue 02 Apr 2024 09:19:40 AM EDT
2024-04-02 09:20:25 - Command took 5.050472004 seconds
^C
Slow runs: 1, errors: 0, total runs: 66 in 74.223639644 seconds

Could you paste the output of the following command?

I added --privileged --cap-add all because there was an error about access to /proc, but here is the output:

cpu=0           found=199 invalid=6283 ignore=471766 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=1345222
cpu=1           found=215 invalid=5843 ignore=475113 insert=0 insert_failed=0 drop=0 early_drop=0 error=2 search_restart=1336034
cpu=2           found=193 invalid=5533 ignore=460339 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1354211
cpu=3           found=163 invalid=8573 ignore=304773 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=1262725
cpu=4           found=188 invalid=7160 ignore=392107 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=1361169
cpu=5           found=199 invalid=6210 ignore=394692 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1326756
cpu=6           found=234 invalid=20245 ignore=297731 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=1758928
cpu=7           found=317 invalid=890722 ignore=1022973 insert=0 insert_failed=2662 drop=2592 early_drop=0 error=9270 search_restart=8901692
cpu=8           found=227 invalid=38666 ignore=241491 insert=0 insert_failed=0 drop=0 early_drop=0 error=4 search_restart=2462636
cpu=9           found=161 invalid=872341 ignore=919601 insert=0 insert_failed=2818 drop=2731 early_drop=0 error=10146 search_restart=8731573
cpu=10          found=143 invalid=45082 ignore=221160 insert=0 insert_failed=0 drop=0 early_drop=0 error=4 search_restart=2487178
cpu=11          found=152 invalid=899143 ignore=907548 insert=0 insert_failed=3409 drop=3295 early_drop=0 error=9345 search_restart=8939111
cpu=12          found=163 invalid=34762 ignore=359433 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=2186805
cpu=13          found=183 invalid=36447 ignore=469003 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=2339250
cpu=14          found=180 invalid=31410 ignore=444650 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=2152859
cpu=15          found=158 invalid=8235 ignore=299441 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1245792
cpu=16          found=196 invalid=7225 ignore=387385 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1375222
cpu=17          found=200 invalid=6873 ignore=380749 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1349991
cpu=18          found=144 invalid=7860 ignore=305123 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1222336
cpu=19          found=207 invalid=6745 ignore=399897 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1356248
cpu=20          found=195 invalid=6442 ignore=391011 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1327708
cpu=21          found=220 invalid=7810 ignore=309319 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=1216044
cpu=22          found=216 invalid=6535 ignore=399783 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1338000
cpu=23          found=237 invalid=5809 ignore=397914 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1315170
cpu=24          found=176 invalid=5264 ignore=495582 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1317282
cpu=25          found=232 invalid=5309 ignore=513012 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1386330
cpu=26          found=198 invalid=5080 ignore=518217 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1397518
cpu=27          found=209 invalid=5657 ignore=428539 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1251228
cpu=28          found=225 invalid=5596 ignore=460995 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1358578
cpu=29          found=218 invalid=5521 ignore=472153 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1362277
cpu=30          found=268 invalid=35122 ignore=302764 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=2345079
cpu=31          found=299 invalid=30255 ignore=353968 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=2246755
cpu=32          found=315 invalid=24720 ignore=386824 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=2043688
cpu=33          found=148 invalid=51728 ignore=206338 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=2666327
cpu=34          found=175 invalid=48175 ignore=244172 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=2594460
cpu=35          found=168 invalid=41229 ignore=293535 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=2422325
cpu=36          found=183 invalid=25376 ignore=465666 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1942030
cpu=37          found=198 invalid=24104 ignore=480904 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=1979115
cpu=38          found=213 invalid=882017 ignore=2070738 insert=0 insert_failed=2954 drop=2599 early_drop=0 error=9083 search_restart=9035387
cpu=39          found=175 invalid=6499 ignore=407174 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1254861
cpu=40          found=235 invalid=6403 ignore=443415 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1391225
cpu=41          found=214 invalid=6097 ignore=450127 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1383813
cpu=42          found=212 invalid=5957 ignore=418941 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1248082
cpu=43          found=202 invalid=5955 ignore=448079 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1363141
cpu=44          found=201 invalid=5916 ignore=457749 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1359532
cpu=45          found=195 invalid=5544 ignore=420452 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1233570
cpu=46          found=242 invalid=5627 ignore=449637 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1347149
cpu=47          found=206 invalid=5536 ignore=461821 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1347365

I have no idea what all of that means.. :)

akerouanton commented 8 months ago

@colinmollenhour Those insert_failed could mean there're clashes over conntrack entry creation (as described in https://blog.quentin-machu.fr/2018/06/24/5-15s-dns-lookups-on-kubernetes/).

For further discussion / investigation, could you please open a separate ticket mentioning the 5s delay you're seeing in some DNS queries? We might have a few other tests to run to confirm my hunch.
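
To pick the relevant counters out of a long `conntrack -S` dump like the one above, the per-CPU lines can be summarized with awk. This is a rough sketch that assumes the `key=value` layout shown in the pasted output; the function names `sum_counter` and `nonzero_cpus` are hypothetical.

```shell
#!/bin/sh
# Sum one counter (default: insert_failed) across all CPUs.
# Reads `conntrack -S` style output from stdin.
sum_counter() {
    awk -v key="${1:-insert_failed}" '
        {
            for (i = 1; i <= NF; i++) {
                n = split($i, kv, "=")
                if (n == 2 && kv[1] == key) total += kv[2]
            }
        }
        END { print total + 0 }
    '
}

# Print "<cpu> <value>" for every CPU where the counter is non-zero.
nonzero_cpus() {
    awk -v key="${1:-insert_failed}" '
        {
            cpu = ""; val = 0
            for (i = 1; i <= NF; i++) {
                n = split($i, kv, "=")
                if (n != 2) continue
                if (kv[1] == "cpu") cpu = kv[2]
                if (kv[1] == key) val = kv[2]
            }
            if (val > 0) print cpu, val
        }
    '
}
```

Saved output can be piped in, for example `nonzero_cpus insert_failed < conntrack.txt`, where `conntrack.txt` is a placeholder file name. On the output pasted above, only CPUs 7, 9, 11 and 38 show a non-zero insert_failed.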

maxdd commented 7 months ago

I'm also facing something similar. My containers are fine and so is my host but when I build a docker image the building containers do not have access, so if I RUN apt or wget they just time out