moby / libnetwork

networking for containers
Apache License 2.0

DNS: client does not receive an NXDOMAIN when 1 of 3 servers times out #2613

Open jbergler opened 3 years ago

jbergler commented 3 years ago

Hi there,

We discovered an issue today in how queries passed to an external DNS server are retried when an NXDOMAIN is received.

# /etc/resolv.conf (host)
search example.com
nameserver 10.0.0.1
nameserver 10.0.0.2
nameserver 10.0.0.3
nameserver 10.0.0.4
# /etc/resolv.conf (container)
search example.com
nameserver 127.0.0.11
options single-request timeout:1 ndots:0

Under normal circumstances, DNS works as expected. The NXDOMAIN is received by the client, resulting in the search domain being appended and the query retried.

$ time host usw125
Host usw125 not found: 3(NXDOMAIN)
real    0m0.052s
user    0m0.000s
sys     0m0.010s

However, when 10.0.0.3 goes offline, the container never receives the NXDOMAIN and thus never tries to resolve the query with the search domain.

$ time host foo
;; connection timed out; no servers could be reached
real    0m10.011s
user    0m0.004s
sys     0m0.007s

We can see that the Docker daemon tried 3 servers (expected, since the NXDOMAIN is not authoritative). It received 2 NXDOMAIN responses followed by a timeout. At that point we've hit our limit (since maxExtDNS = 3), and we fall through without sending any response to the client.

# /var/log/dockerd.log
Name To resolve: foo.
[resolver] query foo. (A) from 172.18.0.5:55172, forwarding to udp:10.0.0.1
[resolver] external DNS udp:10.0.0.1 responded with NXDOMAIN for "foo."
[resolver] query foo. (A) from 172.18.0.5:39805, forwarding to udp:10.0.0.2
[resolver] external DNS udp:10.0.0.2 responded with NXDOMAIN for "foo."
[resolver] query foo. (A) from 172.18.0.5:54945, forwarding to udp:10.0.0.3
[resolver] read from DNS server failed, read udp 172.18.0.5:54945->10.0.0.3:53: i/o timeout
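
To make the sequence in the log concrete, here is a toy model of one way this could arise. It is not libnetwork's actual code: it ASSUMES the forwarding loop keeps only the most recent read result, so a timeout on the last server tried discards the earlier NXDOMAIN replies. Only `maxExtDNS` comes from this report; every other name is hypothetical.

```go
package main

import "fmt"

// maxExtDNS is the limit named in the report; every other name is hypothetical.
const maxExtDNS = 3

// rcodeNXDomain is DNS RCODE 3, as printed by host(1) above.
const rcodeNXDomain = 3

// response stands in for a DNS reply; only the RCODE matters here.
type response struct{ rcode int }

// upstream stands in for one external server. A nil reply models a read timeout.
type upstream struct {
	addr  string
	reply *response
}

// forwardKeepLast ASSUMES the loop remembers only the latest read result.
func forwardKeepLast(servers []upstream) *response {
	var resp *response
	for i, s := range servers {
		if i >= maxExtDNS {
			break
		}
		resp = s.reply // a timeout overwrites any earlier NXDOMAIN with nil
		if resp == nil || resp.rcode == rcodeNXDomain {
			continue // try the next server
		}
		return resp
	}
	return resp // nil means nothing is sent back to the client
}

func main() {
	nx := &response{rcode: rcodeNXDomain}
	healthy := []upstream{{"10.0.0.1", nx}, {"10.0.0.2", nx}, {"10.0.0.3", nx}, {"10.0.0.4", nx}}
	degraded := []upstream{{"10.0.0.1", nx}, {"10.0.0.2", nx}, {"10.0.0.3", nil}, {"10.0.0.4", nx}}
	fmt.Println("healthy, reply sent:", forwardKeepLast(healthy) != nil)   // true
	fmt.Println("degraded, reply sent:", forwardKeepLast(degraded) != nil) // false
}
```

Under that assumption the healthy case still returns NXDOMAIN after three attempts, while the degraded case returns nothing, which matches both observations in this report.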

It seems to me that in this failure mode the resolver should (or at least could) return one of the NXDOMAIN responses it previously received. That would allow the client to continue operating rather than hanging for an extended period as if all DNS servers were unreachable.
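
A minimal sketch of that idea, in the form of a toy model rather than a patch against libnetwork: remember the last NXDOMAIN seen and hand it back if every remaining attempt fails. Only `maxExtDNS` comes from this report; the types and the function name `forwardWithFallback` are hypothetical.

```go
package main

import "fmt"

// maxExtDNS is the limit named in the report; every other name is hypothetical.
const maxExtDNS = 3

// rcodeNXDomain is DNS RCODE 3.
const rcodeNXDomain = 3

// response stands in for a DNS reply; only the RCODE matters here.
type response struct{ rcode int }

// upstream stands in for one external server. A nil reply models a read timeout.
type upstream struct {
	addr  string
	reply *response
}

// forwardWithFallback tries up to maxExtDNS servers and falls back to the last
// NXDOMAIN it received if no server gives a better answer.
func forwardWithFallback(servers []upstream) *response {
	var lastNX *response
	for i, s := range servers {
		if i >= maxExtDNS {
			break
		}
		resp := s.reply
		if resp == nil {
			continue // timeout: keep whatever NXDOMAIN we already hold
		}
		if resp.rcode == rcodeNXDomain {
			lastNX = resp // remember it, but still try the next server
			continue
		}
		return resp
	}
	return lastNX // nil only if no server answered at all
}

func main() {
	nx := &response{rcode: rcodeNXDomain}
	degraded := []upstream{{"10.0.0.1", nx}, {"10.0.0.2", nx}, {"10.0.0.3", nil}, {"10.0.0.4", nx}}
	if resp := forwardWithFallback(degraded); resp != nil {
		fmt.Println("reply sent with rcode", resp.rcode) // rcode 3 (NXDOMAIN)
	}
}
```

With this shape the client still gets no reply when every server times out, so a genuine outage remains distinguishable from a negative answer.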

jbergler commented 3 years ago

@thaJeztah if by any chance you have some time to take a look at this, it would be much appreciated. From my debugging, I believe this may have been introduced in a86d2765b829fb122c70eea7a914d59a8fb1df4a.