Under normal circumstances, DNS works as expected. The NXDOMAIN is received by the client, resulting in the search domain being appended and the query retried.
$ time host usw125
Host usw125 not found: 3(NXDOMAIN)
real 0m0.052s
user 0m0.000s
sys 0m0.010s
However, when 10.0.0.3 goes offline, the container never receives the NXDOMAIN and thus never tries to resolve the query with the search domain.
$ time host foo
;; connection timed out; no servers could be reached
real 0m10.011s
user 0m0.004s
sys 0m0.007s
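To make the difference between the two cases concrete, here is a minimal sketch of the client-side behaviour described above. It is a simplified model, not the code of any particular libc or of the `host` utility: `resolveWithSearch`, `lookupFunc`, the error values, and the `example.internal` search domain are all hypothetical names chosen for illustration. The point is only that an NXDOMAIN lets the client move on to the search domain, while a timeout ends the lookup.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical outcomes of a single lookup, for illustration only.
var (
	errNXDomain = errors.New("NXDOMAIN")
	errTimeout  = errors.New("timeout")
)

// lookupFunc stands in for one query sent to the container's DNS server.
type lookupFunc func(name string) (string, error)

// resolveWithSearch tries the bare name first and, only when it gets an
// NXDOMAIN, retries with each search domain appended. Any other error
// (such as a timeout) aborts the whole lookup.
func resolveWithSearch(name string, search []string, lookup lookupFunc) (string, error) {
	candidates := []string{name + "."}
	for _, domain := range search {
		candidates = append(candidates, name+"."+domain+".")
	}

	var lastErr error
	for _, candidate := range candidates {
		addr, err := lookup(candidate)
		if err == nil {
			return addr, nil
		}
		lastErr = err
		if !errors.Is(err, errNXDomain) {
			// No answer at all: the search list is never tried.
			return "", err
		}
	}
	return "", lastErr
}

func main() {
	search := []string{"example.internal"} // assumed search domain

	// Healthy case: the bare name gets NXDOMAIN, the qualified name resolves.
	healthy := func(name string) (string, error) {
		if name == "foo.example.internal." {
			return "192.0.2.10", nil
		}
		return "", errNXDomain
	}

	// Degraded case: the client never receives any response.
	degraded := func(name string) (string, error) {
		return "", errTimeout
	}

	fmt.Println(resolveWithSearch("foo", search, healthy))
	fmt.Println(resolveWithSearch("foo", search, degraded))
}
```

In the degraded case the loop exits on the first candidate, which matches the roughly 10 second hang shown above: the qualified name is never queried.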
We can see that the Docker daemon tried 3 servers (expected, since the NXDOMAIN is not authoritative).
It received 2 NXDOMAIN responses followed by a timeout.
At this point we've hit our limit (since maxExtDNS = 3), and we fall through without sending any response to the client.
# /var/log/dockerd.log
Name To resolve: foo.
[resolver] query foo. (A) from 172.18.0.5:55172, forwarding to udp:10.0.0.1
[resolver] external DNS udp:10.0.0.1 responded with NXDOMAIN for "foo."
[resolver] query foo. (A) from 172.18.0.5:39805, forwarding to udp:10.0.0.2
[resolver] external DNS udp:10.0.0.2 responded with NXDOMAIN for "foo."
[resolver] query foo. (A) from 172.18.0.5:54945, forwarding to udp:10.0.0.3
[resolver] read from DNS server failed, read udp 172.18.0.5:54945->10.0.0.3:53: i/o timeout
It seems to me that in this failure mode the resolver should (or at least could) return one of the NXDOMAIN responses it has already received. That would let the client continue operating rather than hang for an extended period as if all DNS servers were unreachable.
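A rough sketch of what that fallback could look like is below. This is not the actual dockerd resolver code: `forward`, `queryFunc`, `response`, and `errNoResponse` are hypothetical names, and the real resolver works with full DNS messages rather than this stripped-down struct. Only the `maxExtDNS = 3` limit and the server addresses come from the report above.

```go
package main

import (
	"errors"
	"fmt"
)

// rcode is a simplified stand-in for a DNS response code.
type rcode int

const (
	rcodeSuccess  rcode = 0
	rcodeNXDomain rcode = 3
)

// response is a hypothetical, stripped-down DNS reply.
type response struct {
	server string
	rcode  rcode
}

// maxExtDNS mirrors the limit mentioned in the report.
const maxExtDNS = 3

// errNoResponse is a hypothetical error for "nobody answered at all".
var errNoResponse = errors.New("no external server responded")

// queryFunc stands in for sending one query to one upstream server.
type queryFunc func(server, name string) (*response, error)

// forward tries up to maxExtDNS servers. Any reply other than NXDOMAIN is
// returned immediately. An NXDOMAIN is remembered and the next server is
// tried. If the loop ends without a better answer, the remembered NXDOMAIN
// is returned instead of leaving the client without a response.
func forward(name string, servers []string, query queryFunc) (*response, error) {
	var lastNX *response
	for i, server := range servers {
		if i >= maxExtDNS {
			break
		}
		resp, err := query(server, name)
		if err != nil {
			continue // timeout or network error: try the next server
		}
		if resp.rcode == rcodeNXDomain {
			lastNX = resp
			continue
		}
		return resp, nil
	}
	if lastNX != nil {
		return lastNX, nil
	}
	return nil, errNoResponse
}

func main() {
	servers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}

	// Simulate the scenario from the log: two NXDOMAINs, then a timeout.
	query := func(server, name string) (*response, error) {
		if server == "10.0.0.3" {
			return nil, errors.New("i/o timeout")
		}
		return &response{server: server, rcode: rcodeNXDomain}, nil
	}

	resp, err := forward("foo.", servers, query)
	fmt.Println(resp.server, resp.rcode, err) // prints: 10.0.0.2 3 <nil>
}
```

With this shape, the scenario in the log would end with the NXDOMAIN from 10.0.0.2 being relayed to the container, so the client could go on to retry with its search domain.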
@thaJeztah if, on the off chance, you had some time to take a look at this it would be much appreciated.
From my debugging, I believe this may have been introduced in a86d2765b829fb122c70eea7a914d59a8fb1df4a.
Hi there,
We discovered an issue today in how queries passed to an external DNS server are retried when an NXDOMAIN is received.