Open AkihiroSuda opened 3 years ago
I am running into the same problem, but with AAAA records instead. In particular, it seems to break DNS resolution of IPv4-only hosts for Alpine Linux containers started inside a docker:dind-rootless container, since musl seems to treat an NXDOMAIN response to the AAAA query as a failure of the entire lookup, even though the A query returns a valid list of IP addresses.
I was going to open a new issue and found this one at the last minute, so here's my writeup/analysis for it:
The easiest way I have found to reproduce the issue is as follows:
Install rootful Docker using the official distribution and instructions, for example on Ubuntu Server 22.04.
Run the docker:dind-rootless image:
$ sudo docker run -d --name dind --privileged --env DOCKER_TLS_CERTDIR="" docker:24.0.2-dind-rootless --tls=false
Launch an Alpine Linux container inside it, and try to resolve an IPv4-only domain:
$ sudo docker exec -it dind env DOCKER_HOST=tcp://localhost:2375 docker run --rm -it alpine:3.18 wget http://ipv4.tlund.se -O /dev/null
wget: bad address 'ipv4.tlund.se'
This error indicates a DNS resolution failure. It also reproduces with other tools, such as curl.
Some further tests tell us more about the nature of the problem, and why I believe it's related to VPNKit:
Alpine Linux is necessary: The problem does not happen if you run the command inside a non-Alpine userspace such as Debian:
$ sudo docker exec -it dind env DOCKER_HOST=tcp://localhost:2375 docker run --rm -it debian:bullseye-slim sh -c 'apt-get update && apt-get install -y wget && wget http://ipv4.tlund.se -O /dev/null'
[...]
Connecting to ipv4.tlund.se (ipv4.tlund.se)|193.15.228.195|:80... connected.
[...]
Docker-in-Docker not necessary: It is not necessary to get into a full Docker-in-Docker scenario to run into the issue. Resolving the hostname with just rootlesskit, as follows, also fails:
$ sudo docker exec -it dind rootlesskit --net=vpnkit wget https://ipv4.tlund.se -O /dev/null
[...]
wget: bad address 'ipv4.tlund.se'
[...]
Similarly, the cause of the problem is not that the test is running inside a docker:dind-rootless container. I have set up an Alpine Linux VM, installed rootlesskit and vpnkit on it (from this package), and the problem still reproduces.
VPNKit necessary:
Removing the --net=vpnkit switch from the previous command makes it work:
$ sudo docker exec -it dind rootlesskit wget https://ipv4.tlund.se -O /dev/null
Connecting to ipv4.tlund.se (193.15.228.195:443)
[...]
Similarly, using slirp4netns makes it work:
$ sudo docker exec -it -u 0 dind apk add slirp4netns
[...]
$ sudo docker exec -it dind rootlesskit --net=slirp4netns wget https://ipv4.tlund.se -O /dev/null
Connecting to ipv4.tlund.se (193.15.228.195:443)
[...]
Fixed by using --dns=/etc/resolv.conf:
Adding the --dns=/etc/resolv.conf parameter to VPNKit to force "Upstream" instead of "Host" resolution fixes the problem:
$ cat >vpnkit_forward.sh <<EOF
#!/bin/sh
exec vpnkit --dns=/etc/resolv.conf "\$@"
EOF
$ sudo docker cp vpnkit_forward.sh dind:/vpnkit_forward.sh
$ sudo docker exec -it -u 0 dind chmod +x /vpnkit_forward.sh
$ sudo docker exec -it dind rootlesskit --net=vpnkit --vpnkit-binary=/vpnkit_forward.sh wget https://ipv4.tlund.se -O /dev/null
Connecting to ipv4.tlund.se (193.15.228.195:443)
[...]
Related to IPv4-only hosts:
The problem appears to be related to the fact that the host we are trying to resolve (in the example: ipv4.tlund.se) only has A (IPv4) records, but no AAAA records. Trying to resolve a host with both kinds of records does work:
$ sudo docker exec -it dind rootlesskit --net=vpnkit wget https://dual.tlund.se -O /dev/null
Connecting to dual.tlund.se (193.15.228.195:443)
[...]
I believe that the problem is that when you run an AAAA query for a domain without any AAAA records inside rootlesskit+vpnkit, you get an invalid NXDOMAIN response:
$ sudo docker exec -it -u 0 dind apk add bind-tools
$ sudo docker exec -it dind rootlesskit --net=vpnkit dig ipv4.tlund.se AAAA | grep status
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 60429
If you instead run it without vpnkit, or with slirp4netns, you get a NOERROR response:
$ sudo docker exec -it dind dig ipv4.tlund.se AAAA | grep status
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54463
$ sudo docker exec -it dind rootlesskit --net=slirp4netns dig ipv4.tlund.se AAAA | grep status
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41316
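To make the comparison above easier to repeat across network modes, the status can be pulled out of the dig header with a small helper. This is only a sketch: dns_status is a hypothetical name of my own, and the two sample lines are the headers copied from the outputs above.

```sh
#!/bin/sh
# Print the response code (e.g. NOERROR, NXDOMAIN) found in dig output on stdin.
# dns_status is a hypothetical helper name, not part of dig or rootlesskit.
dns_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*$/\1/p'
}

# Header lines copied from the dig outputs in this report.
printf '%s\n' ';; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 60429' | dns_status
printf '%s\n' ';; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54463' | dns_status
```

With the function available in the shell where dig runs, something like dig ipv4.tlund.se AAAA | dns_status should print just the status for that query.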
It appears that the musl DNS resolver fails the entire resolution once it sees that NXDOMAIN response for the AAAA query.
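As a rough model of the behaviour I am describing (not musl's actual code), the lookup outcome can be written as a function of the two response statuses. The name musl_like_outcome is hypothetical, and the model assumes that a NOERROR reply to the A query actually carries addresses.

```sh
#!/bin/sh
# musl_like_outcome A_STATUS AAAA_STATUS
# Hypothetical model of the observed behaviour: if either reply is NXDOMAIN,
# the whole lookup fails, even when the other reply was fine.
musl_like_outcome() {
    for status in "$1" "$2"; do
        if [ "$status" = NXDOMAIN ]; then
            echo fail
            return 0
        fi
    done
    echo ok
}

musl_like_outcome NOERROR NXDOMAIN   # the vpnkit case: prints "fail"
musl_like_outcome NOERROR NOERROR    # the slirp4netns case: prints "ok"
```

Under this model, the A records never get a chance to be used in the vpnkit case, which matches the "bad address" error seen with wget.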
I have not yet had time to figure out why we're getting an NXDOMAIN response after we add VPNKit (or what the specs say about those weird cases), but at first glance it seems like it should return NOERROR instead.
At least for AAAA queries, the NXDOMAIN appears to come from these two lines: https://github.com/moby/vpnkit/blob/dc331cb22850be0cdd97c84a9cfecaf44a1afb6e/src/hostnet/hostnet_dns.ml#L449-L450, which seem to be turning a response with no answers into an NXDOMAIN.
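To spell out the difference I have in mind, here is a sketch contrasting what those lines appear to do with what I would expect. Both function names are hypothetical and this is not a translation of the VPNKit code; it only models my reading that an empty answer set is mapped to NXDOMAIN, whereas NXDOMAIN is normally reserved for names that do not exist at all.

```sh
#!/bin/sh
# observed_rcode ANSWER_COUNT
# Assumption: models what the linked lines seem to do (no answers -> NXDOMAIN).
observed_rcode() {
    if [ "$1" -eq 0 ]; then echo NXDOMAIN; else echo NOERROR; fi
}

# expected_rcode NAME_EXISTS ANSWER_COUNT
# NXDOMAIN only when the name itself does not exist; an existing name with no
# records of the requested type gets NOERROR with an empty answer section.
expected_rcode() {
    if [ "$1" = yes ]; then echo NOERROR; else echo NXDOMAIN; fi
}

# ipv4.tlund.se exists but has no AAAA records, so the answer count is 0.
observed_rcode 0        # prints NXDOMAIN
expected_rcode yes 0    # prints NOERROR
```

The two calls at the end correspond to the AAAA query for ipv4.tlund.se, with and without VPNKit in the path.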
VPNKit DNS server returns NXDOMAIN for SRV records
OTOH slirp4netns DNS works as expected:
VPNKit version: v0.4.0
RootlessKit version: v0.10.0
Originally reported by @hawicz in https://github.com/moby/libnetwork/issues/2574