k3d-io / k3d

Little helper to run CNCF's k3s in Docker
https://k3d.io/
MIT License

[BUG] DNS resolution for external domain names (Internet) not working on pods #1516

Closed — gsfd2000 closed this issue 1 month ago

gsfd2000 commented 2 months ago

## What did you do

I created a default k3d cluster and deployed pods on it. The pods are not able to resolve external DNS names. I tried setting `K3D_FIX_DNS` to both 0 and 1; neither made a difference. This example uses the default.

```
** server can't find google.com: NXDOMAIN
command terminated with exit code 1
```

error message in the coredns pod logs:

```
vagrant@ubuntu-10032023:~/testcluster$ kubectl logs coredns-76f668cf94-rxgvs -n kube-system
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
.:53
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] 127.0.0.1:38206 - 26358 "HINFO IN 2083597104883247646.1506387703770981152. udp 57 false 512" NXDOMAIN qr,aa,rd 132 0.000832127s
[INFO] plugin/reload: Running configuration SHA512 = fd586de816c2c35b2d8a1c5ceb51dda557a14f33879cb49b6c6a115bc61f862d3cdad6881dd13478e0a77a6f5267199b6f43500da4783b193415947413fb64e3
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3
[INFO] 10.42.0.8:48569 - 3595 "A IN google.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000183376s
[INFO] 10.42.0.8:44383 - 41899 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000093384s
[INFO] 10.42.0.8:45401 - 20902 "A IN google.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000095355s
[INFO] 10.42.0.8:41790 - 244 "A IN google.com. udp 28 false 512" NXDOMAIN qr,aa,rd 103 0.000072917s
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
```

cluster DNS points to the coredns service:

```
vagrant@ubuntu-10032023:~/testcluster$ kubectl exec -it dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
```
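The NXDOMAIN sequence in the CoreDNS log follows directly from this resolv.conf: with `options ndots:5`, a name with fewer than 5 dots is first tried with each search suffix, and only then as-is. A minimal sketch of that expansion order (no network calls, just printing the candidate queries):

```shell
# Sketch of the resolver's search-list expansion implied by the
# resolv.conf above: "google.com" has 1 dot, which is below ndots:5,
# so each search suffix is appended before the name is tried verbatim.
name="google.com"
ndots=5
dots=$(printf '%s' "$name" | awk -F'.' '{print NF-1}')
if [ "$dots" -lt "$ndots" ]; then
  for suffix in default.svc.cluster.local svc.cluster.local cluster.local; do
    echo "$name.$suffix"
  done
fi
echo "$name."
```

This matches the four `A` queries in the CoreDNS log above (`google.com.default.svc.cluster.local`, `google.com.svc.cluster.local`, `google.com.cluster.local`, then `google.com.`): only the last candidate reaches the upstream forwarder, so when forwarding is broken, every candidate comes back NXDOMAIN.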

the k3d cluster runs as one server node container plus the load balancer:

```
vagrant@ubuntu-10032023:~/testcluster$ docker container ls
CONTAINER ID   IMAGE                            COMMAND                  CREATED         STATUS         PORTS                             NAMES
bcee24ec17ff   ghcr.io/k3d-io/k3d-proxy:5.7.0   "/bin/sh -c nginx-pr…"   3 minutes ago   Up 3 minutes   80/tcp, 0.0.0.0:38263->6443/tcp   k3d-dnstest-serverlb
b79f010fe5e5   rancher/k3s:v1.29.6-k3s1         "/bin/k3d-entrypoint…"   3 minutes ago   Up 3 minutes                                     k3d-dnstest-server-0
```

the coredns pod's resolv.conf does point at the underlying Docker network gateway IP 172.19.0.1:

```
vagrant@ubuntu-10032023:~/testcluster$ kubectl debug -it coredns-66c56f4556-dgm46 -n kube-system --image registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 --target coredns -n kube-system
Targeting container "coredns". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-v99vb.
If you don't see a command prompt, try pressing enter.
root@coredns-66c56f4556-dgm46:/# cat /etc/resolv.conf
search eu.pg.com
nameserver 172.19.0.1
options ndots:0
```

this IP also resolves.

```
vagrant@ubuntu-10032023:~/testcluster$ kubectl get nodes -owide
NAME                    STATUS   ROLES                  AGE   VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE           KERNEL-VERSION      CONTAINER-RUNTIME
k3d-dnstest4-server-0   Ready    control-plane,master   20d   v1.29.6+k3s1   172.19.0.2                  K3s v1.29.6+k3s1   5.15.0-69-generic   containerd://1.7.17-k3s1
```

the k3d docker node container resolves correctly:

```
vagrant@ubuntu-10032023:~/testcluster$ docker exec -it k3d-dnstest-server-0 nslookup google.com
Server:    127.0.0.11
Address:   127.0.0.11:53

Non-authoritative answer:
Name:      google.com
Address:   142.250.185.238
```

the underlying VirtualBox machine (spawned by Vagrant) also resolves correctly; the machine is running on a company laptop:

```
vagrant@ubuntu-10032023:~/testcluster$ nslookup google.com
Server:    127.0.0.53
Address:   127.0.0.53#53

Non-authoritative answer:
Name:      google.com
Address:   142.250.185.142
Name:      google.com
Address:   2a00:1450:4001:810::200e
```

as a test, I tried changing the forward configuration of the coredns ConfigMap directly to the Docker gateway 172.19.0.1 or alternatively to 8.8.8.8, but that did not change anything either:

```
vagrant@ubuntu-10032023:~/testcluster$ kubectl get cm coredns -n kube-system -oyaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        hosts /etc/coredns/NodeHosts {
          ttl 60
          reload 15s
          fallthrough
        }
        prometheus :9153
        forward . 8.8.8.8
        cache 30
        loop
        reload
        loadbalance
        import /etc/coredns/custom/*.override
    }
    import /etc/coredns/custom/*.server
  NodeHosts: |
    172.19.0.2 k3d-dnstest4-server-0
kind: ConfigMap
metadata:
  annotations:
    .....
```
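A further isolation step would be to strip the two `import` directives from the dump above entirely. The Corefile would then read as follows (a debugging sketch for narrowing down the issue, not a proposed fix):

```
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    hosts /etc/coredns/NodeHosts {
      ttl 60
      reload 15s
      fallthrough
    }
    prometheus :9153
    forward . 8.8.8.8
    cache 30
    loop
    reload
    loadbalance
}
```

With the imports gone, the `[WARNING] No files matching import glob pattern` messages should disappear from the coredns logs, which helps distinguish an import-related breakage from a forwarding one.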


## What did you expect to happen
I expect the `nslookup` command to run and resolve external DNS names properly in every cluster pod.

## Which OS & Architecture

```
vagrant@ubuntu-10032023:~/testcluster$ uname -a
Linux ubuntu-10032023 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
vagrant@ubuntu-10032023:~/testcluster$ lsb_release -a
No LSB modules are available.
Distributor ID:  Ubuntu
Description:     Ubuntu 22.04.2 LTS
Release:         22.04
Codename:        jammy
```


- output of `k3d runtime-info`

```
vagrant@ubuntu-10032023:~/testcluster$ k3d runtime-info
arch: x86_64
cgroupdriver: systemd
cgroupversion: "2"
endpoint: /var/run/docker.sock
filesystem: extfs
infoname: ubuntu-10032023
name: docker
os: Ubuntu 22.04.2 LTS
ostype: linux
version: 27.0.3
```


## Which version of `k3d`

```
vagrant@ubuntu-10032023:~/testcluster$ k3d version
k3d version v5.7.0
k3s version v1.29.6-k3s1 (default)
```

## Which version of `docker`

```
Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Sat Jun 29 00:02:33 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:33 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```

Potentially related:

https://github.com/k3d-io/k3d/issues/1515

gsfd2000 commented 1 month ago

There is a change in the CoreDNS setup: the switch to using the coredns-custom ConfigMap seems to create the problem. Clusters spawned with k3d >= 5.7.0 use the coredns-custom ConfigMap, which breaks CoreDNS-based external FQDN resolution for cluster pods:

https://github.com/k3d-io/k3d/releases/tag/v5.7.0
https://github.com/k3d-io/k3d/compare/v5.6.3...v5.7.0
https://github.com/k3d-io/k3d/commit/71b5755ebdb3c02e7b82665e88f50e316adec311

When you remove the import entries from the coredns ConfigMap, resolution works again. The new coredns-custom configuration seems to have a hiccup somewhere; can someone please take a look? When I downgrade to 5.6.3 and run the same cluster creation, I have no problems.
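The manual edit described here (removing the import entries) amounts to deleting every `import /etc/coredns/custom/...` line from the Corefile. A self-contained sketch of that edit, using a trimmed Corefile stand-in rather than a live cluster dump:

```shell
# Trimmed stand-in Corefile (not a live cluster dump) containing the two
# coredns-custom import directives that k3d 5.7.0 adds.
corefile='.:53 {
    forward . 8.8.8.8
    import /etc/coredns/custom/*.override
}
import /etc/coredns/custom/*.server'

# Delete every line that imports from /etc/coredns/custom/, which is the
# edit that restored external resolution in this thread.
patched=$(printf '%s\n' "$corefile" | sed '/import \/etc\/coredns\/custom\//d')
printf '%s\n' "$patched"
```

On a live cluster the same change can be made with `kubectl -n kube-system edit configmap coredns` and then restarting the coredns pod so it reloads the configuration; note that k3d may reapply its ConfigMap on cluster restarts, so this is a diagnostic step rather than a durable fix.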

gsfd2000 commented 1 month ago

I have seen that the change has already been reverted in versions > 5.7.0, hence closing.