siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

really-long-record.dev.siderolabs.io doesn't get resolved on aws and azure runners. #8823

Closed DmitriyMV closed 6 months ago

DmitriyMV commented 6 months ago

This is really weird. When you try to run it locally with

~ > sudo -E talosctl cluster create \
--provisioner=qemu \
--cidr=172.20.0.0/24 \
--registry-mirror docker.io=http://172.20.0.1:5000 \
--registry-mirror k8s.gcr.io=http://172.20.0.1:5001 \
--registry-mirror quay.io=http://172.20.0.1:5002 \
--registry-mirror gcr.io=http://172.20.0.1:5003 \
--registry-mirror ghcr.io=http://172.20.0.1:5004 \
--registry-mirror 127.0.0.1:5005=http://172.20.0.1:5005 \
--install-image=127.0.0.1:5005/siderolabs/installer:<whatever-hash-you-have-in-your-local-docker> \
--controlplanes 1 \
--workers 1 \
--with-bootloader=false \
--skip-injecting-config \
--with-apply-config

it works and returns a proper result.

~ > kubectl run alpine --image alpine -it -- ash
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "alpine" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "alpine" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "alpine" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "alpine" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
If you don't see a command prompt, try pressing enter.
/ #  nslookup really-long-record.dev.siderolabs.io
Server:     10.96.0.10
Address:    10.96.0.10:53

Non-authoritative answer:

Non-authoritative answer:
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.30
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.6
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.9
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.31
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.32
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.4
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.20
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.0
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.28
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.18
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.12
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.13
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.17
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.15
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.8
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.33
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.27
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.29
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.22
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.5
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.7
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.19
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.14
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.11
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.21
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.26
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.3
Name:   really-long-record.dev.siderolabs.io
Address: 192.168.15.2

But when running it with forwardKubeDNSToHost: false, it stops working:

/ #  nslookup really-long-record.dev.siderolabs.io
Server:     10.96.0.10
Address:    10.96.0.10:53

Non-authoritative answer:

Non-authoritative answer:
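
For context, forwardKubeDNSToHost is toggled in the Talos machine configuration. A minimal patch sketch, assuming the hostDNS feature layout documented for Talos 1.7 (the exact configuration used in this report is not shown in the thread):

```yaml
# Assumed machine config patch; field layout taken from the Talos 1.7 hostDNS feature.
machine:
  features:
    hostDNS:
      enabled: true                 # assumption: host DNS itself stays enabled
      forwardKubeDNSToHost: false   # the setting named in this report
```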

dig still works, though:

/ #  dig really-long-record.dev.siderolabs.io

; <<>> DiG 9.18.27 <<>> really-long-record.dev.siderolabs.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36157
;; flags: qr rd ra; QUERY: 1, ANSWER: 34, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: bac172b75ceb2027 (echoed)
;; QUESTION SECTION:
;really-long-record.dev.siderolabs.io. IN A

;; ANSWER SECTION:
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.25
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.28
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.8
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.11
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.2
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.3
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.18
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.12
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.31
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.21
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.30
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.32
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.5
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.33
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.6
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.10
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.17
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.4
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.27
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.29
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.24
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.1
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.15
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.7
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.9
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.20
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.19
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.26
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.13
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.16
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.23
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.14
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.22
really-long-record.dev.siderolabs.io. 30 IN A   192.168.15.0

;; Query time: 272 msec
;; SERVER: 10.96.0.10#53(10.96.0.10) (UDP)
;; WHEN: Wed May 29 16:21:55 UTC 2024
;; MSG SIZE  rcvd: 621

nberlee commented 6 months ago

Please note the QUESTION SECTION in the dig output: it actually resolves the name with the explicit trailing dot, really-long-record.dev.siderolabs.io.

This means it skips the 5 search domains of Kubernetes, and the cluster CoreDNS skips the search domains in /etc/resolv.conf, which are set by the DHCP of AWS/Azure.

In contrast, nslookup does not do the same thing. It first adds the 5 search domains AND does an AAAA lookup. Then CoreDNS adds the search domains upstream and is also tasked with the AAAA records. It's no wonder it times out.
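
The fan-out described above can be sketched in a few lines of Python. This is only an illustration of glibc-style search-list handling with ndots:5, not the code of any real resolver. The search list and the helper name candidate_names are hypothetical, and the count is an upper bound that assumes every search-domain candidate misses and each candidate is queried for both A and AAAA.

```python
def candidate_names(name, search, ndots=5):
    """Return the absolute names a stub resolver would try, in order."""
    if name.endswith("."):
        # Trailing dot: the name is already absolute, so the search list is skipped.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, the name as-is comes last.
    return expanded + [absolute]


# Hypothetical five-entry search list (placeholders, not taken from the issue).
SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "region.compute.internal",
    "example.internal",
]

NAME = "really-long-record.dev.siderolabs.io"

relative = candidate_names(NAME, SEARCH)
absolute = candidate_names(NAME + ".", SEARCH)

# Assumption: each candidate is asked for both A and AAAA records.
print(len(relative) * 2, "queries without the trailing dot")
print(len(absolute) * 2, "queries with the trailing dot")
```

With these assumed inputs the name has only three dots, so it goes through the whole search list first: 12 queries from the pod, against 2 when the trailing dot is given. Any search domains appended again upstream multiply that further.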

If you query with the explicit dot at the end, the answer will return within the timeout window.

This is why I have autopath @kubernetes and pods verified in my CoreDNS ConfigMap, and use an explicit forwarder (so it does not use the search domains from /etc/resolv.conf).
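
A Corefile along those lines might look like the sketch below. It is not the actual ConfigMap from this comment: the upstream address is a placeholder from the documentation range, and the remaining plugins are the usual defaults, included only so the fragment is complete.

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        # autopath @kubernetes requires pods verified
        pods verified
        fallthrough in-addr.arpa ip6.arpa
    }
    # Resolve the pod's search path server-side instead of one query per domain.
    autopath @kubernetes
    # Explicit upstream (placeholder address) instead of /etc/resolv.conf.
    forward . 192.0.2.53
    cache 30
    loop
    reload
}
```

Note that pods verified makes CoreDNS watch all pods, which costs extra memory in large clusters.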

For more info see https://youtu.be/ZnW3k6m5AY8?feature=shared&t=844 (which does not cover how CoreDNS adds to the problem by honoring the host's search domains).

DmitriyMV commented 6 months ago

Fixed in #8816