netbirdio / netbird

Connect your devices into a single secure private WireGuard®-based mesh network with SSO/MFA and simple access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

Kubernetes, CoreDNS: Problem with DNS resolving of custom cluster domain name #2112

Open vaceslav opened 3 weeks ago

vaceslav commented 3 weeks ago

Description: I'm experiencing an issue with NetBird cloud when trying to access custom domain names for Kubernetes services.

Setup:

For testing, I deployed an NGINX service, accessible at: nginx-service.default.svc.cluster.local.

A NetBird peer is deployed inside the Kubernetes cluster, and the connection is established.

NetBird Configuration:

With this setup, I can access the NGINX service from my local computer without issues.

Problem: I have multiple clusters and want to access all of them via NetBird. To achieve this, I added a rewrite rule in CoreDNS:

CoreDNS Config

```
.:53 {
    errors
    health
    ready
    rewrite name substring dev.compute.local svc.cluster.local
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    hosts /etc/coredns/NodeHosts {
        ttl 60
        reload 15s
        fallthrough
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
    import /etc/coredns/custom/*.override
}
import /etc/coredns/custom/*.server
```

This makes the service available under both nginx-service.default.svc.cluster.local and nginx-service.default.dev.compute.local.
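
As a rough illustration of that mapping, here is a minimal Python sketch of what the `rewrite name substring` line does to an incoming query name, assuming it behaves like a plain substring replacement. The function name `rewrite_substring` is hypothetical and only illustrates the mapping; it is not CoreDNS code.

```python
def rewrite_substring(qname: str,
                      old: str = "dev.compute.local",
                      new: str = "svc.cluster.local") -> str:
    """Map a query name from the custom domain to the cluster domain.

    Assumption: the CoreDNS rule is modeled here as a plain string
    replacement of `old` with `new` inside the query name.
    """
    return qname.replace(old, new)


custom = "nginx-service.default.dev.compute.local"
native = "nginx-service.default.svc.cluster.local"

# The custom name is rewritten to the native name before lookup.
print(custom, "->", rewrite_substring(custom))
# The native name contains no match, so it passes through unchanged.
print(native, "->", rewrite_substring(native))
```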

I updated the DNS configuration in the NetBird admin UI to include dev.compute.local as a second match domain.

Issue: The domain nginx-service.default.dev.compute.local is not reachable from my local computer.

Environment:

Commands Output:

scutil --dns

```
DNS configuration

resolver #1
  search domain[0] : netbird.cloud
  nameserver[0] : 172.18.0.1
  if_index : 14 (en0)
  flags : Request A records
  reach : 0x00020002 (Reachable,Directly Reachable Address)

resolver #2
  domain : netbird.cloud
  nameserver[0] : 100.121.255.254
  port : 53
  flags : Supplemental, Request A records
  reach : 0x00000002 (Reachable)
  order : 101600

resolver #3
  domain : svc.cluster.local
  nameserver[0] : 100.121.255.254
  port : 53
  flags : Supplemental, Request A records
  reach : 0x00000002 (Reachable)
  order : 102401

resolver #4
  domain : dev.compute.local
  nameserver[0] : 100.121.255.254
  port : 53
  flags : Supplemental, Request A records
  reach : 0x00000002 (Reachable)
  order : 102400

resolver #5
  domain : local
  options : mdns
  timeout : 5
  flags : Request A records
  reach : 0x00000000 (Not Reachable)
  order : 300000

resolver #6
  domain : 254.169.in-addr.arpa
  options : mdns
  timeout : 5
  flags : Request A records
  reach : 0x00000000 (Not Reachable)
  order : 300200

resolver #7
  domain : 8.e.f.ip6.arpa
  options : mdns
  timeout : 5
  flags : Request A records
  reach : 0x00000000 (Not Reachable)
  order : 300400

resolver #8
  domain : 9.e.f.ip6.arpa
  options : mdns
  timeout : 5
  flags : Request A records
  reach : 0x00000000 (Not Reachable)
  order : 300600

resolver #9
  domain : a.e.f.ip6.arpa
  options : mdns
  timeout : 5
  flags : Request A records
  reach : 0x00000000 (Not Reachable)
  order : 300800

resolver #10
  domain : b.e.f.ip6.arpa
  options : mdns
  timeout : 5
  flags : Request A records
  reach : 0x00000000 (Not Reachable)
  order : 301000

DNS configuration (for scoped queries)

resolver #1
  nameserver[0] : 172.18.0.1
  if_index : 14 (en0)
  flags : Scoped, Request A records
  reach : 0x00020002 (Reachable,Directly Reachable Address)
```

dscacheutil -q host -a name nginx-service.default.svc.cluster.local

```
name: nginx-service.default.svc.cluster.local
ip_address: 10.43.140.140
```

dscacheutil -q host -a name nginx-service.default.dev.compute.local

```
```

dig @10.43.0.10 nginx-service.default.svc.cluster.local

```
; <<>> DiG 9.10.6 <<>> @10.43.0.10 nginx-service.default.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46857
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;nginx-service.default.svc.cluster.local. IN A

;; ANSWER SECTION:
nginx-service.default.svc.cluster.local. 5 IN A 10.43.140.140

;; Query time: 331 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Mon Jun 10 18:29:51 CEST 2024
;; MSG SIZE rcvd: 123
```

dig @10.43.0.10 nginx-service.default.dev.compute.local

```
; <<>> DiG 9.10.6 <<>> @10.43.0.10 nginx-service.default.dev.compute.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56850
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;nginx-service.default.dev.compute.local. IN A

;; ANSWER SECTION:
nginx-service.default.svc.cluster.local. 5 IN A 10.43.140.140

;; Query time: 1514 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Mon Jun 10 18:31:20 CEST 2024
;; MSG SIZE rcvd: 123
```

As the dig output shows, a direct request to the cluster nameserver (10.43.0.10) returns the correct answer for both names.
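
One detail worth noting in the second `dig` response: the QUESTION SECTION asks for the `dev.compute.local` name, while the ANSWER SECTION carries the `svc.cluster.local` name. The Python sketch below only makes that comparison explicit. The record strings are copied from the output above, `owner_name` is a hypothetical helper, and whether this mismatch has anything to do with the macOS behavior is not established here.

```python
def owner_name(record_line: str) -> str:
    """Return the owner name (first field) of a dig record line.

    Strips the leading ';' that dig puts on question lines and the
    trailing root dot, and lowercases the result for comparison.
    """
    return record_line.lstrip(";").split()[0].rstrip(".").lower()


# Lines copied from the second dig response.
question = ";nginx-service.default.dev.compute.local. IN A"
answer = "nginx-service.default.svc.cluster.local. 5 IN A 10.43.140.140"

print("question:", owner_name(question))
print("answer:  ", owner_name(answer))
print("names match:", owner_name(question) == owner_name(answer))
```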

Expected Behavior: The address nginx-service.default.dev.compute.local should be accessible from my local computer.
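
For reference, here is a simplified Python sketch of how the `scutil --dns` resolver list above might route each name, assuming the most specific (longest) matching `domain` wins. That selection rule is an assumption about macOS, not something confirmed in this report. The `RESOLVERS` table is copied from the output above, and `pick_resolver` is a hypothetical helper.

```python
# Domain -> nameserver, copied from the `scutil --dns` output.
# "mdns" stands for the multicast DNS resolver registered for `local`.
RESOLVERS = {
    "netbird.cloud": "100.121.255.254",
    "svc.cluster.local": "100.121.255.254",
    "dev.compute.local": "100.121.255.254",
    "local": "mdns",
}
# Resolver #1, used when no supplemental domain matches.
DEFAULT_NAMESERVER = "172.18.0.1"


def pick_resolver(name: str) -> str:
    """Pick a resolver by longest matching domain suffix (assumed rule)."""
    labels = name.rstrip(".").lower().split(".")
    # Try the full name first, then drop one leading label at a time,
    # so the longest matching suffix is found first.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in RESOLVERS:
            return RESOLVERS[suffix]
    return DEFAULT_NAMESERVER


for name in (
    "nginx-service.default.svc.cluster.local",
    "nginx-service.default.dev.compute.local",
):
    print(name, "->", pick_resolver(name))
```

Under this assumption both names would be sent to the NetBird resolver at 100.121.255.254, so resolver selection alone would not explain why only the `dev.compute.local` name fails.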

vaceslav commented 3 weeks ago

UPDATE: This looks like a NetBird + macOS issue. On Windows everything works: the domain nginx-service.default.dev.compute.local is reachable and ping works.

Any suggestions?