netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

DNS does not respect network routes #2653

Open xan-it opened 2 months ago

xan-it commented 2 months ago

Describe the problem

We have multiple network routes, each with a nameserver entry for the same domain. If we select only one network route, NetBird should use the nameserver from that network, because it is the only one reachable from the peer. Instead, NetBird seems to use the first available nameserver for the requested domain, even if that nameserver is not reachable from the peer.

To Reproduce

Steps to reproduce the behavior:

  1. Create two or more peers in different networks and create network routes to these networks, e.g.:
     - peer1 with local IP 172.20.0.1 and network route 172.20.0.0/16 named lan1
     - peer2 with local IP 172.21.0.1 and network route 172.21.0.0/16 named lan2
     - peer3 with local IP 172.22.0.1 and network route 172.22.0.0/16 named lan3

  2. Install a DNS server that can resolve test.testdomain.com in each network, and create a nameserver entry in NetBird for the same domain in each network, e.g.:
     - entry 1 with nameserver 172.20.0.253 for the domain testdomain.com
     - entry 2 with nameserver 172.21.0.253 for the domain testdomain.com
     - entry 3 with nameserver 172.22.0.253 for the domain testdomain.com

  3. On the client, select one network at a time and try to resolve test.testdomain.com. Only one of the networks returns a correct result. In the other networks you get an error that the name cannot be resolved.

Expected behavior

netbird should use the nameserver reachable by the peer.

Are you using NetBird Cloud?

no, self-hosted

NetBird version

server 0.29.3 client 0.29.4

NetBird status -dA output:

Peers detail:
 lan1-pve1.netbird.selfhosted:
  NetBird IP: 100.97.19.116
  Public key: ...
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://netbird.anon-RZnkf.domain:443
  Last connection update: 9 seconds ago
  Last WireGuard handshake: 8 seconds ago
  Transfer status (received/sent) 3.5 KiB/12.3 KiB
  Quantum resistance: false
  Routes: -
  Latency: 0s

 lan2-pve1.netbird.selfhosted:
  NetBird IP: 100.97.131.97
  Public key: ...
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://netbird.anon-RZnkf.domain:443
  Last connection update: 4 minutes, 50 seconds ago
  Last WireGuard handshake: 6 minutes, 12 seconds ago
  Transfer status (received/sent) 92 B/6.3 KiB
  Quantum resistance: false
  Routes: -
  Latency: 37.261ms

 lan3-pve1.netbird.selfhosted:
  NetBird IP: 100.97.140.40
  Public key: ...
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.0:51820/198.51.100.1:51820
  Relay server address: rels://netbird.anon-RZnkf.domain:443
  Last connection update: 42 minutes, 15 seconds ago
  Last WireGuard handshake: 1 minute, 15 seconds ago
  Transfer status (received/sent) 4.4 KiB/12.9 KiB
  Quantum resistance: false
  Routes: -
  Latency: 146.6974ms

OS: windows/amd64
Daemon version: 0.29.4
CLI version: 0.29.4
Management: Connected to https://netbird.anon-RZnkf.domain:443
Signal: Connected to https://netbird.anon-RZnkf.domain:443
Relays:
  [stun:netbird.anon-RZnkf.domain:3478] is Available
  [turn:netbird.anon-RZnkf.domain:3478?transport=udp] is Available
  [rels://netbird.anon-RZnkf.domain:443] is Available
Nameservers:
  [172.20.0.253:53] for [anon-STYq8.domain] is Available
  [172.21.0.253:53] for [anon-STYq8.domain] is Available
  [172.22.0.253:53] for [anon-STYq8.domain] is Available
FQDN: peer9.netbird.selfhosted
NetBird IP: 100.97.207.78/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 3/3 Connected

Do you face any (non-mobile) client issues?

Screenshots

Additional context

lixmal commented 2 months ago

Hi @xan-it,

for this to work you will have to put the nameservers for the same domain in one entry here:

image

xan-it commented 2 months ago

@lixmal: thank you for the workaround, but there I can only add 3 nameservers (I have more than 10), and I can't control the nameservers with distribution groups.

mgarces commented 2 months ago

@xan-it can't you add multiple custom nameservers? I don't understand what you mean by "I can't control the nameservers with distribution groups"; what is it that you are really trying to achieve?

xan-it commented 2 months ago

I will explain the situation more precisely.

Network setup: We have multiple customers, each with an identically built network. Each network has a Linux server running a NetBird client as the routing peer, and a nameserver that resolves the same domain, "ourinternaldomain.io", to the local IPs specific to that network (so that we can, for example, use https://dashboard.ourinternaldomain.io in every network for convenience). However, the networks use different IP subnets, and the nameservers and web servers have different IPs.

Service technicians: There are several service technicians at our site, each with the rights to support specific customers. This is solved using groups: e.g. technician1 can support customer1 and customer2, and technician2 can support customer3 and customer4. That works fine: a technician can only see the networks of the customers he is assigned to. We handle the nameservers belonging to each network the same way: they are assigned to the same group.

Goal: A technician wants to support a customer. He starts the NetBird client UI and selects only the network of that customer, so that he does not accidentally work in the wrong network. He then wants to open the dashboard in the browser: https://dashboard.ourinternaldomain.io. But name resolution only works if the first available nameserver (see the netbird status output above) is on the network the technician selected. If the first available nameserver is on a network that is not selected, we get a "can't resolve" error.

It seems that NetBird determines nameserver availability without respecting the networks selected in the client. NetBird should use the first available nameserver on the selected network.
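The requested behavior can be sketched as a filter over the configured nameservers. This is only an illustration of the expectation, not NetBird's implementation: the function names are hypothetical, and the addresses are the example values from the reproduction steps in the original report.

```python
from ipaddress import ip_address, ip_network

# Example values from the reproduction steps (lan1..lan3).
ROUTES = {
    "lan1": "172.20.0.0/16",
    "lan2": "172.21.0.0/16",
    "lan3": "172.22.0.0/16",
}
NAMESERVERS = ["172.20.0.253", "172.21.0.253", "172.22.0.253"]


def reachable_nameservers(nameservers, selected_routes):
    """Return the nameservers whose IP lies inside a selected route, in the original order."""
    networks = [ip_network(cidr) for cidr in selected_routes]
    return [
        ns for ns in nameservers
        if any(ip_address(ns) in net for net in networks)
    ]


def pick_nameserver(nameservers, selected_routes):
    """Pick the first nameserver reachable through the selected routes, or None."""
    candidates = reachable_nameservers(nameservers, selected_routes)
    return candidates[0] if candidates else None


# Hypothetical usage: the technician selects only lan2.
chosen = pick_nameserver(NAMESERVERS, [ROUTES["lan2"]])
```

With only lan2 selected, the nameservers in 172.20.0.0/16 and 172.22.0.0/16 are skipped even though they appear earlier or later in the list, which is the behavior asked for in this issue.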

xan-it commented 1 month ago

With netbird 0.30.2 and 0.30.3 it is not possible to resolve a private DNS name at all.

netbird status -dA
Peers detail:
 pve1.netbird.selfhosted:
  NetBird IP: 100.97.140.40
  Public key: MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): 127.0.0.1:51820/198.51.100.0:51820
  Relay server address: rels://netbird.anon-0cQw1.domain:443
  Last connection update: 10 minutes, 54 seconds ago
  Last WireGuard handshake: 50 seconds ago
  Transfer status (received/sent) 699.3 KiB/687.5 KiB
  Quantum resistance: false
  Routes: 172.21.0.0/16
  Latency: 147.408ms

OS: windows/amd64
Daemon version: 0.30.3
CLI version: 0.30.3
Management: Connected to https://netbird.anon-0cQw1.domain:443
Signal: Connected to https://netbird.anon-0cQw1.domain:443
Relays:
  [stun:netbird.anon-0cQw1.domain:3478] is Available
  [turn:netbird.anon-0cQw1.domain:3478?transport=udp] is Available
  [rels://netbird.anon-0cQw1.domain:443] is Available
Nameservers:
  [172.21.0.110:53] for [anon-XBgZv.domain] is Available
FQDN: vm-pc489.netbird.selfhosted
NetBird IP: 100.97.233.71/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 1/1 Connected

netbird routes ls
Available Routes:

Resolving the name:
Resolve-DnsName -Name web.ouranondomain.io
Resolve-DnsName : web.ouranondomain.io : The DNS name does not exist

If I explicitly specify the DNS server, I get a correct result:
Resolve-DnsName -Name web.ouranondomain.io -Server 172.21.0.110
...
web.ouranondomain.io A 0 Answer 172.21.0.110
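The same kind of check, querying one nameserver directly and bypassing the system resolver, can be sketched in Python for platforms without Resolve-DnsName. This is a minimal illustration using only the standard library. The function names are hypothetical, it only reads the response header (RCODE and answer count), and it assumes plain DNS over UDP port 53.

```python
import socket
import struct


def build_a_query(name, txid):
    """Build a minimal DNS query for the A record of `name` (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def parse_header(response):
    """Return (transaction ID, RCODE, answer count) from a DNS response header."""
    txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return txid, flags & 0x000F, ancount


def query_server(name, server, timeout=2.0):
    """Send the query straight to `server` and return (RCODE, answer count)."""
    txid = 0x1234  # fixed ID keeps the sketch simple; use a random ID in practice
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(name, txid), (server, 53))
        response, _addr = sock.recvfrom(512)
    got_id, rcode, ancount = parse_header(response)
    if got_id != txid:
        raise ValueError("response ID does not match the query")
    return rcode, ancount
```

Calling `query_server("web.ouranondomain.io", "172.21.0.110")` sends the question only to that server. RCODE 0 with a non-zero answer count means the server answered, RCODE 3 means the name does not exist, and a timeout error suggests the server is not reachable over the current routes.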

Here is the client.log:
2024-10-25T12:57:14+02:00 INFO client/cmd/service_controller.go:24: starting Netbird service
2024-10-25T12:57:14+02:00 INFO client/cmd/service_controller.go:66: started daemon server: 127.0.0.1:41731
2024-10-25T12:57:23+02:00 INFO client/internal/connect.go:111: starting NetBird client version 0.30.3 on windows/amd64
2024-10-25T12:57:23+02:00 INFO client/internal/connect.go:240: connecting to the Relay service(s): rels://netbird.anon-0cQw1.domain:443
2024-10-25T12:57:23+02:00 INFO relay/client/picker.go:66: try to connecting to relay server: rels://netbird.anon-0cQw1.domain:443
2024-10-25T12:57:23+02:00 INFO [relay: rels://netbird.anon-0cQw1.domain:443] relay/client/client.go:166: create new relay connection: local peerID: vO0n05Sh6Q1wkrWm0hb/R/IKjRh63eEMD+3VYJ391Ck=, local peer hashedID: sha-hjaDKUOkl+IQI3YnBEWvYUZphaADVmW+Mj0Ur7c4soA=
2024-10-25T12:57:23+02:00 INFO [relay: rels://netbird.anon-0cQw1.domain:443] relay/client/client.go:172: connecting to relay server
2024-10-25T12:57:24+02:00 INFO [relay: rels://netbird.anon-0cQw1.domain:443] relay/client/client.go:189: relay connection established
2024-10-25T12:57:24+02:00 INFO relay/client/picker.go:84: connected to Relay server: rels://netbird.anon-0cQw1.domain:443
2024-10-25T12:57:24+02:00 INFO relay/client/picker.go:58: chosen home Relay server: rels://netbird.anon-0cQw1.domain:443
2024-10-25T12:57:24+02:00 INFO client/iface/wgproxy/factory_usp.go:15: WireGuard Proxy Factory will produce bind proxy
2024-10-25T12:57:24+02:00 INFO client/internal/routemanager/manager.go:144: Routing setup complete
2024-10-25T12:57:24+02:00 INFO client/iface/device/device_windows.go:59: create tun interface
2024-10-25T12:57:25+02:00 ERRO [relay: rels://netbird.anon-0cQw1.domain:443] relay/client/client.go:434: peer not found: sha-Un3Xf0yzG+MMlWakRgYtNxtta9doiYiw5cLrBJDrjFg=
2024-10-25T12:57:25+02:00 INFO client/internal/peer/guard/sr_watcher.go:106: reconnected to Signal or Relay server
2024-10-25T12:57:25+02:00 INFO signal/client/grpc.go:149: connected to the Signal Service stream
2024-10-25T12:57:25+02:00 INFO client/internal/connect.go:268: Netbird engine started, the IP is: 100.97.233.71/16
2024-10-25T12:57:28+02:00 ERRO [relay: rels://netbird.anon-0cQw1.domain:443] relay/client/client.go:434: peer not found: sha-Un3Xf0yzG+MMlWakRgYtNxtta9doiYiw5cLrBJDrjFg=
2024-10-25T12:57:28+02:00 INFO management/client/grpc.go:155: connected to the Management Service stream
2024-10-25T12:57:28+02:00 WARN client/internal/engine.go:597: running SSH server is not permitted
2024-10-25T12:57:28+02:00 INFO client/internal/acl/manager.go:56: ACL rules processed in: 130.2µs, total rules count: 2
2024-10-25T12:57:28+02:00 WARN client/internal/routemanager/client.go:160: The network [172.21.0.0/16] has not been assigned a routing peer as no peers from the list [MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g=] are currently connected
2024-10-25T12:57:28+02:00 INFO client/internal/dns/host_windows.go:149: added 2 match domains to the state. Domain list: [.ouranondomain.io .netbird.selfhosted]
2024-10-25T12:57:28+02:00 INFO client/internal/dns/host_windows.go:172: updated the search domains in the registry with 1 domains. Domain list: [netbird.selfhosted]
2024-10-25T12:57:29+02:00 WARN client/internal/dns/upstream.go:196: probing upstream nameserver 172.21.0.110:53: read udp 192.9.201.92:59496->172.21.0.110:53: i/o timeout
2024-10-25T12:57:29+02:00 WARN client/internal/dns/upstream.go:275: Upstream resolving is Disabled for 30s
2024-10-25T12:57:29+02:00 INFO [nameservers: [{172.21.0.110 udp 53}]] client/internal/dns/server.go:512: Temporarily deactivating nameservers group due to timeout
2024-10-25T12:57:29+02:00 INFO client/internal/dns/host_windows.go:149: added 1 match domains to the state. Domain list: [.netbird.selfhosted]
2024-10-25T12:57:29+02:00 INFO client/internal/dns/host_windows.go:172: updated the search domains in the registry with 1 domains. Domain list: [netbird.selfhosted]
2024-10-25T12:57:29+02:00 INFO [relay: rels://netbird.anon-0cQw1.domain:443] relay/client/client.go:218: open connection to peer: sha-Un3Xf0yzG+MMlWakRgYtNxtta9doiYiw5cLrBJDrjFg=
2024-10-25T12:57:29+02:00 INFO [peer: MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g=] client/internal/peer/conn.go:436: created new wgProxy for relay connection: 127.1.140.40:51820
2024-10-25T12:57:29+02:00 INFO [peer: MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g=] client/internal/peer/conn.go:465: start to communicate with peer via relay
2024-10-25T12:57:29+02:00 WARN client/internal/routemanager/client.go:130: peer MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g= has 0 latency
2024-10-25T12:57:29+02:00 INFO client/internal/routemanager/client.go:171: New chosen route is crq2otfnh4hs73avdc3g with peer MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g= with score 0.000000 for network [172.21.0.0/16]
2024-10-25T12:57:29+02:00 WARN client/internal/routemanager/client.go:130: peer MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g= has 0 latency
2024-10-25T12:57:30+02:00 INFO [peer: MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g=] client/internal/peer/conn.go:320: set ICE to active connection
2024-10-25T12:57:30+02:00 WARN client/internal/routemanager/client.go:130: peer MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g= has 0 latency
2024-10-25T12:57:32+02:00 INFO client/internal/dns/upstream.go:252: upstreams [172.21.0.110:53] are responsive again. Adding them back to system
2024-10-25T12:57:32+02:00 INFO client/internal/dns/host_windows.go:149: added 2 match domains to the state. Domain list: [.ouranondomain.io .netbird.selfhosted]
2024-10-25T12:57:32+02:00 INFO client/internal/dns/host_windows.go:172: updated the search domains in the registry with 1 domains. Domain list: [netbird.selfhosted]
2024-10-25T12:57:32+02:00 INFO [peer: MpJCG9jb6u7JzHfusOv4vsW7x2dxO4RCWzBuj6Ikt0g=] client/internal/peer/guard/guard.go:84: start reconnect loop...
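To follow only the DNS-related events in a log like this one, a small filter can help. The sketch below is a hypothetical helper, not part of NetBird. It assumes every entry starts with a timestamp in the format seen above and that DNS events come from source files under `client/internal/dns/`, as they do in this log.

```python
import re

# An entry starts at each timestamp such as 2024-10-25T12:57:29+02:00.
# The lookahead splits before the timestamp without consuming it.
ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2} )")


def split_entries(log_text):
    """Split raw log text into entries, whether or not it still has line breaks."""
    parts = ENTRY_START.split(log_text)
    # Normalize inner whitespace and drop empty fragments.
    return [" ".join(part.split()) for part in parts if part.strip()]


def dns_events(log_text):
    """Keep only the entries emitted from the client/internal/dns package."""
    return [entry for entry in split_entries(log_text) if "client/internal/dns/" in entry]
```

Applied to the log above, `dns_events` returns the upstream probe timeout at 12:57:29, the temporary deactivation of the nameserver group, and the "responsive again" message at 12:57:32, together with the match-domain updates around them.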