netbirdio / netbird

Connect your devices into a single secure private WireGuard®-based mesh network with SSO/MFA and simple access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License
9.83k stars, 425 forks

Netbird should connect to peers before setting up DNS #2002

Open Thunderbottom opened 1 month ago

Thunderbottom commented 1 month ago

Describe the problem

In the latest version, netbird tries to resolve DNS before connecting to the peers. This causes DNS resolution to fail in cases where the DNS server being used is a private one behind a routing peer. netbird then waits for the DNS resolution to time out before connecting to peers on the network, so in our case it takes at least 15 seconds to connect to the first peer.

After the peer connects, the DNS resolution works perfectly fine. But this delay in most cases is unbearable and causes usability issues for a lot of people.

Logs:

May 16 22:31:13 hades netbird[1405]: 2024-05-16T22:31:13+05:30 INFO signal/client/grpc.go:158: connected to the Signal Service stream
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 WARN [error: read udp 192.168.69.100:59667->192.168.0.2:53: i/o timeout, upstream: 192.168.0.2:53] client/internal/dns/upstream.go:102: got an error while connecting to upstream
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 ERRO client/internal/dns/upstream.go:134: all queries to the upstream nameservers failed with timeout
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 WARN [error: read udp 192.168.69.100:41074->192.168.0.2:53: i/o timeout, upstream: 192.168.0.2:53] client/internal/dns/upstream.go:102: got an error while connecting to upstream
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 ERRO client/internal/dns/upstream.go:134: all queries to the upstream nameservers failed with timeout
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 WARN [error: read udp 192.168.69.100:37390->192.168.0.2:53: i/o timeout, upstream: 192.168.0.2:53] client/internal/dns/upstream.go:102: got an error while connecting to upstream
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 ERRO client/internal/dns/upstream.go:134: all queries to the upstream nameservers failed with timeout
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 WARN client/internal/dns/upstream.go:265: Upstream resolving is Disabled for 30s
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 INFO [nameservers: [{192.168.0.2 udp 53}]] client/internal/dns/server.go:504: Temporarily deactivating nameservers group due to timeout
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 WARN [error: read udp 192.168.69.100:59543->192.168.0.2:53: i/o timeout, upstream: 192.168.0.2:53] client/internal/dns/upstream.go:102: got an error while connecting to upstream
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 ERRO client/internal/dns/upstream.go:134: all queries to the upstream nameservers failed with timeout
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 WARN [error: read udp 192.168.69.100:56609->192.168.0.2:53: i/o timeout, upstream: 192.168.0.2:53] client/internal/dns/upstream.go:102: got an error while connecting to upstream
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 ERRO client/internal/dns/upstream.go:134: all queries to the upstream nameservers failed with timeout
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 WARN [upstream: 192.168.0.2:53, error: read udp 192.168.69.100:45333->192.168.0.2:53: i/o timeout] client/internal/dns/upstream.go:102: got an error while connecting to upstream
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 ERRO client/internal/dns/upstream.go:134: all queries to the upstream nameservers failed with timeout
May 16 22:31:15 hades netbird[1405]: 2024-05-16T22:31:15+05:30 INFO client/internal/dns/resolvconf_linux.go:73: added 2 search domains. Search list: [local.netbird lan]
May 16 22:31:16 hades netbird[1405]: 2024-05-16T22:31:16+05:30 WARN [error: read udp 192.168.69.100:46766->192.168.0.2:53: i/o timeout, upstream: 192.168.0.2:53] client/internal/dns/upstream.go:102: got an error while connecting to upstream
May 16 22:31:16 hades netbird[1405]: 2024-05-16T22:31:16+05:30 ERRO client/internal/dns/upstream.go:134: all queries to the upstream nameservers failed with timeout
May 16 22:31:16 hades netbird[1405]: 2024-05-16T22:31:16+05:30 WARN [error: read udp 192.168.69.100:57562->192.168.0.2:53: i/o timeout, upstream: 192.168.0.2:53] client/internal/dns/upstream.go:102: got an error while connecting to upstream
May 16 22:31:16 hades netbird[1405]: 2024-05-16T22:31:16+05:30 ERRO client/internal/dns/upstream.go:134: all queries to the upstream nameservers failed with timeout
May 16 22:31:17 hades netbird[1405]: 2024-05-16T22:31:17+05:30 INFO management/client/grpc.go:147: connected to the Management Service stream
May 16 22:31:17 hades netbird[1405]: 2024-05-16T22:31:17+05:30 WARN client/internal/routemanager/client.go:154: the network 192.168.0.0/19 has not been assigned a routing peer as no peers from the list [<LIST>] are currently connected
May 16 22:31:18 hades netbird[1405]: 2024-05-16T22:31:18+05:30 INFO client/internal/routemanager/client.go:165: new chosen route is <ROUTE> with peer <PEER-ID> with score 2.974409 for network 192.168.0.2/32
May 16 22:31:18 hades netbird[1405]: 2024-05-16T22:31:18+05:30 INFO client/internal/dns/upstream.go:241: upstreams [192.168.0.2:53] are responsive again. Adding them back to system

In the logs above, connecting took only a few seconds, but usually on netbird up it takes at least 10-15 seconds to connect.

To Reproduce

Steps to reproduce the behavior:

  1. Set up a private DNS server in netbird that is reachable only through a routing peer.
  2. Connect to netbird.
  3. Notice that netbird tries to resolve DNS and fails before trying to connect to the peers.
  4. See error.

Expected behavior

The DNS resolution should take place after the peer connections are initialized. There's no need for netbird to replace and resolve DNS before connecting to peers.

Are you using NetBird Cloud?

Self-hosted NetBird's control plane.

NetBird version

netbird version: 0.27.7

mlsmaycon commented 1 month ago

Thanks for opening this bug, @Thunderbottom. The current behavior is the following:

We configure DNS and test it right away. It should fail faster for this initial test. Decreasing the timeout will help, but not setting it up is the best approach.