netbirdio / netbird

Connect your devices into a single secure private WireGuard®-based mesh network with SSO/MFA and simple access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

DNS Name Resolution of internal hostnames works only a few hours #2069

Open jogrie opened 1 month ago

jogrie commented 1 month ago

Describe the problem

Hi there,

I want to use NetBird to connect my homelab to my cloud server.

I am using internal domain names and have problems resolving them.

The setup looks like this:

cloud server -> netbird connection -> home server -> netbird route -> dns server

On the home server I have ntfy.sh running so I can send myself notifications, e.g. when a backup is done.

My ACLs look like this:

cloud server <-> home server - allow ping
cloud server -> home server - allow http / https

My network routes look like this:

name: dns dst: 192.168.178.105/32 router: home server

name: reverse proxy dst: 192.168.178.100/32 (home server IP) router: home server

My DNS config for the cloud server: nameserver 192.168.178.105, All Domains
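Since the nameserver 192.168.178.105 is only reachable through the "dns" route, every lookup from the cloud server depends on that route being active. A minimal sketch of that dependency, using the two routes listed above (`ROUTES` and `routes_covering` are illustrative names, not part of NetBird):

```python
import ipaddress

# Routes as listed in the post: route name -> destination network.
ROUTES = {
    "dns": ipaddress.ip_network("192.168.178.105/32"),
    "reverse proxy": ipaddress.ip_network("192.168.178.100/32"),
}


def routes_covering(address):
    """Return the names of the routes whose destination contains the address."""
    ip = ipaddress.ip_address(address)
    return [name for name, network in ROUTES.items() if ip in network]
```

With these routes, `routes_covering("192.168.178.105")` names only the "dns" route, so if that route has no connected routing peer, the nameserver cannot be reached at all.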

Both servers run Debian 12. At the moment the connection is used only once a night, to send a notification when a backup has finished.

After connecting the cloud server to the home server, DNS resolution works fine: I can ping the home server by its domain name, ntfy.home.example.com.

curl -d "Test" ntfy.home.example.com/test works fine.

Now the problem:

After a few hours or days, DNS resolution stops working.

If I ping the IP 192.168.178.100, it works fine; if I ping the hostname, I get "unknown hostname".

It seems like DNS resolution is gone.

netbird status -d still reports the nameserver as Available (see below).
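To separate "the name does not resolve" from "the host is unreachable", the lookup can be tested on its own. The following is a rough sketch using only Python's standard library; `resolve` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without a network:

```python
import socket


def resolve(hostname, resolver=socket.getaddrinfo):
    """Return the sorted unique IPs for hostname, or None if the lookup fails."""
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # Name resolution failed (this is what ping reports as an unknown host).
        return None
    # Each getaddrinfo result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in results})
```

Calling `resolve("ntfy.home.example.com")` on the cloud server while the problem is present should return `None` if the failure really is in name resolution, even though 192.168.178.100 still answers pings.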

To Reproduce

Steps to reproduce the behavior:

see above

Expected behavior

I expected DNS resolution to keep working, not just for a few hours.

Are you using NetBird Cloud?

Yes

NetBird version

0.27.3 on both sides

NetBird status -d output:

netbird status -d
Peers detail:
 homeserver.netbird.cloud:
  NetBird IP: 100.74.168.78
  Public key: 9d/Y7q0AmiJ70OdL7CRGuiP5NkOk3SFOevkDTHqdjyY=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/prflx
  ICE candidate endpoints (Local/Remote): XXX.XXX.XXX.XXX:51820/XXX.XXX.XXX.XXX:51820
  Last connection update: 2024-05-26 21:26:03
  Last WireGuard handshake: 2024-05-28 21:12:06
  Transfer status (received/sent) 366.8 KiB/247.9 KiB
  Quantum resistance: false
  Routes: 192.168.178.100/32, 192.168.178.105/32
  Latency: 11.045915ms

Daemon version: 0.27.3
CLI version: 0.27.3
Management: Connected to https://api.netbird.io:443
Signal: Connected to https://signal.netbird.io:443
Relays: 
  [stun:stun.netbird.io:5555] is Available
  [turns:turn.netbird.io:443?transport=tcp] is Available
Nameservers: 
  [192.168.178.105:53] for [.] is Available
FQDN: cloudserver.netbird.cloud
NetBird IP: 100.74.138.188/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 1/1 Connected

bcmmbaga commented 1 month ago

Hello @jogrie, could you update to version 0.27.10, test it again, and check whether you still encounter the same issue?

jogrie commented 1 month ago

Hi, I just updated to version 0.27.10. I will give you an update in a few days if the error occurs again.

jogrie commented 1 month ago

Hi, yesterday I updated to version 0.27.10.

At first DNS resolution worked fine, but at night there was no notification.

It is the same error as described above: I can ping the machine by its internal IP, but name resolution does not work.

Here is the client.log from the cloud server:


2024-05-30T12:01:25+02:00 WARN [error: read udp 100.74.138.188:54773->192.168.247.105:53: i/o timeout, upstream: 192.168.247.105:53] client/internal/dns/upstream.go:101: got an error while connecting to upstream
2024-05-30T12:01:25+02:00 ERRO client/internal/dns/upstream.go:133: all queries to the upstream nameservers failed with timeout
2024-05-30T12:01:30+02:00 WARN [error: read udp 100.74.138.188:34181->192.168.247.105:53: i/o timeout, upstream: 192.168.247.105:53] client/internal/dns/upstream.go:101: got an error while connecting to upstream
2024-05-30T12:01:30+02:00 ERRO client/internal/dns/upstream.go:133: all queries to the upstream nameservers failed with timeout
2024-05-30T15:37:29+02:00 WARN management/client/grpc.go:162: disconnected from the Management service but will retry silently. Reason: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: INTERNAL_ERROR
2024-05-30T15:37:30+02:00 INFO management/client/grpc.go:147: connected to the Management Service stream
2024-05-30T15:37:30+02:00 INFO client/internal/acl/manager.go:52: ACL rules processed in: 4.510397ms, total rules count: 7
2024-05-30T15:44:20+02:00 WARN [error: read udp 100.74.138.188:44916->192.168.247.105:53: i/o timeout, upstream: 192.168.247.105:53] client/internal/dns/upstream.go:101: got an error while connecting to upstream
2024-05-30T15:44:20+02:00 ERRO client/internal/dns/upstream.go:133: all queries to the upstream nameservers failed with timeout
2024-05-30T15:44:25+02:00 WARN [error: read udp 100.74.138.188:33462->192.168.247.105:53: i/o timeout, upstream: 192.168.247.105:53] client/internal/dns/upstream.go:101: got an error while connecting to upstream
2024-05-30T15:44:25+02:00 ERRO client/internal/dns/upstream.go:133: all queries to the upstream nameservers failed with timeout
2024-05-30T17:33:09+02:00 WARN signal/client/grpc.go:171: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Unavailable desc = closing transport due to: connection error: desc = "error reading from server: EOF", received prior goaway: code: NO_ERROR, debug data: "server_shutting_down"
2024-05-30T17:33:10+02:00 INFO signal/client/grpc.go:158: connected to the Signal Service stream
2024-05-30T18:10:07+02:00 WARN [upstream: 192.168.247.105:53, error: read udp 100.74.138.188:57893->192.168.247.105:53: i/o timeout] client/internal/dns/upstream.go:101: got an error while connecting to upstream
2024-05-30T18:10:07+02:00 ERRO client/internal/dns/upstream.go:133: all queries to the upstream nameservers failed with timeout
2024-05-30T18:10:12+02:00 WARN [error: read udp 100.74.138.188:50975->192.168.247.105:53: i/o timeout, upstream: 192.168.247.105:53] client/internal/dns/upstream.go:101: got an error while connecting to upstream
2024-05-30T18:10:12+02:00 ERRO client/internal/dns/upstream.go:133: all queries to the upstream nameservers failed with timeout
2024-05-30T18:10:17+02:00 WARN [upstream: 192.168.247.105:53, error: read udp 100.74.138.188:37623->192.168.247.105:53: i/o timeout] client/internal/dns/upstream.go:101: got an error while connecting to upstream
2024-05-30T18:10:17+02:00 ERRO client/internal/dns/upstream.go:133: all queries to the upstream nameservers failed with timeout
2024-05-30T23:00:55+02:00 WARN signal/client/grpc.go:171: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-05-30T23:00:59+02:00 INFO signal/client/grpc.go:158: connected to the Signal Service stream
2024-05-31T03:09:46+02:00 WARN client/internal/routemanager/client.go:154: the network 192.168.247.105/32 has not been assigned a routing peer as no peers from the list [9d/Y7q0AmiJ70OdL7CRGuiP5NkOk3SFOevkDTHqdjyY=] are currently connected
2024-05-31T03:09:46+02:00 WARN client/internal/routemanager/client.go:154: the network 192.168.247.100/32 has not been assigned a routing peer as no peers from the list [9d/Y7q0AmiJ70OdL7CRGuiP5NkOk3SFOevkDTHqdjyY=] are currently connected
2024-05-31T03:09:48+02:00 ERRO client/internal/peer/conn.go:630: failed signaling candidate to the remote peer 9d/Y7q0AmiJ70OdL7CRGuiP5NkOk3SFOevkDTHqdjyY= no connection to signal
2024-05-31T03:09:48+02:00 ERRO client/internal/peer/conn.go:630: failed signaling candidate to the remote peer 9d/Y7q0AmiJ70OdL7CRGuiP5NkOk3SFOevkDTHqdjyY= no connection to signal
2024-05-31T03:09:49+02:00 INFO client/internal/peer/conn.go:388: connected to peer 9d/Y7q0AmiJ70OdL7CRGuiP5NkOk3SFOevkDTHqdjyY=, endpoint address: 87.186.125.102:51820
2024-05-31T03:09:49+02:00 INFO client/internal/routemanager/client.go:165: new chosen route is cn0aq4bl0ubs73elq6s0:cmpojjqfic3c73eq9mtg with peer 9d/Y7q0AmiJ70OdL7CRGuiP5NkOk3SFOevkDTHqdjyY= with score 2.988673 for network 192.168.247.105/32
2024-05-31T03:09:49+02:00 INFO client/internal/routemanager/client.go:165: new chosen route is cmpp0cqfic3c73eq9ni0:cmpojjqfic3c73eq9mtg with peer 9d/Y7q0AmiJ70OdL7CRGuiP5NkOk3SFOevkDTHqdjyY= with score 2.988673 for network 192.168.247.100/32
2024-05-31T03:37:30+02:00 WARN management/client/grpc.go:162: disconnected from the Management service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-05-31T03:37:31+02:00 INFO management/client/grpc.go:147: connected to the Management Service stream
2024-05-31T03:37:31+02:00 INFO client/internal/acl/manager.go:52: ACL rules processed in: 9.845559ms, total rules count: 7
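To see how often, and against which nameserver, the lookups time out, the WARN lines in a log like the one above can be tallied. This is a minimal sketch that assumes the line format shown in this excerpt (IPv4 upstreams, with the "upstream:" field either first or last inside the brackets); `count_upstream_timeouts` is a hypothetical helper, not a NetBird tool:

```python
import re
from collections import Counter

# Matches the "upstream: host:port" field of the WARN lines (IPv4 only).
UPSTREAM_RE = re.compile(r"upstream: ([0-9.]+:\d+)")


def count_upstream_timeouts(log_text):
    """Count 'i/o timeout' warnings per upstream nameserver."""
    counts = Counter()
    for line in log_text.splitlines():
        if "i/o timeout" not in line:
            continue
        match = UPSTREAM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing the contents of client.log to `count_upstream_timeouts` gives one counter entry per upstream address. Comparing the timestamps of those lines with the routemanager and peer messages may show whether the timeouts coincide with the routing peer being disconnected.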

arthur-trt commented 1 month ago

I had this problem on Linux because NetworkManager rewrote /etc/resolv.conf multiple times per day. You could try disabling it: https://askubuntu.com/a/1140591
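One way to check whether something is rewriting /etc/resolv.conf is to record its nameserver entries at intervals and compare them. A small sketch of the parsing step follows; `parse_nameservers` is a hypothetical helper, and which address the NetBird resolver puts into the file on a given host is not stated in this thread:

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (lines starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers
```

Running this on the file contents right after `netbird up` and again once resolution has stopped working would show whether the nameserver list changed in between.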

jogrie commented 1 month ago

Hi Arthur,

Thanks for your suggestion, but it seems that NetworkManager is not active:

systemctl status NetworkManager
Unit NetworkManager.service could not be found