netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

[bug][selfhosted] - Local DNS resolution not possible when netbird client is working #2336

Open rihards-simanovics opened 1 month ago

rihards-simanovics commented 1 month ago

Describe the problem

All DNS queries to 127.0.0.53:53 fail with a timeout. Related to issue https://github.com/netbirdio/netbird/issues/2186

To Reproduce

Steps to reproduce the behavior:

  1. spin up a VPS with Ubuntu Server 22.04.4 LTS
  2. update all packages
  3. install cli client
  4. connect to the management server
  5. attempt to run

Expected behavior

Running nslookup on google.com should return:

Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   google.com
Address: <some ip closest to you>
Name:   google.com
Address: <some ip closest to you>

Actual Behaviour

All queries to root DNS fail with a timeout

nslookup localhost
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; no servers could be reached
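
To tell a timeout on the stub resolver apart from a general network problem, the same query can be sent to each resolver separately. Below is a minimal sketch in Python using only the standard library. The function names `build_query` and `probe` are hypothetical, and it only checks whether a reply arrives in time; it does not validate the answer.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is length-prefixed, terminated by a zero byte
    labels = [label.encode("ascii") for label in name.rstrip(".").split(".")]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack(">HH", 1, 1)


def probe(server: str, name: str = "google.com", port: int = 53,
          timeout: float = 2.0) -> str:
    """Send one UDP query and report whether the server answered in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, port))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error: {exc}"
    if len(data) < 12:
        return "short reply"
    # Bytes 6-7 of the header hold ANCOUNT, the number of answer records
    answers = struct.unpack(">H", data[6:8])[0]
    return f"answered ({answers} answer records)"
```

Calling `probe("127.0.0.53")` and `probe("8.8.8.8")` in turn on the affected machine would show whether only the local stub resolver is timing out or whether upstream DNS is unreachable as well.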

Are you using NetBird Cloud?

No, everything is self-hosted.

NetBird management server version: unknown (latest as of 26 July 2024)

NetBird client version: 0.28.6

NetBird status -d output:

OS: linux/amd64
Daemon version: 0.28.6
CLI version: 0.28.6
Management: Connected to https://vpn-server-domain.anon-CFOEs.domain:443
Signal: Connected to https://vpn-server-domain.anon-CFOEs.domain:443
Relays:
  [stun:vpn-server-domain.anon-CFOEs.domain:3478] is Unavailable, reason: dial: failed to listen: dial: dial udp: lookup vpn-server-domain.anon-CFOEs.domain on 127.0.0.53:53: read udp 127.0.0.1:40225->127.0.0.53:53: i/o timeout
  [turn:vpn-server-domain.anon-CFOEs.domain:3478?transport=udp] is Unavailable, reason: create client: lookup vpn-server-domain.anon-CFOEs.domain on 127.0.0.53:53: read udp 127.0.0.1:46372->127.0.0.53:53: i/o timeout
Nameservers:
  [127.0.0.53:53] for [.] is Available
  [8.8.8.8:53, 8.8.4.4:53] for [.] is Available
FQDN: gws-uk-1.netbird.selfhosted
NetBird IP: 100.90.79.155/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 2/4 Connected

Screenshots

No screenshots; please see the outputs above.

Additional context

Running on Ubuntu Server 22.04.4 LTS, with Plesk Obsidian 18.0.62 Update #2 (Web Host Edition) and DNS BIND.

related to issue #2186

rihards-simanovics commented 1 month ago

After some tinkering, I also noticed that if the peer is added to a group and that group is assigned a DNS nameserver of 127.0.0.53:53, resolution appears to work, though only until the connection to the management server is reset. I don't know whether toggling simply resets the networking, but after a couple of resets (switching the nameserver off and on in the dashboard) it begins working again.

Also, any time the connection to the management server is dropped, all of the 10 servers are fine except this one (the Ubuntu 22.04.4 server), which shows the usual failure to connect to a root DNS resolution server.

Under normal use the connection would be more stable, but we are testing the resilience of the clients and their ability to communicate with other existing peers even if the management server is down. Fortunately this is the only server affected; unfortunately this server is our load balancer 🥲 for the other vHosts on other servers.

Marcus1Pierce commented 1 month ago

@rihards-simanovics Does your Ubuntu 22.04.4 server run a DNS server? If so, I have the same problem. It looks like NetBird opens a DNS port on its IP on every peer. If I check on Linux peers using the command sudo lsof -i -P -n | grep :53, every peer has this in the results: netbird 748 root 23u IPv4 21571 0t0 UDP netbird-ip:53.

On my DNS server, this breaks DNS and I can't resolve against it. I had to add my DNS server to a group and add that group to Disable DNS Management in DNS Settings.

rihards-simanovics commented 1 month ago

If I check on linux peers using command sudo lsof -i -P -n | grep :53 every peers has this netbird 748 root 23u IPv4 21571 0t0 UDP netbird-ip:53 results.

Yes, as mentioned in my original issue description, I do indeed have a DNS server on that machine: named, to be precise.

I just ran the command sudo lsof -i -P -n | grep :53 and no, it doesn't seem to completely override the named service (the DNS server), or even the root DNS, but that's likely because named had already bound to it. I'm sure a similar issue will happen at the next VPN management server brownout.

Looking at the printout, it looks as though the named service does eventually bind to the root DNS, everything returns to normal, and NetBird settles for 127.0.0.153:53.

/usr/sbin 1176998                           amavis   17u  IPv4 281883878      0t0  UDP 127.0.0.1:58359->127.0.0.53:53
/usr/sbin 1177005                           amavis   17u  IPv4 281947707      0t0  UDP 127.0.0.1:35339->127.0.0.53:53
/usr/sbin 1177262                           amavis   17u  IPv4 282512343      0t0  UDP 127.0.0.1:56948->127.0.0.53:53
netbird   1222403                             root   24u  IPv4 249471041      0t0  UDP 127.0.0.153:53
netbird   1222403                             root   25u  IPv4 322688715      0t0  UDP 127.0.0.1:42917->127.0.0.53:53
netbird   1222403                             root   31u  IPv4 307964250      0t0  UDP *:53335
/usr/sbin 1368247                           amavis   17u  IPv4 282508753      0t0  UDP 127.0.0.1:26319->127.0.0.53:53
named     1585814                             bind    6u  IPv4 249458218      0t0  UDP 100.90.79.155:53
named     1585814                             bind   25u  IPv4 219208391      0t0  UDP 127.0.0.1:53
named     1585814                             bind   26u  IPv4 219208392      0t0  UDP 127.0.0.1:53
named     1585814                             bind   27u  IPv4 219208393      0t0  TCP 127.0.0.1:53 (LISTEN)
named     1585814                             bind   28u  IPv4 219208394      0t0  TCP 127.0.0.1:53 (LISTEN)
named     1585814                             bind   29u  IPv4 219208395      0t0  UDP ext.ipv4.of.server:53
named     1585814                             bind   30u  IPv4 219208396      0t0  UDP ext.ipv4.of.server:53
named     1585814                             bind   31u  IPv4 219208397      0t0  TCP ext.ipv4.of.server:53 (LISTEN)
named     1585814                             bind   32u  IPv4 219208398      0t0  TCP ext.ipv4.of.server:53 (LISTEN)
named     1585814                             bind   33u  IPv4 219208399      0t0  UDP 172.17.0.1:53
named     1585814                             bind   34u  IPv4 219208400      0t0  UDP 172.17.0.1:53
named     1585814                             bind   35u  IPv4 219208401      0t0  TCP 172.17.0.1:53 (LISTEN)
named     1585814                             bind   36u  IPv4 219208402      0t0  TCP 172.17.0.1:53 (LISTEN)
named     1585814                             bind   41u  IPv4 249458219      0t0  UDP 100.90.79.155:53
named     1585814                             bind   42u  IPv4 249458220      0t0  TCP 100.90.79.155:53 (LISTEN)
named     1585814                             bind   43u  IPv4 249458221      0t0  TCP 100.90.79.155:53 (LISTEN)
named     1585814                             bind   45u  IPv6 219208411      0t0  UDP [::1]:53
named     1585814                             bind   46u  IPv6 219208412      0t0  UDP [::1]:53
named     1585814                             bind   49u  IPv6 219208415      0t0  UDP [external:ipv6:of:server]:53
named     1585814                             bind   50u  IPv6 219208416      0t0  UDP [external:ipv6:of:server]:53
named     1585814                             bind   51u  IPv6 219208417      0t0  TCP [external:ipv6:of:server]:53 (LISTEN)
named     1585814                             bind   52u  IPv6 219208418      0t0  TCP [external:ipv6:of:server]:53 (LISTEN)
named     1585814                             bind   53u  IPv6 219208419      0t0  UDP [external:ipv6:of:server]:53
named     1585814                             bind   54u  IPv6 219208420      0t0  UDP [external:ipv6:of:server]:53
named     1585814                             bind   55u  IPv6 219208421      0t0  TCP [external:ipv6:of:server]:53 (LISTEN)
named     1585814                             bind   56u  IPv6 219208422      0t0  TCP [external:ipv6:of:server]:53 (LISTEN)
named     1585814                             bind   57u  IPv6 219208423      0t0  UDP [external:ipv6:of:server]:53
named     1585814                             bind   58u  IPv6 219208424      0t0  UDP [external:ipv6:of:server]:53
named     1585814                             bind   59u  IPv6 219208425      0t0  TCP [external:ipv6:of:server]:53 (LISTEN)
named     1585814                             bind   60u  IPv6 219208426      0t0  TCP [external:ipv6:of:server]:53 (LISTEN)
named     1585814                             bind   61u  IPv6 219208427      0t0  UDP [external:ipv6:of:server]:53
named     1585814                             bind   62u  IPv6 219208428      0t0  UDP [external:ipv6:of:server]:53
named     1585814                             bind   63u  IPv6 219208429      0t0  TCP [external:ipv6:of:server]:53 (LISTEN)
named     1585814                             bind   64u  IPv6 219208430      0t0  TCP [external:ipv6:of:server]:53 (LISTEN)
named     1585814                             bind   65u  IPv6 219208431      0t0  UDP [fe80::1:3cff:feda:2296]:53
named     1585814                             bind   66u  IPv6 219208432      0t0  UDP [fe80::1:3cff:feda:2296]:53
named     1585814                             bind   67u  IPv6 219208433      0t0  TCP [fe80::1:3cff:feda:2296]:53 (LISTEN)
named     1585814                             bind   68u  IPv6 219208434      0t0  TCP [fe80::1:3cff:feda:2296]:53 (LISTEN)
systemd-r 1602236                  systemd-resolve   13u  IPv4 249770538      0t0  UDP 127.0.0.53:53
systemd-r 1602236                  systemd-resolve   14u  IPv4 249770539      0t0  TCP 127.0.0.53:53 (LISTEN)
systemd-r 1602236                  systemd-resolve   15u  IPv6 249770540      0t0  UDP [::1]:53
systemd-r 1602236                  systemd-resolve   16u  IPv6 249770541      0t0  TCP [::1]:53 (LISTEN)
/usr/sbin 1729524                           amavis   17u  IPv4 282509853      0t0  UDP 127.0.0.1:15604->127.0.0.53:53
/usr/sbin 1734785                           amavis   17u  IPv4 282519156      0t0  UDP 127.0.0.1:23851->127.0.0.53:53
/usr/sbin 1743217                           amavis   17u  IPv4 282555818      0t0  UDP 127.0.0.1:51948->127.0.0.53:53
/usr/sbin 1811148                           amavis   17u  IPv4 282648189      0t0  UDP 127.0.0.1:40822->127.0.0.53:53
/usr/sbin 1811171                           amavis   17u  IPv4 282639305      0t0  UDP 127.0.0.1:52933->127.0.0.53:53
/usr/sbin 1811245                           amavis   16u  IPv4 283277039      0t0  UDP 127.0.0.1:50352->127.0.0.53:53
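
With this many rows it is easy to miss which process actually owns which port-53 address. The following is a rough sketch that groups the lsof output by command and drops outbound client sockets (the `src->dst` rows). It assumes the default nine-column lsof layout seen above; the function name `port53_bindings` and the `SAMPLE` constant are hypothetical, with the sample rows taken from the listing above.

```python
from collections import defaultdict

# A few rows copied from the output above, used only for illustration
SAMPLE = """\
netbird   1222403  root   24u  IPv4 249471041  0t0  UDP 127.0.0.153:53
netbird   1222403  root   25u  IPv4 322688715  0t0  UDP 127.0.0.1:42917->127.0.0.53:53
netbird   1222403  root   31u  IPv4 307964250  0t0  UDP *:53335
named     1585814  bind   25u  IPv4 219208391  0t0  UDP 127.0.0.1:53
named     1585814  bind   27u  IPv4 219208393  0t0  TCP 127.0.0.1:53 (LISTEN)
systemd-r 1602236  systemd-resolve 13u IPv4 249770538 0t0 UDP 127.0.0.53:53
"""


def port53_bindings(lsof_output: str) -> dict[str, set[str]]:
    """Map each command to the local addresses it has bound on port 53."""
    bindings: dict[str, set[str]] = defaultdict(set)
    for line in lsof_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue
        # Columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        command, protocol, name = fields[0], fields[7], fields[8]
        # Outbound sockets look like "src->dst"; they are clients, not listeners
        if "->" in name:
            continue
        # grep :53 also matches ports such as 53335; keep exact port 53 only
        if name.endswith(":53"):
            bindings[command].add(f"{protocol} {name}")
    return dict(bindings)
```

Feeding it the full output of `sudo lsof -i -P -n | grep :53` from a working and a broken state would make it easier to compare which of netbird, named and systemd-resolved holds each address in either case.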

My main reason for using a VPN is to create a secure network layer for my servers to communicate without the need for https, which requires a lot of setup on each server, and since Plesk does it in a few clicks, making the Plesk server a reverse proxy has been my solution. Up until 1am today we had been using the centralised VPN Pritunl, which is based on OpenVPN, but we found that it was way too scary to rely on: instead of just one load balancer being the point of failure, we also had the VPN server. So, when I heard of overlay VPNs and P2P connectivity with WireGuard, as well as almost zero reliance on the management server being up, I immediately made the decision to transition. It was really straightforward, thankfully.

Marcus1Pierce commented 1 month ago

The command sudo lsof -i -P -n | grep :53 just checks which process is using port 53. Was the output above captured while your DNS was working or not?

rihards-simanovics commented 1 month ago

Command sudo lsof -i -P -n | grep :53 is just for check what process using port 53. From what your result from command sudo lsof -i -P -n | grep :53 above, is that when your dns is working or not?

Yes, this is the working state. I'm not going to try to break it now, as it's 9am in London and people are waking up, but I can test in about 15 hrs to see what it looks like when the root DNS is not working.

Marcus1Pierce commented 1 month ago

OK then. Just to clarify, do any of your other 10 servers run a DNS server too?

I don't know whether this will fix your problem, but you can try adding your DNS server to a group such as nodns and then, in the NetBird dashboard, go to DNS > DNS Settings and add the nodns group there. This will "break NetBird DNS" but leave your DNS server running. (This is what I tried, and my DNS server still works.)

rihards-simanovics commented 1 month ago

Yes, there is one other, which we use as a fallback. It also uses the named service; here is the command output:

named     2966849            bind    6u  IPv4 9421747      0t0  UDP ext.ipv4.of.server:53
named     2966849            bind   26u  IPv4 9055427      0t0  UDP 127.0.0.1:53
named     2966849            bind   27u  IPv4 9055428      0t0  UDP 127.0.0.1:53
named     2966849            bind   28u  IPv4 9055429      0t0  TCP 127.0.0.1:53 (LISTEN)
named     2966849            bind   30u  IPv4 9055430      0t0  TCP 127.0.0.1:53 (LISTEN)
named     2966849            bind   32u  IPv4 9421750      0t0  UDP ext.ipv4.of.server:53
named     2966849            bind   33u  IPv4 9421751      0t0  TCP ext.ipv4.of.server:53 (LISTEN)
named     2966849            bind   34u  IPv4 9421752      0t0  TCP ext.ipv4.of.server:53 (LISTEN)
named     2966849            bind   35u  IPv6 9421784      0t0  UDP [ext:ipv6:of:server]:53
named     2966849            bind   36u  IPv6 9421785      0t0  UDP [ext:ipv6:of:server]:53
named     2966849            bind   37u  IPv6 9422183      0t0  TCP [ext:ipv6:of:server]:53 (LISTEN)
named     2966849            bind   38u  IPv6 9422184      0t0  TCP [ext:ipv6:of:server]:53 (LISTEN)
named     2966849            bind   40u  IPv6 9055439      0t0  UDP [::1]:53
named     2966849            bind   41u  IPv6 9055440      0t0  UDP [::1]:53
named     2966849            bind   42u  IPv6 9055441      0t0  TCP [::1]:53 (LISTEN)
named     2966849            bind   43u  IPv6 9055442      0t0  TCP [::1]:53 (LISTEN)
named     2966849            bind   48u  IPv6 9055447      0t0  UDP [fe80::250:56ff:fe3d:c4f8]:53
named     2966849            bind   49u  IPv6 9055448      0t0  UDP [fe80::250:56ff:fe3d:c4f8]:53
named     2966849            bind   50u  IPv6 9055449      0t0  TCP [fe80::250:56ff:fe3d:c4f8]:53 (LISTEN)
named     2966849            bind   51u  IPv6 9055450      0t0  TCP [fe80::250:56ff:fe3d:c4f8]:53 (LISTEN)
named     2966849            bind   59u  IPv4 9423238      0t0  UDP 100.90.4.66:53
named     2966849            bind   63u  IPv4 9423239      0t0  UDP 100.90.4.66:53
named     2966849            bind   64u  IPv4 9423240      0t0  TCP 100.90.4.66:53 (LISTEN)
named     2966849            bind   65u  IPv4 9423241      0t0  TCP 100.90.4.66:53 (LISTEN)
netbird   3091556            root   22u  IPv4 9423420      0t0  UDP 127.0.0.153:53
netbird   3091556            root   29u  IPv4 9449030      0t0  UDP *:53437
systemd-r 3091801 systemd-resolve   13u  IPv4 9421763      0t0  UDP 127.0.0.53:53
systemd-r 3091801 systemd-resolve   14u  IPv4 9421764      0t0  TCP 127.0.0.53:53 (LISTEN)

It's also Ubuntu 22.04.4 LTS but, strangely, it doesn't have the same problem. I'll do more testing in 15 hrs on dev servers once traffic drops to a minimum. It doesn't seem to affect the operation of most services, just some that use DNS to find the IPs of DB servers.

Marcus1Pierce commented 1 month ago

Can you share /etc/resolv.conf from the server that runs the DNS server? Is it overwritten by NetBird?

rihards-simanovics commented 1 month ago

getting this on both servers:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.53
search netbird.selfhosted
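
The file above means every lookup on the machine goes through the systemd-resolved stub at 127.0.0.53, so a timeout there affects all name resolution. As a small sketch, the relevant entries can be pulled out in Python; `parse_resolv_conf` and `RESOLV_CONF` are hypothetical names, and only the `nameserver` and `search` keywords are handled.

```python
# Contents copied from the file shown above
RESOLV_CONF = """\
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.53
search netbird.selfhosted
"""


def parse_resolv_conf(text: str) -> dict[str, list[str]]:
    """Extract nameserver and search entries from resolv.conf text."""
    config: dict[str, list[str]] = {"nameserver": [], "search": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments, which start with '#' or ';'
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameserver"].append(values[0])
        elif keyword == "search":
            # Per resolv.conf(5), only the last search line is used
            config["search"] = values
    return config
```

Running the same function over /etc/resolv.conf.original.netbird, if that file exists, would show what the resolver configuration looked like before the NetBird client changed it.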

Marcus1Pierce commented 1 month ago

Can you check what is in /etc/resolv.conf.original.netbird? And can you check the NetBird logs in /var/log/netbird/client.log for errors and warnings? Maybe others can solve your problem.

I don't think I can solve your problem, but maybe you can try this:

I don't know whether this will fix your problem, but you can try adding your DNS server to a group such as nodns and then, in the NetBird dashboard, go to DNS > DNS Settings and add the nodns group there. This will "break NetBird DNS" but leave your DNS server running. (This is what I tried, and my DNS server still works.)

ashish9433 commented 3 weeks ago

I have exactly the same issue: my local (custom) DNS stops resolving after installing the NetBird agent on an Ubuntu machine.

rihards-simanovics commented 3 weeks ago

Funnily enough, the issue appeared to "resolve" on its own. I don't know whether it's actually resolved or whether something changed, but I can say with absolute certainty that the management server software hasn't been updated, only the clients. One other thing I did was configure Access Control and completely disable the "All" group. Now the "Servers" group only has access to other servers; everything else is locked down.

ashish9433 commented 3 weeks ago

@rihards-simanovics - were you able to solve this issue with any workarounds?

rihards-simanovics commented 3 weeks ago

The only logical solution is to create a DNS server policy that allows the use of 127.0.0.53:53. Also, I didn't see this affecting more modern OSes such as Ubuntu 24.04; that said, the ones I had are not DNS servers, so I can't say for sure yet.