netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

Unable to ping other peers on netbird network #1506

Open bmcgonag opened 9 months ago

bmcgonag commented 9 months ago

Describe the problem

I have set up a self-hosted NetBird network with Authentik as the IdP.

I have added two Linux devices and one iPhone.

I try to ping from one Linux machine to the other on its NetBird IP address.

netbird status -d on each Linux machine shows the other as a peer, as well as the iPhone as a peer that is currently offline.

I saw some other posts about similar issues where the person found their TURN server config to be incorrect.

I used the site at https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/ to test my turn configuration and get the following:

Time    Type    Foundation  Protocol    Address     Port    Priority    URL (if present)    relayProtocol (if present)
0.003   host    0   udp dfaa8882-bbcf-61a7263e2e3c.local    40208   126 | 32512 | 255       
0.008   host    3   udp 2b841932-ae5b-3d03e55d8a5b.local    49310   126 | 32256 | 255       
0.008   host    6   tcp dfaa8882-288f-bbcf-61a7263e2e3c.local   9   125 | 32704 | 255       
0.009   host    7   tcp 2b841932-ae5b-3d03e55d8a5b.local    9   125 | 32448 | 255       
0.010   host    0   udp dfaa8882-61a7263e2e3c.local 43180   126 | 32512 | 254       
0.012   host    3   udp 2b841932-ae5b-3d03e55d8a5b.local    51760   126 | 32256 | 254       
0.013   host    6   tcp dfaa8882-288f-61a7263e2e3c.local    9   125 | 32704 | 254       
0.014   host    7   tcp 2b841932-62b0-3d03e55d8a5b.local    9   125 | 32448 | 254       
0.140   srflx   4   udp xx.xxx.xx.xxx   49310   100 | 32287 | 255       
0.141   relay   5   udp xxx.xxx.xxx.xxx 63425   5 | 32287 | 255     
0.166   Done
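
For readers comparing Trickle ICE tables like the one above, here is a minimal sketch (not part of NetBird or the Trickle ICE page) that tallies candidate types from pasted rows. It assumes the whitespace-separated layout shown above; the function name `candidate_types` is made up for illustration. A `srflx` row generally means the STUN server answered, and a `relay` row means a TURN allocation succeeded.

```python
from collections import Counter


def candidate_types(table_text: str) -> Counter:
    """Count ICE candidates by type (host, srflx, relay, ...) in pasted Trickle ICE rows."""
    counts = Counter()
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            float(fields[0])  # data rows start with a timestamp such as 0.140
        except ValueError:
            continue  # header row or free text
        if fields[1] != "Done":  # the final "Done" row is not a candidate
            counts[fields[1]] += 1
    return counts


# Three rows copied from the table above, plus the header and the Done marker.
sample = """\
Time    Type    Foundation  Protocol    Address     Port    Priority
0.003   host    0   udp dfaa8882-bbcf-61a7263e2e3c.local    40208   126 | 32512 | 255
0.140   srflx   4   udp xx.xxx.xx.xxx   49310   100 | 32287 | 255
0.141   relay   5   udp xxx.xxx.xxx.xxx 63425   5 | 32287 | 255
0.166   Done
"""

counts = candidate_types(sample)
print(dict(counts))
print("STUN answered:", counts["srflx"] > 0, "| TURN allocated:", counts["relay"] > 0)
```

By this reading, the table above contains both a `srflx` and a `relay` candidate, which is consistent with the TURN server being reachable from the test machine.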

I believe everything is set up correctly, but I am still unable to ping the other machine.

In the management.json file I also verified that the TURN server credentials match those in the turnserver.conf file.

I have set up one extra group called personal and added all three machines to it. I added an ACL for that group to allow traffic between the machines in the group and made sure it's enabled. Additionally, I have not removed the 'ALL' group, so that I can compare behavior with ALL enabled and disabled. It makes no difference.

To Reproduce

Steps to reproduce the behavior:

  1. Set up NetBird on a self-hosted installation.
  2. Set it up to use Authentik (I don't think this is the issue).
  3. Install NetBird clients on 2 Linux machines.
  4. Add the machines to a group.
  5. Create an ACL to allow the machines in the group to communicate.
  6. Enable the ACL.
  7. Try to ping one machine from the other.

Expected behavior

I would expect machines in a group that is covered by an ACL allowing traffic to be able to communicate. At the very least, I would expect the machines in the ALL group to be able to communicate.

Are you using NetBird Cloud?

Self-hosted

NetBird version

Server: Docker, version set to latest
Clients:
Linux desktop, Fedora 39: 0.25.4
Linux desktop, Ubuntu 23.10: 0.25.5

NetBird status -d output: From the Fedora desktop:

Peers detail:
 brian-ub-studio-1.netbird.selfhosted:
  NetBird IP: 100.85.93.103
  Public key: ***************************************
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/prflx
  Last connection update: 2024-01-29 14:14:21

 iphone.netbird.selfhosted:
  NetBird IP: 100.85.170.165
  Public key: ***************************************
  Status: Disconnected
  -- detail --
  Connection type: 
  Direct: false
  ICE candidate (Local/Remote): -/-
  Last connection update: 2024-01-29 14:53:52

Daemon version: 0.25.4
CLI version: 0.25.4
Management: Connected to https://my-net.netbird-server.com:33073
Signal: Connected to http://my-net.netbird-server.com:10000
FQDN: brian-fedora-lan-1.netbird.selfhosted
NetBird IP: 100.85.242.220/16
Interface type: Kernel
Peers count: 1/2 Connected

Screenshots

![Screenshot from 2024-01-29 19-07-24](https://github.com/netbirdio/netbird/assets/7346620/72851509-a62d-4c9e-8b98-e4673ac52e32)
![Screenshot from 2024-01-29 19-07-44](https://github.com/netbirdio/netbird/assets/7346620/4f19af0a-5a04-41bb-b87f-753968684a23)


bmcgonag commented 9 months ago

Additional information: I updated my Fedora client to 0.25.5-1 and still see the same issue.

bmcgonag commented 9 months ago

I completely remade my setup using a new domain name, and still have the following:

  1. Each peer can see the other peers in the all group when doing netbird status -d.
  2. One peer can resolve the ipv4 address of the other peer when trying to ping by netbird hostname, but no ping is ever successful.
  3. Neither peer can successfully ping the other by ipv4 or hostname.

I watched the logs after running docker compose up -d to start the new system and saw no errors at all.

Everything appears to be communicating properly, except the clients can't seem to communicate with each other. No idea why.

Any help is greatly appreciated.

wisetux commented 9 months ago

Hello @bmcgonag, please confirm the VPC you are using to host NetBird server. Might be an issue with reachability to Coturn. Are you able to ping different hosts on same network using their NetBird hostname?

bmcgonag commented 9 months ago

I'm using DigitalOcean. I posted my Coturn test results from Trickle ICE in the original message. I don't think that's the issue, but I'm not 100% certain of that.

I am unable to ping the hosts by IPv4 or by Hostname. Any direction or help is greatly appreciated.

bmcgonag commented 9 months ago

also @wisetux the server I have setup is 1vCPU and 2GB RAM running Ubuntu 22.04 LTS server. Nothing else running on that server, just Netbird.

wisetux commented 9 months ago

Thank you for the info. The server specs should be fine as NetBird is very light on resources. However Trickle ICE output looks a little different. This is what I have:

Time    Type    Foundation  Protocol    Address                                   Port  Priority              URL (if present)                         relayProtocol (if present)
0.007   host    4226889391  udp         CLIENT LAN IP ADDRESS                     35803 126 | 32286 | 255
0.010   host    2058046622  udp         CLIENT ISP IPV6 ADDRESS                   49572 126 | 32552 | 255
0.051   srflx   1125687685  udp         CLIENT ISP IPV4 ADDRESS                   35803 100 | 32286 | 255     stun:netbird.DOMAIN.com:3478
0.098   relay   2009379810  udp         NETBIRD SERVER LAN IP ADDRESS             55349   2 | 32287 | 255     turn:netbird.DOMAIN.com:3478?transport=udp   udp
0.121   host    2235487287  tcp         CLIENT LAN IP ADDRESS                         9  90 | 32286 | 255
0.122   host    73707014    tcp         CLIENT ISP IPV6 ADDRESS                       9  90 | 32552 | 255
0.124   Done

Can you try connecting from a different network or a mobile Hotspot maybe?

bmcgonag commented 9 months ago

OK, yeah, I see how yours is different. Any idea what it might be, @wisetux? I have my own Coturn server set up that I use for Matrix, Nextcloud, and others, but it uses "static-auth", not "lt-cred-mech". Can NetBird do "static-auth"?

bmcgonag commented 9 months ago

Results when connected through mobile hotspot

Time    Type    Foundation  Protocol    Address     Port    Priority    URL (if present)    relayProtocol (if present)
0.007   host    0   udp 2b0139aa-xxxx-425c-bfe2-fad07cf3f11a.local  59790   126 | 32256 | 255       
0.010   host    3   udp dff059db-xxxx-4ecc-b1fe-48d649fd858e.local  58453   126 | 32512 | 255       
0.012   host    6   tcp 2b0139aa-149e-xxxx-bfe2-fad07cf3f11a.local  9   125 | 32448 | 255       
0.013   host    7   tcp dff059db-9369-xxxx-b1fe-48d649fd858e.local  9   125 | 32704 | 255       
0.018   host    0   udp 2b0139aa-xxxx-425c-bfe2-fad07cf3f11a.local  50283   126 | 32256 | 254       
0.019   host    3   udp dff059db-xxxx-4ecc-b1fe-48d649fd858e.local  57575   126 | 32512 | 254       
0.020   host    6   tcp 2b0139aa-xxxx-425c-bfe2-fad07cf3f11a.local  9   125 | 32448 | 254       
0.022   host    7   tcp dff059db-xxxx-4ecc-b1fe-48d649fd858e.local  9   125 | 32704 | 254       
0.173   srflx   1   udp 174.2xx.xxx.xxx 4351    100 | 32287 | 255       
0.173   relay   2   udp 206.xx.xx.xxx   61919   5 | 32287 | 255 
0.194   Done

wisetux commented 9 months ago

I'm not well versed with Coturn server setup and I use a dedicated instance just for Netbird. Maybe this issue might give you more info regarding static-auth configuration: https://github.com/netbirdio/netbird/issues/569

magixus commented 9 months ago

Do you have a DNS resolution issue? You may see one of the errors below in /var/log/netbird/client.log:

ERRO client/internal/dns/server.go:282: got an error while applying resolvconf configuration for wt0 interface, error: exit status 99
ERRO client/internal/dns/host_linux.go:99: got an error while checking systemd resolv conf mode, error: got an error getting property org.freedesktop.resolve1.Manager.ResolvConfMode: Unknown property or interface.
WARN client/internal/wgproxy/factory_linux.go:15: failed to initialize ebpf proxy, fallback to user space proxy: field NbXdpProg: program nb_xdp_prog: map .rodata: map create: read- and write-only maps not supported (requires >= v5.2)
ERRO client/internal/dns/server.go:282: unable to configure DNS for this peer using resolvconf manager without a nameserver group with all domains configured

The full issue is referenced here: #1451

bmcgonag commented 9 months ago

No. Checked logs, and no errors shown. I have a few WARN, and a lot of INFO states, but no ERRORs logged.

Even tailed the logs while logging in, as well as trying to ping the peer after login.

tarocjsu commented 9 months ago

Same symptom here:

root@docker219 ~# netbird status -d
Peers detail:
 pve.netbird.selfhosted:
  NetBird IP: 100.86.4.26
  Public key: dIuwdZzyZpSQPx64I7wo8uzl/su75PaNpklHVhZFkCw=
  Status: Connected
  -- detail --
  Connection type: Relayed
  Direct: false
  ICE candidate (Local/Remote): relay/host
  ICE candidate endpoints (Local/Remote): 114.37.176.127:61298/192.168.1.2:61298
  Last connection update: 2024-02-19 14:09:49
  Last Wireguard handshake: 2024-02-19 14:32:53
  Transfer status (received/sent) 1.1 KiB/3.7 KiB

 d9e3486ac0e6.netbird.selfhosted:
  NetBird IP: 100.86.24.123
  Public key: kMnnFpG4JtOASFHcGO3otQxKFAJQ7lDK1iNpkp9TOyo=
  Status: Disconnected
  -- detail --
  Connection type: Relayed
  Direct: false
  ICE candidate (Local/Remote): relay/host
  ICE candidate endpoints (Local/Remote): 114.37.176.127:57500/192.168.1.236:57500
  Last connection update: -
  Last Wireguard handshake: 2024-02-19 14:32:00
  Transfer status (received/sent) 1.4 KiB/640 B

 desktop-0d03977.netbird.selfhosted:
  NetBird IP: 100.86.71.168
  Public key: BI3zBLxEDNOTo/ouFcrfx+nU8PAbfueTRWfPyUFgFEk=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): 192.168.10.219:51820/118.163.170.24:51820
  Last connection update: 2024-02-19 14:09:49
  Last Wireguard handshake: 2024-02-19 14:33:20
  Transfer status (received/sent) 2.9 KiB/2.1 KiB

 netbird.netbird.selfhosted:
  NetBird IP: 100.86.138.236
  Public key: hgOPbz+D5cSiOmIdLbyjzMT85sojs8hGfe8r33/tYTY=
  Status: Connected
  -- detail --
  Connection type: Relayed
  Direct: false
  ICE candidate (Local/Remote): relay/host
  ICE candidate endpoints (Local/Remote): 114.37.176.127:57500/192.168.1.236:57500
  Last connection update: 2024-02-19 14:25:15
  Last Wireguard handshake: 2024-02-19 14:32:00
  Transfer status (received/sent) 1.4 KiB/640 B

 pve-dell.netbird.selfhosted:
  NetBird IP: 100.86.139.76
  Public key: s4KxhTaOhrZgrvi2WDeHDKwIRg2YmeBoNjNGOxrkeyE=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/host
  ICE candidate endpoints (Local/Remote): 192.168.10.219:51820/192.168.10.3:51820
  Last connection update: 2024-02-19 14:09:48
  Last Wireguard handshake: 2024-02-19 14:31:06
  Transfer status (received/sent) 2.9 KiB/2.8 KiB

Daemon version: 0.25.9
CLI version: 0.25.9
Management: Connected to https://netbird.tarosu.eu.org:443
Signal: Connected to https://netbird.tarosu.eu.org:443
Relays:
  [stun:netbird.tarosu.eu.org:3478] is Available
  [turn:netbird.tarosu.eu.org:3478?transport=udp] is Available
FQDN: docker219.netbird.selfhosted
NetBird IP: 100.86.194.133/16
Interface type: Kernel
Peers count: 4/5 Connected

Only desktop-0d03977.netbird.selfhosted and netbird.netbird.selfhosted can ping each other. They cannot ping the other peers, and the other peers cannot ping those two nodes.

Mon 14:18 C:\Users\S2306005

ping netbird.netbird.selfhosted

Pinging netbird.netbird.selfhosted [100.86.138.236] with 32 bytes of data:
Reply from 100.86.138.236: bytes=32 time=10ms TTL=64
Reply from 100.86.138.236: bytes=32 time=10ms TTL=64
Reply from 100.86.138.236: bytes=32 time=13ms TTL=64
Reply from 100.86.138.236: bytes=32 time=13ms TTL=64

Ping statistics for 100.86.138.236:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 10ms, Maximum = 13ms, Average = 11ms

Mon 14:36 C:\Users\S2306005

ping pve-dell.netbird.selfhosted

Pinging pve-dell.netbird.selfhosted [100.86.139.76] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 100.86.139.76:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
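
To line up ping results like these with each peer's connection state, here is a rough sketch that parses the "Peers detail:" part of `netbird status -d`. It assumes the one-field-per-line layout shown in the original report; `parse_peers` is a hypothetical helper, not a NetBird API, and the public-key line is left out of the sample for brevity.

```python
def parse_peers(status_text: str) -> dict:
    """Map peer FQDN -> {field: value} from the 'Peers detail:' part of `netbird status -d`."""
    peers, current, in_peers = {}, None, False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "Peers detail:":
            in_peers = True
            continue
        if not in_peers or not line or line == "-- detail --":
            continue
        if line.startswith("Daemon version:"):
            break  # summary section begins; no more peers
        key, sep, value = line.partition(":")
        if sep and not value.strip() and " " not in key:
            # A bare "name:" line with no spaces is treated as a peer header.
            current = peers.setdefault(key, {})
        elif sep and current is not None:
            current[key.strip()] = value.strip()
    return peers


# Sample taken from the Fedora desktop output in the original report.
sample = """\
Peers detail:
 brian-ub-studio-1.netbird.selfhosted:
  NetBird IP: 100.85.93.103
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/prflx

 iphone.netbird.selfhosted:
  NetBird IP: 100.85.170.165
  Status: Disconnected
  -- detail --
  Connection type: 
  Direct: false
  ICE candidate (Local/Remote): -/-

Daemon version: 0.25.4
"""

peers = parse_peers(sample)
for name, info in peers.items():
    print(name, info["Status"], info["Connection type"] or "-")
```

Note that a peer can report "Connected" with a recent handshake and still be unreachable by ping, as this thread shows, so the parsed status is only a starting point.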

tarocjsu commented 9 months ago

root@netbird:~# netbird status
Daemon version: 0.25.9
CLI version: 0.25.9
Management: Connected
Signal: Connected
Relays: 2/2 Available
FQDN: netbird.netbird.selfhosted
NetBird IP: 100.86.138.236/16
Interface type: Kernel
Peers count: 4/5 Connected

root@netbird:~# ping docker219.netbird.selfhosted
PING docker219.netbird.selfhosted (100.86.194.133) 56(84) bytes of data.
^C
--- docker219.netbird.selfhosted ping statistics ---
17 packets transmitted, 0 received, 100% packet loss, time 16373ms

root@netbird:~# ping desktop-0d03977.netbird.selfhosted
PING desktop-0d03977.netbird.selfhosted (100.86.71.168) 56(84) bytes of data.
64 bytes from 100.86.71.168: icmp_seq=1 ttl=128 time=17.7 ms
64 bytes from 100.86.71.168: icmp_seq=2 ttl=128 time=23.2 ms
64 bytes from 100.86.71.168: icmp_seq=3 ttl=128 time=19.4 ms
64 bytes from 100.86.71.168: icmp_seq=4 ttl=128 time=24.3 ms
64 bytes from 100.86.71.168: icmp_seq=5 ttl=128 time=32.5 ms
^C
--- desktop-0d03977.netbird.selfhosted ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 17.671/23.421/32.476/5.136 ms

tarocjsu commented 9 months ago

Pinging by hostname (FQDN) does resolve to the IP address. I only use the default ALL group and the default allow-all Access Control setting.

tarocjsu commented 9 months ago

I found the root cause in my network environment: every system that could not ping or be pinged already had the Tailscale daemon installed. After uninstalling the Tailscale daemon, the ping issue was resolved.

rhinot commented 2 months ago

Hi All - I have the same issue:

These are new installs (no config other than setup keys) on physical linux, mac, and android devices. I'm wondering if I missed a step in setup.

@bmcgonag Did you ever resolve this?

UPDATE: I updated all clients to 29.2, which came out 2 hours ago. While it resolved some errors from the logs, the issue remains.

Ping and traceroute both time out immediately (on the first hop). The following is the only ERROR in my logs:

2024-09-12T16:54:05-04:00 ERRO signal/client/grpc.go:399: error while handling message of Peer [key: ] error: [wrongly addressed message ]

bmcgonag commented 2 months ago

@rhinot I eventually found that I had a Tailscale client working with a Headscale server on the same machines. I disconnected the Tailscale client, and when I did, the NetBird client started working. I'm not sure why it was such an issue, as in theory they should be using separate virtual networks.
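
One plausible explanation, which nobody in this thread confirmed, is address overlap: Tailscale hands out addresses from the 100.64.0.0/10 shared address space (RFC 6598), and the NetBird interface addresses quoted in this thread sit inside that same range. The sketch below only checks that overlap using Python's standard `ipaddress` module; `overlaps_cgnat` is a name invented for the example, and the check says nothing about what either client does with routes or firewall rules.

```python
import ipaddress

# RFC 6598 shared address space, which Tailscale uses for its node addresses.
CGNAT = ipaddress.ip_network("100.64.0.0/10")

# NetBird interface addresses quoted in this thread (from `netbird status`).
netbird_addrs = ["100.85.242.220/16", "100.86.194.133/16", "100.86.138.236/16"]


def overlaps_cgnat(cidr: str) -> bool:
    """True if the network containing this interface address lies inside 100.64.0.0/10."""
    net = ipaddress.ip_interface(cidr).network
    return net.subnet_of(CGNAT)


for cidr in netbird_addrs:
    net = ipaddress.ip_interface(cidr).network
    print(f"{cidr}: network {net}, inside {CGNAT}: {overlaps_cgnat(cidr)}")
```

If both clients install routes or packet-filter rules for overlapping ranges, traffic for a NetBird peer could be handled by the wrong interface or dropped, which would be consistent with the fix reported above.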

rhinot commented 2 months ago

Thanks for the update.

Any guidance on how you were able to debug? I'm not running tailscale, or any other VPN, on these devices, so I'm perplexed why they can't find routes to each other.
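
One check that may help here is asking the kernel which interface it would use to reach a peer's NetBird IP. The sketch below wraps the Linux `ip route get` command. It assumes iproute2 is installed and that the NetBird interface is named `wt0`, as in the log lines quoted earlier in this thread. The helper names and the two sample output lines are hypothetical.

```python
import re
import subprocess


def route_device(route_output: str):
    """Return the interface named after 'dev' in `ip route get` output, or None."""
    match = re.search(r"\bdev\s+(\S+)", route_output)
    return match.group(1) if match else None


def peer_route_device(peer_ip: str):
    """Ask the kernel which interface it would use to reach peer_ip (Linux, iproute2)."""
    result = subprocess.run(
        ["ip", "route", "get", peer_ip],
        capture_output=True, text=True, check=False,
    )
    return route_device(result.stdout)


# Hypothetical output lines, for illustration only.
print(route_device("100.86.139.76 dev wt0 src 100.86.194.133 uid 0"))
print(route_device("100.86.139.76 dev tailscale0 src 100.100.1.2 uid 0"))
```

If `peer_route_device("<peer NetBird IP>")` returns something other than the NetBird interface, another route is taking precedence for that address. If it returns the expected interface and pings still fail, the problem is more likely in firewall rules or on the remote peer.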

mrwsl commented 2 months ago

I was facing the same issue. The solution was to set up a DNS manually to get it working.

rhinot commented 2 months ago

@herrwusel thanks for the pointer.

My domains are resolving to IPs, but I used the DNS instructions to add Cloudflare anyway, just in case.

No dice.

Did you do anything different than add one of the generic providers?