netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

Linux routing is not working again #1974

Open lfarkas opened 4 months ago

lfarkas commented 4 months ago

Even after restarting my client, I can see this in the log:

2024-05-13T12:47:11+02:00 WARN client/internal/routemanager/client.go:154: the network 192.168.0.0/16 has not been assigned a routing peer as no peers from the list [FfiyZKMquYILabBxOquw/jXEuTjhBq6tUvBEPdV3ckY= hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0=] are currently connected
2024-05-13T12:47:11+02:00 WARN client/internal/routemanager/client.go:154: the network 10.30.0.0/24 has not been assigned a routing peer as no peers from the list [1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE=] are currently connected
2024-05-13T12:47:11+02:00 WARN client/internal/routemanager/client.go:154: the network 10.20.0.0/24 has not been assigned a routing peer as no peers from the list [1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE=] are currently connected
2024-05-13T12:47:11+02:00 INFO client/internal/dns/systemd_linux.go:144: adding 4 search domains and 0 match domains. Search list: [int.vidux.hu vidux.internal szeged.vidux.hu netbird.cloud] , Match list: []
2024-05-13T12:47:11+02:00 INFO client/internal/acl/manager.go:52: ACL rules processed in: 3.021225ms, total rules count: 2
2024-05-13T12:47:13+02:00 WARN client/internal/dns/upstream.go:185: probing upstream nameserver 10.30.0.1:53: read udp 10.1.251.86:53348->10.30.0.1:53: i/o timeout
2024-05-13T12:47:13+02:00 WARN client/internal/dns/upstream.go:185: probing upstream nameserver 192.168.208.1:53: read udp 10.6.6.2:59381->192.168.208.1:53: i/o timeout
2024-05-13T12:47:13+02:00 WARN client/internal/dns/upstream.go:265: Upstream resolving is Disabled for 30s
2024-05-13T12:47:13+02:00 INFO [nameservers: [{192.168.208.1 udp 53}]] client/internal/dns/server.go:504: Temporarily deactivating nameservers group due to timeout
2024-05-13T12:47:13+02:00 WARN client/internal/dns/upstream.go:265: Upstream resolving is Disabled for 30s
2024-05-13T12:47:13+02:00 INFO client/internal/dns/systemd_linux.go:144: adding 2 search domains and 0 match domains. Search list: [szeged.vidux.hu netbird.cloud] , Match list: []
2024-05-13T12:47:13+02:00 INFO [nameservers: [{10.30.0.1 udp 53}]] client/internal/dns/server.go:504: Temporarily deactivating nameservers group due to timeout
2024-05-13T12:47:13+02:00 INFO client/internal/dns/systemd_linux.go:144: adding 1 search domains and 0 match domains. Search list: [netbird.cloud] , Match list: []
2024-05-13T12:47:15+02:00 WARN client/internal/dns/upstream.go:185: probing upstream nameserver 192.168.208.1:53: read udp 10.6.6.2:39320->192.168.208.1:53: i/o timeout
2024-05-13T12:47:16+02:00 INFO client/internal/routemanager/client.go:165: new chosen route is co1dqj3l0ubs739dfnsg with peer hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0= with score 2.966603 for network 192.168.0.0/16
2024-05-13T12:47:16+02:00 INFO client/internal/peer/conn.go:388: connected to peer hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0=, endpoint address: 185.199.30.141:9133
2024-05-13T12:47:17+02:00 INFO client/internal/peer/conn.go:388: connected to peer FfiyZKMquYILabBxOquw/jXEuTjhBq6tUvBEPdV3ckY=, endpoint address: 185.199.30.141:16269
2024-05-13T12:47:17+02:00 WARN client/internal/routemanager/client.go:120: peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= has 0 latency
2024-05-13T12:47:17+02:00 INFO client/internal/routemanager/client.go:165: new chosen route is co1kv3bl0ubs739dg130 with peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= with score 2.000000 for network 10.20.0.0/24
2024-05-13T12:47:17+02:00 WARN client/internal/routemanager/client.go:120: peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= has 0 latency
2024-05-13T12:47:17+02:00 INFO client/internal/routemanager/client.go:165: new chosen route is co1kuj3l0ubs739dg11g with peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= with score 2.000000 for network 10.30.0.0/24
2024-05-13T12:47:17+02:00 WARN client/internal/routemanager/client.go:120: peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= has 0 latency
2024-05-13T12:47:17+02:00 WARN client/internal/routemanager/client.go:120: peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= has 0 latency
2024-05-13T12:47:17+02:00 WARN client/internal/routemanager/client.go:120: peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= has 0 latency
2024-05-13T12:47:17+02:00 INFO client/internal/peer/conn.go:388: connected to peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE=, endpoint address: 145.236.15.52:51820
2024-05-13T12:47:18+02:00 INFO client/internal/peer/conn.go:388: connected to peer +i/q6dNa3AeF/iNJMH9+CbnsTLmFPfN+/K0KUPJI5wI=, endpoint address: 3.73.3.142:37959
2024-05-13T12:47:21+02:00 INFO client/internal/dns/upstream.go:241: upstreams [192.168.208.1:53] are responsive again. Adding them back to system
2024-05-13T12:47:21+02:00 INFO client/internal/dns/systemd_linux.go:144: adding 3 search domains and 0 match domains. Search list: [int.vidux.hu vidux.internal netbird.cloud] , Match list: []

At 12:47:11 the warning "network 10.30.0.0/24 has not been assigned a routing peer" is valid, but at 12:47:17, even though the log lists:

2024-05-13T12:47:17+02:00 WARN client/internal/routemanager/client.go:120: peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= has 0 latency
2024-05-13T12:47:17+02:00 INFO client/internal/routemanager/client.go:165: new chosen route is co1kuj3l0ubs739dg11g with peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= with score 2.000000 for network 10.30.0.0/24

the route is never added.
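To make that comparison less error-prone, here is a minimal sketch that pulls the "new chosen route" entries out of a client log so they can be checked against the routing table. The regular expression is only inferred from the log lines quoted above, and the names `CHOSEN_ROUTE`, `chosen_routes` and `SAMPLE_LOG` are made up for illustration; none of this is part of NetBird.

```python
import re

# Pattern inferred from the "new chosen route" lines quoted in this issue;
# it is an assumption about the log format, not a NetBird API.
CHOSEN_ROUTE = re.compile(
    r"new chosen route is (?P<route>\S+) with peer (?P<peer>\S+) "
    r"with score (?P<score>[\d.]+) for network (?P<network>\S+)"
)


def chosen_routes(log_text):
    """Return {network: (route_id, peer)} using the last chosen route per network."""
    chosen = {}
    for line in log_text.splitlines():
        match = CHOSEN_ROUTE.search(line)
        if match:
            chosen[match["network"]] = (match["route"], match["peer"])
    return chosen


# Two lines copied from the log above.
SAMPLE_LOG = """\
2024-05-13T12:47:16+02:00 INFO client/internal/routemanager/client.go:165: new chosen route is co1dqj3l0ubs739dfnsg with peer hCDjKQBW9TBwsZigTRXxvVzpAYE+ZqDHBol4sOSUMl0= with score 2.966603 for network 192.168.0.0/16
2024-05-13T12:47:17+02:00 INFO client/internal/routemanager/client.go:165: new chosen route is co1kuj3l0ubs739dg11g with peer 1u25Mrocd2aMv88fUgRnKmM1caynzX+bGTzThCZ3CnE= with score 2.000000 for network 10.30.0.0/24
"""

if __name__ == "__main__":
    for network, (route_id, peer) in chosen_routes(SAMPLE_LOG).items():
        print(network, route_id, peer)
```

Every network returned here is one the client claims to have picked a routing peer for, so each of them should also show up as a route on the WireGuard interface.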

lixmal commented 4 months ago

Hi @lfarkas, please stick to the template.

How did you notice it's not added? What does ip route show table netbird say? Can you provide the output of netbird status -dA?

lfarkas commented 4 months ago

OK, it's Fedora 40 (latest), fully updated (with nft, of course). netbird status -d shows the remote as connected (but I have already rebooted).

# sudo ip route show table netbird
Error: argument "netbird" is wrong: table id value is invalid

but

# sudo ip route show table all 
192.168.0.0/16 dev wt0 table 7120 
default via 10.6.6.1 dev enp6s0 proto dhcp src 10.6.6.2 metric 100 
10.6.6.0/24 dev enp6s0 proto kernel scope link src 10.6.6.2 metric 100 
100.76.0.0/16 dev wt0 proto kernel scope link src 100.76.24.179 
local 10.6.6.2 dev enp6s0 table local proto kernel scope host src 10.6.6.2 
broadcast 10.6.6.255 dev enp6s0 table local proto kernel scope link src 10.6.6.2 
local 100.76.24.179 dev wt0 table local proto kernel scope host src 100.76.24.179 
broadcast 100.76.255.255 dev wt0 table local proto kernel scope link src 100.76.24.179 
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 
fe80::/64 dev vpn0 proto kernel metric 256 linkdown pref medium
fe80::/64 dev enp6s0 proto kernel metric 1024 pref medium
local ::1 dev lo table local proto kernel metric 0 pref medium
local fe80::b62e:99ff:feab:e0d8 dev enp6s0 table local proto kernel metric 0 pref medium
local fe80::f248:5ca1:cd35:3b74 dev vpn0 table local proto kernel metric 0 pref medium
multicast ff00::/8 dev enp6s0 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vpn0 table local proto kernel metric 256 linkdown pref medium
multicast ff00::/8 dev wt0 table local proto kernel metric 256 pref medium

but after reboot

# sudo ip route show table all 
10.20.0.0/24 dev wt0 table 7120 
10.30.0.0/24 dev wt0 table 7120 
192.168.0.0/16 dev wt0 table 7120 
default via 10.6.6.1 dev enp6s0 proto dhcp src 10.6.6.2 metric 100 
10.6.6.0/24 dev enp6s0 proto kernel scope link src 10.6.6.2 metric 100 
100.76.0.0/16 dev wt0 proto kernel scope link src 100.76.24.179 
local 10.6.6.2 dev enp6s0 table local proto kernel scope host src 10.6.6.2 
broadcast 10.6.6.255 dev enp6s0 table local proto kernel scope link src 10.6.6.2 
local 100.76.24.179 dev wt0 table local proto kernel scope host src 100.76.24.179 
broadcast 100.76.255.255 dev wt0 table local proto kernel scope link src 100.76.24.179 
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 
fe80::/64 dev vpn0 proto kernel metric 256 linkdown pref medium
fe80::/64 dev enp6s0 proto kernel metric 1024 pref medium
local ::1 dev lo table local proto kernel metric 0 pref medium
local fe80::b62e:99ff:feab:e0d8 dev enp6s0 table local proto kernel metric 0 pref medium
local fe80::f248:5ca1:cd35:3b74 dev vpn0 table local proto kernel metric 0 pref medium
multicast ff00::/8 dev enp6s0 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vpn0 table local proto kernel metric 256 linkdown pref medium
multicast ff00::/8 dev wt0 table local proto kernel metric 256 pref medium

After restarting the remote netbird and rebooting my machine, it started working... but it would be nice to know what to look into when it is not working. It usually goes wrong after a netbird version update.
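One thing that can be checked when it is not working: in the outputs above the NetBird routes sit in the numeric table 7120, and the name netbird is not accepted as a table id on this machine, so ip route show table 7120 is the query that should work here. The sketch below parses ip route show table all output and reports which expected networks are missing from that table. The helper names and the `EXPECTED` set are illustrative assumptions; the sample lines are copied from the first output above.

```python
# Route type keywords that iproute2 may print before the destination prefix.
ROUTE_TYPES = {
    "local", "broadcast", "multicast", "unicast", "anycast",
    "unreachable", "blackhole", "prohibit", "throw", "nat",
}


def routes_by_table(ip_route_output):
    """Group destinations from `ip route show table all` output by table.

    Lines without a `table` token belong to the main table.
    """
    tables = {}
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Skip a leading route type such as "local" or "broadcast".
        if tokens[0] in ROUTE_TYPES and len(tokens) > 1:
            dest = tokens[1]
        else:
            dest = tokens[0]
        table = "main"
        if "table" in tokens[:-1]:
            table = tokens[tokens.index("table") + 1]
        tables.setdefault(table, set()).add(dest)
    return tables


# Lines copied from the first (broken) output above.
BEFORE = """\
192.168.0.0/16 dev wt0 table 7120
default via 10.6.6.1 dev enp6s0 proto dhcp src 10.6.6.2 metric 100
local 10.6.6.2 dev enp6s0 table local proto kernel scope host src 10.6.6.2
"""

# Networks the client log reported a chosen route for.
EXPECTED = {"192.168.0.0/16", "10.20.0.0/24", "10.30.0.0/24"}

missing = EXPECTED - routes_by_table(BEFORE).get("7120", set())

if __name__ == "__main__":
    print("missing from table 7120:", sorted(missing))
```

Against the first output this reports 10.20.0.0/24 and 10.30.0.0/24 as missing, which matches the difference between the two listings above.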

lixmal commented 4 months ago

Can you check if the affected route is Selected in netbird routes list when the issue happens?

oddlama commented 4 months ago

I seem to have exactly the same issue, on a different operating system (NixOS). I'm using client version 0.27.4. I can ping all peers but cannot ping anything in the attached network on the routing peer.

netbird routes list shows:

Available Routes:

  - ID: home
    Network: 192.168.1.0/24
    Status: Selected

ip route shows:

default via 192.168.178.1 dev lan1 proto dhcp src 192.168.178.77 metric 10 # just my normal network, not the routed network!
100.73.0.0/16 dev wt-home proto kernel scope link src 100.73.111.47

There is also no routing table named netbird, but ip route show all shows the desired route. I can't get it to work by rebooting.

oddlama commented 4 months ago

I did some investigating today, and as it turns out I was searching in the wrong place after all. The routes on all clients are set correctly; the table just doesn't seem to have the name netbird. The issue for me was actually in the nftables configuration of the routing peer.

Since netbird already adds its own table to nftables, I blindly assumed that this is all that would be needed, but the other forward filter in my firewall (not the one by netbird) of course still dropped the forwarded packets. As @lfarkas's configuration started working after a restart, my guess would be that this was also related to some invalid firewall state or network configuration that got reset by the restart.
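For anyone checking the same thing on their routing peer, here is a rough sketch that scans the text printed by nft list ruleset for base chains hooked into forward with a drop policy. It only looks at chain policies, so an explicit drop rule inside a chain would not be detected. The function name and the sample ruleset (including the table names filter and example) are hypothetical and not taken from a real NetBird setup.

```python
import re


def dropping_forward_chains(ruleset_text):
    """Return (family, table, chain) for each forward-hook chain with policy drop."""
    found = []
    table = None
    chain = None
    for raw in ruleset_text.splitlines():
        line = raw.strip()
        table_match = re.match(r"table (\S+) (\S+) \{", line)
        if table_match:
            table = (table_match[1], table_match[2])
            chain = None
            continue
        chain_match = re.match(r"chain (\S+) \{", line)
        if chain_match:
            chain = chain_match[1]
            continue
        # Base chains declare hook and policy on one line in nft's listing.
        if table and chain and "hook forward" in line and "policy drop" in line:
            found.append((table[0], table[1], chain))
    return found


# Hypothetical ruleset for illustration only.
SAMPLE_RULESET = """\
table inet filter {
	chain forward {
		type filter hook forward priority filter; policy drop;
	}
}
table ip example {
	chain forward {
		type filter hook forward priority filter; policy accept;
	}
}
"""

if __name__ == "__main__":
    for family, table_name, chain_name in dropping_forward_chains(SAMPLE_RULESET):
        print(f"{family} {table_name} {chain_name}: forward policy is drop")
```

Any chain this reports needs its own rule accepting traffic to and from the NetBird interface, because a packet has to pass every base chain on the forward hook and an accept in one table does not override a drop in another.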

nazarewk commented 4 months ago

@oddlama you might be hitting https://github.com/netbirdio/netbird/issues/2023 (I noticed you were commenting on my NixOS PR for running multiple Netbird instances).

oddlama commented 4 months ago

Oh true, that could be related! Thanks for the pointer!