Open rkleivel opened 1 month ago
Hello @rkleivel, can you please share the output of nft list ruleset from the exit node?
Thanks @mlsmaycon! Here is the ruleset after netbird down / up on the exit node:
table ip filter {
    chain INPUT {
        type filter hook input priority filter; policy accept;
    }
    chain OUTPUT {
        type filter hook output priority filter; policy accept;
    }
    chain FORWARD {
        type filter hook forward priority filter; policy accept;
        oifname "wt0" ct state established,related counter packets 0 bytes 0 accept
        iifname "wt0" counter packets 0 bytes 0 accept
    }
}
table ip nat {
    chain POSTROUTING {
        type nat hook postrouting priority srcnat; policy accept;
    }
}
table ip netbird {
    set nb0000001 {
        type ipv4_addr
        flags dynamic
        elements = { 100.93.17.98 }
    }
    set nb0000002 {
        type ipv4_addr
        flags dynamic
        elements = { 100.93.17.98 }
    }
    chain netbird-rt-fwd {
        ct state established,related accept
        counter packets 0 bytes 0 accept
    }
    chain netbird-rt-nat {
        type nat hook postrouting priority srcnat - 1; policy accept;
        iifname "wt0" counter packets 1 bytes 176 masquerade
        oifname "wt0" counter packets 0 bytes 0 masquerade
    }
    chain netbird-acl-input-rules {
        ct state established,related accept
        ip saddr @nb0000001 accept
    }
    chain netbird-acl-output-rules {
        ct state established,related accept
        ip daddr @nb0000002 accept
    }
    chain netbird-acl-input-filter {
        type filter hook input priority filter; policy accept;
        iifname "wt0" jump netbird-acl-input-rules
        iifname "wt0" drop
    }
    chain netbird-acl-output-filter {
        type filter hook output priority filter; policy accept;
        oifname "wt0" ip daddr != 100.93.0.0/16 accept
        oifname "wt0" jump netbird-acl-output-rules
        oifname "wt0" drop
    }
    chain netbird-acl-forward-filter {
        type filter hook forward priority filter; policy accept;
        iifname "wt0" jump netbird-rt-fwd
        iifname "wt0" drop
    }
}
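When two ruleset dumps like this one are compared, the packet and byte counters change on every run and can hide real rule differences. A minimal sketch of a filter that normalizes the counters before diffing follows. The function name strip_counters and the output file names working.norm and not_working.norm are made up for illustration; it assumes a sed that supports -E (GNU and BSD sed both do).

```shell
#!/bin/sh
# strip_counters: read an nft ruleset dump on stdin and replace every
# "counter packets N bytes M" with a bare "counter", so that two dumps
# differ only where the rules themselves differ.
# (Hypothetical helper, not part of nft or netbird.)
strip_counters() {
    sed -E 's/counter packets [0-9]+ bytes [0-9]+/counter/g'
}
```

Possible usage with the two saved dumps: strip_counters < exit_node_working.txt > working.norm, then strip_counters < exit_node_not_working.txt > not_working.norm, then diff working.norm not_working.norm. An empty diff would mean the rules are identical and only the counters changed.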
The diff against the ruleset from when it was working does not seem significant (only the counters differ):
diff exit_node_working.txt exit_node_not_working.txt
12,13c12,13
< oifname "wt0" ct state established,related counter packets 1048 bytes 2078521 accept
< iifname "wt0" counter packets 909 bytes 54393 accept
---
> oifname "wt0" ct state established,related counter packets 0 bytes 0 accept
> iifname "wt0" counter packets 0 bytes 0 accept
36c36
< counter packets 25 bytes 1580 accept
---
> counter packets 0 bytes 0 accept
41c41
< iifname "wt0" counter packets 18 bytes 1160 masquerade
---
> iifname "wt0" counter packets 1 bytes 176 masquerade
Hi there, can you please try our latest release, v0.30.1?
Sure! Unfortunately, I cannot see any improvement compared to 0.30.0.
Hello @rkleivel can you please run the following commands?
On exit node:
sysctl net.ipv4.ip_forward
sudo tcpdump -i any -nn host 1.1.1.1 and port 443 # keep this running while testing on client
On client:
ip route get 1.1.1.1
nc -vw 5 -z 1.1.1.1 443
Then, share the output with us.
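To make it easier to line up the client test with the tcpdump on the exit node, the client-side probe could be repeated with a timestamp on each attempt. The following is only a sketch: probe_once and probe_loop are hypothetical helper names, and the probe command is passed in as arguments so that any check can be used.

```shell
#!/bin/sh
# probe_once CMD [ARGS...]: run one probe command, discard its output,
# and print a UTC timestamp followed by OK or FAIL.
probe_once() {
    if "$@" >/dev/null 2>&1; then
        echo "$(date -u +%H:%M:%S) OK"
    else
        echo "$(date -u +%H:%M:%S) FAIL"
    fi
}

# probe_loop COUNT INTERVAL CMD [ARGS...]: run the probe COUNT times,
# sleeping INTERVAL seconds between attempts.
probe_loop() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        probe_once "$@"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For example, probe_loop 60 5 nc -vw 5 -z 1.1.1.1 443 on the client would try the same nc check 60 times, 5 seconds apart, and print one timestamped line per attempt.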
Hi @mlsmaycon,
Contrary to my comment of Oct 11 at 10.11 GMT, I am currently not able to reproduce the issue. Below I provide the output from your commands.
However, High Availability with 2 exit nodes still does not seem to work. I did not mention that earlier because I did not have time to test it thoroughly, but I noticed it last week while debugging the issue in this thread, and I have a feeling it might be related. I will provide similar outputs in another comment.
Here is the log for a single exit node that goes down and up, showing that it now works as expected. (I added some timestamps to make it easier to relate the two outputs.) Both nodes are on 0.30.1:
Exit node:
admin@exit1:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443
Mon Oct 14 07:46:36 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
07:46:48.250437 wt0 In IP 100.93.158.105.32836 > 1.1.1.1.443: Flags [S], seq 1080105978, win 64480, options [mss 1240,sackOK,TS val 4127900581 ecr 0,nop,wscale 7], length 0
07:46:48.250465 ens18 Out IP 192.168.7.26.32836 > 1.1.1.1.443: Flags [S], seq 1080105978, win 64480, options [mss 1240,sackOK,TS val 4127900581 ecr 0,nop,wscale 7], length 0
07:46:48.259774 ens18 In IP 1.1.1.1.443 > 192.168.7.26.32836: Flags [S.], seq 3227839981, ack 1080105979, win 65535, options [mss 1460,sackOK,TS val 2170826905 ecr 4127900581,nop,wscale 13], length 0
07:46:48.259793 wt0 Out IP 1.1.1.1.443 > 100.93.158.105.32836: Flags [S.], seq 3227839981, ack 1080105979, win 65535, options [mss 1460,sackOK,TS val 2170826905 ecr 4127900581,nop,wscale 13], length 0
07:46:48.321426 wt0 In IP 100.93.158.105.32836 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4127900651 ecr 2170826905], length 0
07:46:48.321442 ens18 Out IP 192.168.7.26.32836 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4127900651 ecr 2170826905], length 0
07:46:48.322012 wt0 In IP 100.93.158.105.32836 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4127900651 ecr 2170826905], length 0
07:46:48.322025 ens18 Out IP 192.168.7.26.32836 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4127900651 ecr 2170826905], length 0
07:46:48.331567 ens18 In IP 1.1.1.1.443 > 192.168.7.26.32836: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 2170826977 ecr 4127900651], length 0
07:46:48.331598 wt0 Out IP 1.1.1.1.443 > 100.93.158.105.32836: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 2170826977 ecr 4127900651], length 0
07:46:48.377526 wt0 In IP 100.93.158.105.32836 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4127900708 ecr 2170826977], length 0
07:46:48.377546 ens18 Out IP 192.168.7.26.32836 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4127900708 ecr 2170826977], length 0
^C
12 packets captured
14 packets received by filter
0 packets dropped by kernel
admin@exit1:~$ netbird down && netbird up
Disconnected
Connected
admin@exit1:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443
Mon Oct 14 07:47:47 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
07:47:51.625745 wt0 In IP 100.93.158.105.56110 > 1.1.1.1.443: Flags [S], seq 2392515285, win 64480, options [mss 1240,sackOK,TS val 4127963956 ecr 0,nop,wscale 7], length 0
07:47:51.625771 ens18 Out IP 192.168.7.26.56110 > 1.1.1.1.443: Flags [S], seq 2392515285, win 64480, options [mss 1240,sackOK,TS val 4127963956 ecr 0,nop,wscale 7], length 0
07:47:51.635630 ens18 In IP 1.1.1.1.443 > 192.168.7.26.56110: Flags [S.], seq 1033369258, ack 2392515286, win 65535, options [mss 1460,sackOK,TS val 3968834669 ecr 4127963956,nop,wscale 13], length 0
07:47:51.635670 wt0 Out IP 1.1.1.1.443 > 100.93.158.105.56110: Flags [S.], seq 1033369258, ack 2392515286, win 65535, options [mss 1460,sackOK,TS val 3968834669 ecr 4127963956,nop,wscale 13], length 0
07:47:51.680578 wt0 In IP 100.93.158.105.56110 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4127964011 ecr 3968834669], length 0
07:47:51.680599 ens18 Out IP 192.168.7.26.56110 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4127964011 ecr 3968834669], length 0
07:47:51.680832 wt0 In IP 100.93.158.105.56110 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4127964011 ecr 3968834669], length 0
07:47:51.680852 ens18 Out IP 192.168.7.26.56110 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4127964011 ecr 3968834669], length 0
07:47:51.691850 ens18 In IP 1.1.1.1.443 > 192.168.7.26.56110: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 3968834725 ecr 4127964011], length 0
07:47:51.691877 wt0 Out IP 1.1.1.1.443 > 100.93.158.105.56110: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 3968834725 ecr 4127964011], length 0
07:47:51.738217 wt0 In IP 100.93.158.105.56110 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4127964069 ecr 3968834725], length 0
07:47:51.738254 ens18 Out IP 192.168.7.26.56110 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4127964069 ecr 3968834725], length 0
^C
12 packets captured
13 packets received by filter
0 packets dropped by kernel
admin@exit1:~$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
admin@exit1:~$
Client:
admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 07:46:38 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000
cache
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:46:48 UTC 2024
Connection to 1.1.1.1 443 port [tcp/https] succeeded!
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:07 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 07:47:13 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000
cache
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:16 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:22 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:29 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:35 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:42 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:51 UTC 2024
Connection to 1.1.1.1 443 port [tcp/https] succeeded!
admin@client:~$
Thanks, @rkleivel, for sharing the outputs.
OK, to confirm what we see in the timestamps: the failure occurred after you restarted the connection and is probably related to the time it took for the peers to connect. Right?
With the previous release, we fixed an issue with forwarding rules caused by the number of peers in an access control rule, which shouldn't affect nodes with exit nodes and no access control groups set in any of the routing peer routes. So it may not have affected you unless you had an access control group for a network route.
We will wait for your check with HA as well.
As promised, here are the outputs for 1 client with 2 exit nodes. Initially both exit nodes are up. I confirm that routing through exit node 1 is OK, then I take exit node 1 down. Routing does not switch to exit node 2 until I manually deactivate and activate the exit node entry in the web GUI.
Client:
admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 07:58:39 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000
cache
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:59:10 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:59:19 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:59:35 UTC 2024
Connection to 1.1.1.1 443 port [tcp/https] succeeded!
admin@client:~$
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:00:11 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:00:27 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:00:44 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:01:10 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:01:41 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:03:04 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 08:04:25 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000
cache
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:04:42 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:04:56 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:05:12 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:05:35 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:06:00 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:06:26 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:06:57 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:07:32 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 08:08:28 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000
cache
admin@client:~$
EXIT Node deactivated and activated in GUI at this point
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:08:30 UTC 2024
Connection to 1.1.1.1 443 port [tcp/https] succeeded!
admin@client:~$ date
Mon Oct 14 08:08:54 UTC 2024
admin@client:~$
Exit Node 1:
admin@exit1:~$ date && sysctl net.ipv4.ip_forward
Mon Oct 14 07:58:46 UTC 2024
net.ipv4.ip_forward = 1
admin@exit1:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443
Mon Oct 14 07:59:01 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
07:59:39.508343 wt0 In IP 100.93.158.105.47854 > 1.1.1.1.443: Flags [S], seq 2382693363, win 64480, options [mss 1240,sackOK,TS val 4128671838 ecr 0,nop,wscale 7], length 0
07:59:39.508381 ens18 Out IP 192.168.7.26.47854 > 1.1.1.1.443: Flags [S], seq 2382693363, win 64480, options [mss 1240,sackOK,TS val 4128671838 ecr 0,nop,wscale 7], length 0
07:59:39.517940 ens18 In IP 1.1.1.1.443 > 192.168.7.26.47854: Flags [S.], seq 2277416614, ack 2382693364, win 65535, options [mss 1460,sackOK,TS val 1861873 ecr 4128671838,nop,wscale 13], length 0
07:59:39.517982 wt0 Out IP 1.1.1.1.443 > 100.93.158.105.47854: Flags [S.], seq 2277416614, ack 2382693364, win 65535, options [mss 1460,sackOK,TS val 1861873 ecr 4128671838,nop,wscale 13], length 0
07:59:39.563320 wt0 In IP 100.93.158.105.47854 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4128671893 ecr 1861873], length 0
07:59:39.563339 ens18 Out IP 192.168.7.26.47854 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4128671893 ecr 1861873], length 0
07:59:39.564156 wt0 In IP 100.93.158.105.47854 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4128671893 ecr 1861873], length 0
07:59:39.564175 ens18 Out IP 192.168.7.26.47854 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4128671893 ecr 1861873], length 0
07:59:39.573752 ens18 In IP 1.1.1.1.443 > 192.168.7.26.47854: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 1861929 ecr 4128671893], length 0
07:59:39.573792 wt0 Out IP 1.1.1.1.443 > 100.93.158.105.47854: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 1861929 ecr 4128671893], length 0
07:59:39.618865 wt0 In IP 100.93.158.105.47854 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4128671948 ecr 1861929], length 0
07:59:39.618890 ens18 Out IP 192.168.7.26.47854 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4128671948 ecr 1861929], length 0
^C
12 packets captured
14 packets received by filter
0 packets dropped by kernel
admin@exit1:~$ date && netbird down
Mon Oct 14 08:00:03 UTC 2024
Disconnected
admin@exit1:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443
Mon Oct 14 08:04:40 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel
admin@exit1:~$ date
Mon Oct 14 08:08:58 UTC 2024
admin@exit1:~$
Exit Node 2:
admin@exit2:~$ date && sysctl net.ipv4.ip_forward
Mon Oct 14 07:58:49 UTC 2024
net.ipv4.ip_forward = 1
admin@exit2:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443
Mon Oct 14 07:59:02 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:08:30.973700 wt0 In IP 100.93.158.105.35580 > 1.1.1.1.443: Flags [S], seq 1509625333, win 64480, options [mss 1240,sackOK,TS val 4129203321 ecr 0,nop,wscale 7], length 0
08:08:30.973717 eth0 Out IP 172.17.0.4.35580 > 1.1.1.1.443: Flags [S], seq 1509625333, win 64480, options [mss 1240,sackOK,TS val 4129203321 ecr 0,nop,wscale 7], length 0
08:08:30.985416 eth0 In IP 1.1.1.1.443 > 172.17.0.4.35580: Flags [S.], seq 1235011614, ack 1509625334, win 65535, options [mss 1460,sackOK,TS val 560577716 ecr 4129203321,nop,wscale 13], length 0
08:08:30.985430 wt0 Out IP 1.1.1.1.443 > 100.93.158.105.35580: Flags [S.], seq 1235011614, ack 1509625334, win 65535, options [mss 1460,sackOK,TS val 560577716 ecr 4129203321,nop,wscale 13], length 0
08:08:30.996615 wt0 In IP 100.93.158.105.35580 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4129203343 ecr 560577716], length 0
08:08:30.996627 eth0 Out IP 172.17.0.4.35580 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4129203343 ecr 560577716], length 0
08:08:30.996632 wt0 In IP 100.93.158.105.35580 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4129203344 ecr 560577716], length 0
08:08:30.996636 eth0 Out IP 172.17.0.4.35580 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4129203344 ecr 560577716], length 0
08:08:31.009041 eth0 In IP 1.1.1.1.443 > 172.17.0.4.35580: Flags [.], ack 2, win 8, options [nop,nop,TS val 560577740 ecr 4129203344], length 0
08:08:31.009041 eth0 In IP 1.1.1.1.443 > 172.17.0.4.35580: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 560577740 ecr 4129203344], length 0
08:08:31.009062 wt0 Out IP 1.1.1.1.443 > 100.93.158.105.35580: Flags [.], ack 2, win 8, options [nop,nop,TS val 560577740 ecr 4129203344], length 0
08:08:31.009081 wt0 Out IP 1.1.1.1.443 > 100.93.158.105.35580: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 560577740 ecr 4129203344], length 0
08:08:31.030793 wt0 In IP 100.93.158.105.35580 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4129203377 ecr 560577740], length 0
08:08:31.030800 eth0 Out IP 172.17.0.4.35580 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4129203377 ecr 560577740], length 0
^C
14 packets captured
16 packets received by filter
0 packets dropped by kernel
admin@exit2:~$ date
Mon Oct 14 08:08:50 UTC 2024
admin@exit2:~$
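The length of an outage can be read off the timestamps in the transcripts above. A small sketch for that arithmetic follows; to_secs and elapsed are hypothetical helper names, and both times are assumed to fall on the same day. awk does the conversion because a leading zero such as 08 would be treated as invalid octal in shell arithmetic.

```shell
#!/bin/sh
# to_secs HH:MM:SS: convert a time of day to seconds since midnight.
to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# elapsed START END: seconds between two times on the same day.
elapsed() {
    echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}

# Example (not run automatically):
#   elapsed 08:00:03 08:08:30    # prints 507
```

With the values from this test, netbird down on exit node 1 at 08:00:03 and the first successful connection through exit node 2 at 08:08:30, the client was without connectivity for 507 seconds, and recovered only after the manual toggle in the GUI.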
I can confirm that the failure occurred after netbird down && netbird up on the exit node. Now it just takes up to a couple of minutes until the connection is restored. Last week, when creating this issue, I could easily wait for half an hour and the connection was still not restored.
I can also confirm that the Access Control Groups (optional) field in the web GUI exit node definition was empty during my tests.
Hi there, we have released 0.31.1, which addresses some of the issues described here. Can you please test with this version?
My apologies for the late response on this. I have tested the initial scenario, as well as the high availability aspect, on 0.33.0, and everything now seems to work as expected. So thanks a lot for that!
I do, however, see some strange effects on docker containers that run inside a netbird node that uses_exit_node (see test system setup in my initial post):
the request times out. This does not happen if the exit node in use is hosted elsewhere, or if I do not use an exit node. If I run the same curl https://... on the docker host, it is also always fine, no matter where the exit node is hosted. HTTP endpoints are always OK. All nodes involved in the test have the same setup: Ubuntu 24.04 with netbird 0.33.0.
I do not necessarily expect this to be a netbird issue, but would be very thankful for any thoughts on where Netbird possibly could intersect with a VM in Azure causing such an effect.
Problem: After upgrading clients to 0.30.0, nodes in an exit node distribution group lose their internet connection if the exit node is restarted.
To Reproduce
1) Create 2 groups, is_exit_node and uses_exit_node
2) Add a node in each group (preferably behind different public IPs for easier testing)
3) Create a policy that allows the groups to communicate (unless the Default policy All <-> All is active)
4) Add an exit node under network routes that makes the node in is_exit_node the Exit Node of the node in uses_exit_node
5) Run curl ipinfo.io on each node and verify that the public IPs are identical
6) Run netbird down && netbird up on the exit node
7) Wait a minute to allow settings to be updated
8) Run curl ipinfo.io on each node
Expected behavior
Each node should still appear to be behind the same public IP.
uses_exit_node does not regain internet access.
Are you using NetBird Cloud?
Yes
NetBird version
0.30.0 (failing) and 0.29.4 (working)
Additional info
Both nodes are running Ubuntu 24.04 server.
As this has been fairly easy to reproduce, I have not attached any logs at this stage. Please let me know if they are necessary, and I'll happily provide them :)