netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

Client 0.30.0 Exit Node not working as 0.29.4 #2707

Open rkleivel opened 2 days ago

rkleivel commented 2 days ago

Problem: After upgrading clients to 0.30.0, nodes in an exit node distribution group lose their internet connection if the exit node is restarted.

To Reproduce:
1) Create 2 groups: is_exit_node and uses_exit_node
2) Add a node in each group (preferably behind different public IPs for easier testing)
3) Create a policy that allows the groups to communicate (unless the Default policy All <-> All is active)
4) Add an exit node under network routes that makes the node in is_exit_node the Exit Node of the node in uses_exit_node
5) Run curl ipinfo.io on each node and verify that the public IPs are identical
6) Run netbird down && netbird up on the exit node
7) Wait a minute to allow settings to be updated
8) Run curl ipinfo.io on each node
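The comparison in steps 5 and 8 can be scripted. This is a minimal sketch, assuming the JSON that curl ipinfo.io prints on each node has been saved as text. The function name same_public_ip is hypothetical, and the addresses are documentation placeholders, not real output from the affected nodes:

```python
import json


def same_public_ip(exit_node_output: str, client_output: str) -> bool:
    """Return True when both ipinfo.io responses report the same public IP."""
    # ipinfo.io answers with a JSON object whose "ip" field is the public address.
    exit_ip = json.loads(exit_node_output)["ip"]
    client_ip = json.loads(client_output)["ip"]
    return exit_ip == client_ip


if __name__ == "__main__":
    # Placeholder responses (RFC 5737 documentation addresses), not real data.
    exit_node = '{"ip": "203.0.113.10"}'
    client_routed = '{"ip": "203.0.113.10"}'
    client_direct = '{"ip": "198.51.100.7"}'

    print("routed through exit node:", same_public_ip(exit_node, client_routed))
    print("not routed through exit node:", same_public_ip(exit_node, client_direct))
```

Running the check once after step 5 and again after step 8 would show whether the client is still routed through the exit node after the restart.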

Expected behavior: Each node should still appear to be behind the same public IP.

Are you using NetBird Cloud? Yes

NetBird version: 0.30.0 (failing) and 0.29.4 (working)

Additional info: Both nodes are running Ubuntu 24.04 server

As this has been fairly easy to reproduce, I have not attached any logs at this stage. Please let me know if they are needed, and I'll happily provide them :)

mlsmaycon commented 1 day ago

Hello @rkleivel, can you please share the output of nft list ruleset from the exit node?

rkleivel commented 1 day ago

Thanks @mlsmaycon! Here is the ruleset after netbird down / up on the exit node:

table ip filter {
    chain INPUT {
        type filter hook input priority filter; policy accept;
    }

    chain OUTPUT {
        type filter hook output priority filter; policy accept;
    }

    chain FORWARD {
        type filter hook forward priority filter; policy accept;
        oifname "wt0" ct state established,related counter packets 0 bytes 0 accept
        iifname "wt0" counter packets 0 bytes 0 accept
    }
}
table ip nat {
    chain POSTROUTING {
        type nat hook postrouting priority srcnat; policy accept;
    }
}
table ip netbird {
    set nb0000001 {
        type ipv4_addr
        flags dynamic
        elements = { 100.93.17.98 }
    }

    set nb0000002 {
        type ipv4_addr
        flags dynamic
        elements = { 100.93.17.98 }
    }

    chain netbird-rt-fwd {
        ct state established,related accept
        counter packets 0 bytes 0 accept
    }

    chain netbird-rt-nat {
        type nat hook postrouting priority srcnat - 1; policy accept;
        iifname "wt0" counter packets 1 bytes 176 masquerade
        oifname "wt0" counter packets 0 bytes 0 masquerade
    }

    chain netbird-acl-input-rules {
        ct state established,related accept
        ip saddr @nb0000001 accept
    }

    chain netbird-acl-output-rules {
        ct state established,related accept
        ip daddr @nb0000002 accept
    }

    chain netbird-acl-input-filter {
        type filter hook input priority filter; policy accept;
        iifname "wt0" jump netbird-acl-input-rules
        iifname "wt0" drop
    }

    chain netbird-acl-output-filter {
        type filter hook output priority filter; policy accept;
        oifname "wt0" ip daddr != 100.93.0.0/16 accept
        oifname "wt0" jump netbird-acl-output-rules
        oifname "wt0" drop
    }

    chain netbird-acl-forward-filter {
        type filter hook forward priority filter; policy accept;
        iifname "wt0" jump netbird-rt-fwd
        iifname "wt0" drop
    }
}

The diff from when it was working looks like this, and it does not seem significant:

diff exit_node_working.txt exit_node_not_working.txt 
12,13c12,13
<       oifname "wt0" ct state established,related counter packets 1048 bytes 2078521 accept
<       iifname "wt0" counter packets 909 bytes 54393 accept
---
>       oifname "wt0" ct state established,related counter packets 0 bytes 0 accept
>       iifname "wt0" counter packets 0 bytes 0 accept
36c36
<       counter packets 25 bytes 1580 accept
---
>       counter packets 0 bytes 0 accept
41c41
<       iifname "wt0" counter packets 18 bytes 1160 masquerade
---
>       iifname "wt0" counter packets 1 bytes 176 masquerade
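Every line in this diff differs only in its packet and byte counters, so the rules themselves appear unchanged. One way to confirm that is to blank out the counters before diffing. This is a rough sketch, assuming both rulesets were saved as text. The helper names strip_counters and structural_diff are hypothetical and not part of netbird or nftables:

```python
import difflib
import re

# Matches the counter values that nft prints inside a rule.
COUNTER_RE = re.compile(r"counter packets \d+ bytes \d+")


def strip_counters(ruleset: str) -> list[str]:
    """Return the non-empty ruleset lines with counter values blanked out."""
    return [
        COUNTER_RE.sub("counter", line.strip())
        for line in ruleset.splitlines()
        if line.strip()
    ]


def structural_diff(working: str, failing: str) -> list[str]:
    """Diff two rulesets while ignoring packet and byte counters."""
    before = strip_counters(working)
    after = strip_counters(failing)
    # unified_diff yields nothing when the two sequences are identical.
    return list(difflib.unified_diff(before, after, lineterm="", n=0))


if __name__ == "__main__":
    working = (
        'oifname "wt0" ct state established,related counter packets 1048 bytes 2078521 accept\n'
        'iifname "wt0" counter packets 909 bytes 54393 accept\n'
    )
    failing = (
        'oifname "wt0" ct state established,related counter packets 0 bytes 0 accept\n'
        'iifname "wt0" counter packets 0 bytes 0 accept\n'
    )
    # An empty list means the rules are the same and only the counters changed.
    print(structural_diff(working, failing))
```

An empty result would suggest that the ruleset itself is the same in both states and that traffic has simply stopped reaching the rules after the restart.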