netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License
10.64k stars 474 forks source link

FreeBSD: enable skipped routing tests #2386

Open skillcoder opened 1 month ago

skillcoder commented 1 month ago

Describe the problem Currently under FreeBSD some tests are skipped, which could lead to undetected bugs. Specifically tests related to routing disabled with

            t.Skip("skipping ", testCase.name, " on freebsd")

Following tests affected TestAddRemoveRoutes, TestGetNextHop, TestRouting

To Reproduce

To find all affected tests - grep codebase by freebsd substring.

Expected behavior

We should not skip any tests.

skillcoder commented 1 month ago

TestRouting failed because can't cleanup route 10.0.0.0/8

  time="2024-08-06T21:15:16Z" level=info msg="using userspace bind mode"
  time="2024-08-06T21:15:16Z" level=info msg="create tun interface"
  time="2024-08-06T21:15:16Z" level=info msg="assign addr 100.64.0.1 mask 0xffffff00 to utun100 interface"
  time="2024-08-06T21:15:16Z" level=debug msg="adding Wireguard private key"
  time="2024-08-06T21:15:16Z" level=debug msg="Route for 0.0.0.0: interface &{2 16384 lo0  up|loopback|multicast|running} nexthop 192.168.0.1, preferred source 192.168.0.1"
  time="2024-08-06T21:15:16Z" level=debug msg="Failed to get route for ::: no route found for ::"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n add -inet 0.0.0.0/1 -interface utun100: add net 0.0.0.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n add -inet 128.0.0.0/1 -interface utun100: add net 128.0.0.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n add -inet6 ::/1 -interface utun100: add net ::/1: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n add -inet6 8000::/1 -interface utun100: add net 8000::/1: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="try to add route: 10.0.0.0/8, utun100"
  time="2024-08-06T21:15:16Z" level=trace msg="sleep 1000000000"
  time="2024-08-06T21:15:16Z" level=warning msg="Skipping adding a new route for network 10.0.0.0/8 because it already exists"
  time="2024-08-06T21:15:16Z" level=trace msg="try to add route: 10.10.0.0/24, utun100"
  time="2024-08-06T21:15:16Z" level=debug msg="Route for 0.0.0.0: interface &{2 16384 lo0  up|loopback|multicast|running} nexthop 192.168.0.1, preferred source 192.168.0.1"
  time="2024-08-06T21:15:16Z" level=debug msg="Skipping adding a new route for gateway 192.168.0.1 because it is not in the network 10.10.0.0/24"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n add -inet 10.10.0.0/24 -interface utun100: add net 10.10.0.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="try to add route: 127.0.10.0/24, utun100"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n add -inet 127.0.10.0/24 -interface utun100: add net 127.0.10.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="try to add route: 172.16.0.0/12, utun100"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n add -inet 172.16.0.0/12 -interface utun100: add net 172.16.0.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n delete -inet 172.16.0.0/12 -interface utun100: delete net 172.16.0.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n delete -inet 127.0.10.0/24 -interface utun100: delete net 127.0.10.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n delete -inet 10.10.0.0/24 -interface utun100: delete net 10.10.0.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n delete -inet 10.0.0.0/8 -interface utun100: delete net 10.0.0.0: gateway utun100 fib 0: not in table\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n delete -inet 0.0.0.0/1 -interface utun100: delete net 0.0.0.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n delete -inet 128.0.0.0/1 -interface utun100: delete net 128.0.0.0: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n delete -inet6 ::/1 -interface utun100: delete net ::/1: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="route -n delete -inet6 8000::/1 -interface utun100: delete net 8000::/1: gateway utun100\n"
  time="2024-08-06T21:15:16Z" level=trace msg="accept unix /var/run/wireguard/utun100.sock: use of closed network connection"
  time="2024-08-06T21:15:16Z" level=error msg="failed to close uapi listener: bad file descriptor"
  --- FAIL: TestRouting (0.06s)
      --- FAIL: TestRouting/To_external_host_without_custom_dialer_via_vpn (0.06s)
          systemops_generic_test.go:368: 
                Error Trace:    /home/runner/work/netbird/netbird/client/internal/routemanager/systemops/systemops_generic_test.go:368
                                            /usr/local/go121/src/testing/testing.go:1169
                                            /usr/local/go121/src/testing/testing.go:1347
                                            /usr/local/go121/src/testing/testing.go:1589
                Error:          Received unexpected error:
                                failed to delete route for 10.0.0.0/8: route cmd retry failed: exit status 1
                Test:           TestRouting/To_external_host_without_custom_dialer_via_vpn
                Messages:       removeVPNRoute should not return err: 10.0.0.0/8 utun100
  FAIL
  FAIL  github.com/netbirdio/netbird/client/internal/routemanager/systemops 2.017s
  FAIL

But it strange, it can't add the route to utun100 since it already somehow appeared on lo0 I've spend 3h and did not find the source of this, probably I should setup remote debugger to be able to run tests under debugger on remote FreeBSD machine. With my current setup it is too time consuming.

skillcoder commented 1 month ago

TestAddRemoveRoutes

  time="2024-08-06T21:26:48Z" level=info msg="using userspace bind mode"
  time="2024-08-06T21:26:48Z" level=info msg="create tun interface"
  time="2024-08-06T21:26:48Z" level=info msg="assign addr 100.65.75.2 mask 0xffffff00 to utun530 interface"
  --- FAIL: TestAddRemoveRoutes (0.02s)
      --- FAIL: TestAddRemoveRoutes/Should_Add_And_Remove_Route_100.66.120.0/24 (0.02s)
          systemops_generic_test.go:83: 
                Error Trace:    /home/runner/work/netbird/netbird/client/internal/routemanager/systemops/systemops_generic_test.go:416
                                            /home/runner/work/netbird/netbird/client/internal/routemanager/systemops/systemops_generic_test.go:83
                Error:          Not equal: 
                                expected: "100.65.75.2"
                                actual  : "invalid IP"

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1 +1 @@
                                -100.65.75.2
                                +invalid IP
                Test:           TestAddRemoveRoutes/Should_Add_And_Remove_Route_100.66.120.0/24
                Messages:       route should point to wireguard interface IP
  FAIL
  FAIL  github.com/netbirdio/netbird/client/internal/routemanager/systemops 1.989s
skillcoder commented 1 month ago

Found the reason of failed TestAddRemoveRoutes https://github.com/netbirdio/netbird/blob/bcce1bf18474e68744d81a6b05b8a32e901ae771/client/internal/routemanager/systemops/systemops_generic.go#L381 It was added in https://github.com/skillcoder/netbird/commit/370eecfa908c208f8306fafd7c3800598cb6de75 to fix routing on FreeBSD, without it routing was not working properly, but during refactoring it 2 return parameters replaced with Nexthop{}, I'm not sure it is right.

skillcoder commented 1 month ago

But now without this workaround tests passed.

skillcoder commented 1 month ago

And seems TestGetNextHop also fixed after these changes.

skillcoder commented 1 month ago

Only TestRouting left, and it's still failed with the same logs.

                Error:          Received unexpected error:
                                failed to delete route for 10.0.0.0/8: route cmd retry failed: exit status 1
                Test:           TestRouting/To_external_host_without_custom_dialer_via_vpn
                Messages:       removeVPNRoute should not return err: 10.0.0.0/8 utun100
skillcoder commented 1 month ago

Important! You can't run our netbird test in the environments with subnet 10.0.0.0/8, 192.168.0.0/24, 192.168.1.0/24. Because we use this subnets in test, tests just will failed, so make sure you setup you FreeBSD host on some different subnet, for example 192.168.91.0/24

BTW, here is a settings and commands to debug FreeBSD tests via delve. It seems currently dlv dap does not supported https://github.com/golang/vscode-go/issues/2698

Remote debugging from VS Code:

launch.json

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "FreeBSD tests",
            "type": "go",
            "debugAdapter": "dlv-dap",
            "mode": "remote",
            "request": "attach",
            "substitutePath": [
                { "from": "${workspaceFolder}", "to": "/usr/home/REMOTEUSER/github/netbirdio/netbird" },
                { "from": "/usr/lib/go", "to": "/usr/local/go121" }
            ],
            "host": "192.168.91.91",
            "port": 12345,
            "showLog": true,
            "hideSystemGoroutines": true
        }
    ]
}

Run on FreeBSD

sudo pkg install delve
cd ~/github/netbirdio/netbird/client/internal/routemanager/systemops
sudo dlv test --accept-multiclient --api-version 2 --headless --listen=192.168.91.91:12345 --log

Related links: https://go.microsoft.com/fwlink/?linkid=830387 https://github.com/golang/vscode-go/wiki/debugging https://github.com/golang/vscode-go/wiki/debugging#connect-to-delve-dap-with-target-specified-at-client-start-up https://forums.freebsd.org/threads/vscode-and-go-on-freebsd.75905/ https://code.visualstudio.com/docs/editor/variables-reference

skillcoder commented 1 month ago

I found a reason of this workaround If remove it netbird will create a following route

192.168.91.223     192.168.91.91      UGHS        em0

And it means route everything to local IP. For reference full route table

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.91.223     UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.91.0/24    link#1             U           em0
192.168.91.91      link#1             UHS         lo0
192.168.91.223     192.168.91.91      UGHS        em0

Where 192.168.91.91 is local IP TestAddExistAndRemoveRoute can detect this problem And it disconnect host from any external connection since default route broke.

skillcoder commented 1 month ago

This case broken

        {
            name:           "Should Not Add Route if overlaps with default gateway",
            prefix:         netip.MustParsePrefix(defaultNexthop.IP.String() + "/31"),
            shouldAddRoute: false,
        },

These workaround with returning only interface name without IP fix problem because of this code https://github.com/netbirdio/netbird/blob/af1b42e538e74fe01b3f4398f95b4521d8ac3298/client/internal/routemanager/systemops/systemops_unix.go#L50

skillcoder commented 1 month ago

To be able to properly check TestAddExistAndRemoveRoute shouldAddRoute: false cases I will try to replace unreliable logs-parsing-based test https://github.com/netbirdio/netbird/blob/4bbedb5193f13f1e033acd4657a9c4f41d048bde/client/internal/routemanager/systemops/systemops_generic_test.go#L256 with OpenTelemetry spans and attributes.