netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License
10.77k stars 486 forks source link

`ip route` conflicts and race conditions when resolving best routes in `clientNetwork.getBestRouteFromStatuses()` #2066

Open nazarewk opened 4 months ago

nazarewk commented 4 months ago

Describe the problem

I have noticed the client-side route selection feature so during today's Netbird workshop for our team I have enabled both routes clashing on 10.4/16 and later today teammates started reporting problems with accessing the primary system so I switched the secondary off and dug into the code.

First thing I have noticed using ip route metric was superseded by:

The method is scoped to *clientNetwork, which I guess does not consider a possibility of another NetworkID providing the same CIDR (10.4/16 in our case) resulting in a conflict/race condition on the created ip route entry.

To Reproduce

Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

Either of:

Are you using NetBird Cloud?

Yes

NetBird version

0.27.7

NetBird status -d output:

If applicable, add the `netbird status -d' command output.

Screenshots

If applicable, add screenshots to help explain your problem.

Additional context

Add any other context about the problem here.

nazarewk commented 4 months ago

I've noticed that I was connected to the primary network and the ip route entry disappeared completely after i disabled the secondary network in the web ui.

I had to do netbird down && netbird up for the client to re-register the primary route.

nazarewk commented 4 months ago

a single clientNetwork seems to represent a very specific <network-id>-<network-range> grouping:

  1. https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/manager.go#L46-L46
  2. https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/route/hauniqueid.go#L8-L10

making it impossible to know about other input.NetID providing the same input.Network.String() on completely different site/datacenter

nazarewk commented 4 months ago

theoretically using input.Network.String() instead of HAUniqueID here could help with my issue: https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/manager.go#L243-L251 but I am not sure about the wider consequences

nazarewk commented 4 months ago

I've noticed that I was connected to the primary network and the ip route entry disappeared completely after i disabled the secondary network in the web ui.

I had to do netbird down && netbird up for the client to re-register the primary route.

@mlsmaycon noted that this part of the issue (removing routes still in use) will be fixed by https://github.com/netbirdio/netbird/pull/1943 , since it is a big refactor of (previously) linked code in this issue it will be worth analyzing and revisiting again after #1943 is merged