netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

Route with HA routing peers group broken with relay functionality #2870

Closed saule1508 closed 1 week ago

saule1508 commented 1 week ago

Describe the problem

When I use relay, the routes on the client continuously flip between the routing peers and routing does not work. Without relay it works. In the log I see messages like these for each route:

2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]

To Reproduce

  1. Install version 0.30.3 of NetBird on the server and enable the relay functionality.
  2. Install version 0.30.3 of NetBird on two routing peers (RHEL 9).
  3. Install version 0.31.0 of NetBird on the client peer (Fedora 41 or Rocky Linux 9); the issue also occurs with 0.29.1.
  4. In client.log on the client, the routes continuously flap between the two routing peers.

As this looks similar to https://github.com/netbirdio/netbird/issues/2575, I also tried client 0.29.1, but the issue is the same.

When I don't use relay it works: the routes have a score (and a latency) and they do not flap.

Expected behavior

The routes should be available on the client and should not be removed and re-added all the time.

Are you using NetBird Cloud?

self-hosted

NetBird version

0.30.3 server and 0.31 client

NetBird status -dA output:

The output shows that the peer has only two routes, but there should be many more.

```text
[root@testpeerclient ~]# netbird status -dA
Peers detail:
 nbrpeer0101a3.anon-qkAva.domain:
  NetBird IP: 100.74.183.10
  Public key: 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://netbird.offnet.sandbox.apac.anon-1jqeA.domain:443/relay
  Last connection update: 4 seconds ago
  Last WireGuard handshake: 4 seconds ago
  Transfer status (received/sent) 92 B/180 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

 nbrpeer0101a1.anon-qkAva.domain:
  NetBird IP: 100.74.199.179
  Public key: lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://netbird.XXXX.anon-1jqeA.domain:443/relay
  Last connection update: 5 seconds ago
  Last WireGuard handshake: 5 seconds ago
  Transfer status (received/sent) 92 B/276 B
  Quantum resistance: false
  Routes: 10.146.253.0/24, 10.146.39.0/24
  Latency: 0s

OS: linux/amd64
Daemon version: 0.31.0
CLI version: 0.31.0
Management: Connected to https://netbird.XXXX.anon-1jqeA.domain:443
Signal: Connected to https://netbird.XXXX.anon-1jqeA.domain:443
Relays:
  [stun:stun.l.anon-ndGdW.domain:19302] is Available
  [turns:coturn.XXXX.apac.anon-1jqeA.domain:443?transport=tcp] is Available
  [rels://netbird.XXXX.anon-1jqeA.domain:443/relay] is Available
Nameservers:
  [10.XXX.35.147:53] for [.] is Available
  [10.XXX.35.84:53] for [.] is Available
  [10.XXX.34.214:53] for [.] is Unavailable, reason: 1 error occurred:
```

Yes, the client is Linux. I will send the debug log if useful, but this is what I see in client.log for one of the routes:

```
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8o1b0kn140rrppog with peer lDMoPxKvgzTA7uBP7+smggRoqhseYoSK0LKbq4gTQyA= with score 0.000000 for network [10.XXX.167.0/24]
2024-11-08T22:26:17Z INFO client/internal/routemanager/client.go:171: New chosen route is csc820pb0kn6h96umhvg:cqor8r1b0kn140rrppp0 with peer 3c185SdFwbQW0VvZkyEUZCF8D2QKg+inBnqP3T2ObSA= with score 0.000000 for network [10.XXX.167.0/24]
```


saule1508 commented 1 week ago

Thanks to this change https://github.com/netbirdio/netbird/pull/2856/files#diff-67387c2b097eeb87387ab2575884677a2f9ee1c53b9e8674dcfa4936463c23f9 the issue is solved, or at least the route is not switching between peers anymore.

So with relay the latency is 0 (so far I could not find where this latency is computed). Before this change the score was also 0 and the route was flipping between peers. Now the latency is still zero, but the score is no longer zero and the route is stable.

I can close it
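For readers wondering why equal scores can lead to flapping: if every candidate scores 0 and the selection does not prefer the route already in use, any re-evaluation may land on a different peer. The Go sketch below illustrates one generic way to avoid that, by keeping the current route unless another candidate scores strictly higher. It is not the code from the linked pull request and not NetBird's route manager; `candidate` and `pickRoute` are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for one HA route option.
type candidate struct {
	ID    string
	Score float64
}

// pickRoute returns the ID of the route to use. If the current route is still
// among the candidates, it is kept unless another candidate has a strictly
// higher score, so ties (for example all scores 0) never cause a switch.
func pickRoute(current string, candidates []candidate) string {
	chosen, chosenScore, have := "", 0.0, false
	// Start from the current route when it is still available.
	for _, c := range candidates {
		if c.ID == current {
			chosen, chosenScore, have = c.ID, c.Score, true
			break
		}
	}
	// Switch only for a strictly better score (or when nothing is chosen yet).
	for _, c := range candidates {
		if !have || c.Score > chosenScore {
			chosen, chosenScore, have = c.ID, c.Score, true
		}
	}
	return chosen
}

func main() {
	tied := []candidate{{ID: "peerA", Score: 0}, {ID: "peerB", Score: 0}}
	current := ""
	// Re-evaluate several times, as a client would on each update.
	for i := 0; i < 4; i++ {
		current = pickRoute(current, tied)
		fmt.Println("chosen:", current)
	}
}
```

With this kind of tie handling, the choice stays on whichever peer was picked first, even when relayed connections report a latency of 0 for every peer.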