netbirdio / netbird

Connect your devices into a single secure private WireGuard®-based mesh network with SSO/MFA and simple access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

New peers are not propagated to already existing peers #2114

Open glaeqen opened 3 weeks ago

glaeqen commented 3 weeks ago

Describe the problem

After adding a new peer to the network, the peer is not propagated to the already existing peers. At best, only peers that were added very recently (last 24h?) seem to get their peer list updated. The only thing that helps is to run netbird down and netbird up on all the affected existing peers' machines, which forces a peer refresh. Existing peers seem to show the following error after a new peer is added:

2024-06-10T15:57:29+02:00 ERRO signal/client/grpc.go:410: error while handling message of Peer [key: <KEY>] error: [wrongly addressed message <KEY>]

This feels so fundamentally wrong that it's hard to believe it's a bug and not just my fault. I doubt it's a networking issue: affected peers are located both in the cloud and in self-hosted networks, so that does not seem probable.

Expected behavior

Automatic peer propagation

Are you using NetBird Cloud?

self-hosted NetBird's control plane

NetBird version

0.27.10

mlsmaycon commented 3 weeks ago

This is very odd indeed. Assuming your connection with the management is fine, this should never happen unless there is a manual update of the network map in the database or an async update causing the serial number to be lower.

If you are running on 0.27.5+, can you enable debug logs on one of the existing clients with: netbird debug log level debug

then add a new peer and check if there is a log similar to:

received outdated NetworkMap with serial XXXX, ignoring
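
To illustrate why a lower serial would produce that log line, here is a rough sketch of the kind of check described above. It is not NetBird's actual code: the `NetworkMap` and `PeerState` names, the field names, and the exact comparison (strictly lower is treated as outdated) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NetworkMap:
    # Hypothetical stand-in for an update sent by the management service.
    serial: int
    peers: List[str] = field(default_factory=list)


class PeerState:
    # Hypothetical client-side state: the last applied serial and peer list.
    def __init__(self) -> None:
        self.serial = 0
        self.peers: List[str] = []

    def apply(self, update: NetworkMap) -> bool:
        # Assumption: an update with a lower serial than the one already
        # applied is dropped, so any new peers it carries never show up.
        if update.serial < self.serial:
            print(f"received outdated NetworkMap with serial {update.serial}, ignoring")
            return False
        self.serial = update.serial
        self.peers = list(update.peers)
        return True
```

Under these assumptions, a client that has already applied serial 5 would drop an update carrying serial 4 even if that update lists a new peer, which would match the symptom of existing peers not seeing newly added ones.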

glaeqen commented 3 weeks ago

It occurred twice earlier this week but I haven't been able to reproduce it since 🙄 When it happened, I tested updating the ACL (I added an affected peer to some new, arbitrary group): the netbird client of that peer noticed the ACL update request and, as a side effect, it also updated the peer list. If it happens again I'll try to gather more detailed logs from all parties involved.

mlsmaycon commented 3 weeks ago

sounds great, thanks @glaeqen