netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

Netbird interface keeps flapping up and down #2173

Open: iball opened this issue 5 months ago

iball commented 5 months ago

Describe the problem

Netbird interface keeps flapping up and down on Windows and Linux.

To Reproduce

Steps to reproduce the behavior:

  1. Install Netbird.
  2. Watch the interface flap by repeatedly listing the network interfaces from the command line.
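Step 2 can be automated on Linux instead of re-listing interfaces by hand. This is a minimal POSIX sh sketch, assuming the iproute2 `ip` command is available and the interface is named wt0; `iface_present` and `watch_iface` are hypothetical helper names, not part of NetBird. It only checks whether the interface exists (not its link state) and prints a timestamped line each time that changes.

```sh
# Hypothetical helpers (not part of NetBird). Assumes Linux with iproute2.

# Succeed if the named interface currently exists.
iface_present() {
    ip link show "$1" >/dev/null 2>&1
}

# watch_iface IFACE ITERATIONS INTERVAL_SECONDS
# Checks IFACE ITERATIONS times, sleeping INTERVAL_SECONDS between checks,
# and prints a timestamped line only when presence changes.
watch_iface() {
    iface=$1
    iterations=$2
    interval=$3
    last=unknown
    i=0
    while [ "$i" -lt "$iterations" ]; do
        if iface_present "$iface"; then
            state=present
        else
            state=absent
        fi
        # Report transitions only, so a stable interface produces one line.
        if [ "$state" != "$last" ]; then
            printf '%s %s %s\n' "$(date +%H:%M:%S)" "$iface" "$state"
            last=$state
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `watch_iface wt0 600 1` checks once per second for about ten minutes; a steady stream of alternating present/absent lines would match the flapping described in this issue.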

Expected behavior

I didn't expect v0.28 to break everything. The interface should stay up; right now it's preventing me from reaching my services.

Are you using NetBird Cloud?

Yes.

NetBird version

0.27.10 on Windows, but it does the same no matter which version I install; 0.28.2 on everything else. I tried 0.28.2, 0.28.1, 0.28.0, and now 0.27.10 on the same Windows 11 PC, and it's the same every time: it constantly reconnects to peers, and the wt0 interface keeps disappearing and reappearing. All my NetBird clients behave the same way.

NetBird status -d output:

If applicable, add the `netbird status -d` command output.

Not included. It only shows a list of peers, and it's constantly reconnecting to all of them.

pascal-fischer commented 5 months ago

Hi @iball, can you share debug logs of that peer?

lixmal commented 5 months ago

@iball this might be an issue with the network monitor. You can disable it on the command line for the time being, but please collect the debug logs.

netbird down
netbird up --network-monitor=false

BizkitX commented 5 months ago

We began experiencing the same issue once we upgraded from 0.27.10 to 0.28.2. I just tested setting the network monitor to false, and the connection appears to be stable for now. I'm not sure what the network monitor does that breaks the connection.

JonathanHohimer commented 5 months ago

I ran the process in the foreground and pushed a lot of data through the tunnel to trigger it. I got the following output:

2024-06-22T20:08:50-05:00 INFO client/internal/networkmonitor/monitor_windows.go:131: network monitor: neighbor 10.210.0.1 () is not reachable: unreachable
2024-06-22T20:08:50-05:00 INFO client/internal/engine.go:1476: Network monitor detected network change, restarting engine
2024-06-22T20:08:50-05:00 INFO client/internal/engine.go:252: Network monitor: stopped

Then it restarts the connection. This keeps happening even though nothing on the network is changing, and it continues after I stop the data transfer. With the network monitor disabled, this doesn't happen.
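To quantify how often this happens, the foreground output can be saved to a file and scanned for the messages quoted above. This is a minimal sketch; the function names and the log file path are hypothetical, and the match strings are copied from the log lines in this comment, so they may not cover other message formats or client versions.

```sh
# Hypothetical helpers (not part of NetBird) for scanning a saved client log.

# Count engine restarts triggered by the network monitor.
count_monitor_restarts() {
    grep -c 'Network monitor detected network change, restarting engine' "$1"
}

# List each neighbor the monitor reported as unreachable, with a count.
unreachable_neighbors() {
    sed -n 's/.*network monitor: neighbor \([^ ]*\) .*is not reachable.*/\1/p' "$1" \
        | sort | uniq -c
}
```

For example, `count_monitor_restarts netbird-foreground.log` prints the number of restarts; a count that keeps climbing while the network is idle, with the same neighbor listed by `unreachable_neighbors`, would match the behavior described here.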

Hobby-Student commented 5 months ago

I noticed the same behavior after updating from v0.27.10 to v0.28.2 (Windows) and running netbird up -N.

After running netbird up --network-monitor=false, everything works as intended.

raptaml commented 5 months ago

Same here: netbird up --network-monitor=false solves the issue. Another PC runs just fine.

mlsmaycon commented 5 months ago

Hey everyone, v0.28.3 will fix the network monitor issue on Windows.

mlsmaycon commented 5 months ago

We released v0.28.3, which fixes the network monitor issue on Windows. Please upgrade and enable the network monitor with:

netbird down
netbird up -N

raptaml commented 5 months ago

Confirmed, works just like before. Thanks!

pyfrancoeur commented 4 months ago

I can confirm the issue is still present in 0.28.4. netbird up --network-monitor=false still fixes it for me, though. What are the consequences of running with the network monitor disabled? Thanks!

mlsmaycon commented 4 months ago

@pyfrancoeur can you enable the network monitor for a brief period and run the following commands to collect some logs that will help us fix the issue?

netbird down
netbird up -N
netbird -A debug for 1m

After the tests are done, you can disable the monitor with:

netbird down
netbird up --network-monitor=false

Any information about your setup would also help, e.g., the number of active interfaces, the OS, and the main connection type (Wi-Fi or cable).

pyfrancoeur commented 4 months ago

So far, this issue has been observed exclusively with Windows Active Directory Domain Controllers (AD DC). I have tested with Windows Server 2016, 2019, and 2022, and the problem appears to be entirely random, without any clear differentiators. Most of the servers have a single NIC, while some have two. All connections are wired.

I configured my Netbird interface to use a random port between 21820 and 33820 to avoid conflicts with dns.exe, which opens ports ranging from 49152 to 65535 (source: Microsoft Security Bulletin MS08-037). This change resolved the issue on many domain controllers; however, some continue to experience significant connection instability. It is important to note that this issue does not affect every domain controller.
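The port choice described above can be sanity-checked before it is applied. This minimal sketch picks a pseudo-random port from 21820 to 33820 and confirms it lies outside the 49152 to 65535 range cited for dns.exe. The function and variable names are hypothetical, and it does not configure NetBird; how the chosen port is passed to the client is left out here.

```sh
# Hypothetical helpers (not part of NetBird) for choosing a listen port
# that stays clear of the 49152-65535 range dns.exe may bind on a DC.

# Range chosen in this comment.
PORT_MIN=21820
PORT_MAX=33820

# Succeed if port $1 lies within the dynamic range 49152-65535.
in_dynamic_range() {
    [ "$1" -ge 49152 ] && [ "$1" -le 65535 ]
}

# Print a pseudo-random port between PORT_MIN and PORT_MAX (inclusive),
# using awk's rand() seeded from the time of day for POSIX portability.
pick_port() {
    awk -v min="$PORT_MIN" -v max="$PORT_MAX" \
        'BEGIN { srand(); print min + int(rand() * (max - min + 1)) }'
}
```

On Windows, `netsh int ipv4 show dynamicport udp` should show the dynamic UDP range actually in effect on a given server, which is worth checking if it was changed from the default.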

I have included the debug archive as requested. Please find it attached for your review and analysis.

Thank you.

netbird.debug.1112209487.zip

lixmal commented 3 months ago

Could any of you test the network monitor change from https://github.com/netbirdio/netbird/pull/2450?

  1. Grab the binary archive from https://github.com/netbirdio/netbird/actions/runs/10459247214/artifacts/1829061085
  2. Extract windows-packages.zip.
  3. Stop and uninstall the service:
       netbird service stop
       netbird service uninstall
  4. Move netbird.exe from the zip archive to %PROGRAMFILES%\Netbird.
  5. Reinstall and start the service, then reconnect with the network monitor enabled:
       netbird service install
       netbird service start
       netbird down
       netbird up --network-monitor=true

lixmal commented 2 months ago

This change is included in the 0.28.8 release. If it fixes the issue for you, please close this issue.

Hobby-Student commented 2 months ago

@lixmal I haven't had time to test yet, but I'll configure my laptop accordingly soon and report back.