netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

Clients are connected but sometimes don't use routes #2514

Open WortmannImpleco opened 2 months ago

WortmannImpleco commented 2 months ago

Describe the problem

We use NetBird in our organization with around 45 users. It is self-hosted with Azure authentication (which works great). The Mac clients have the most problems: they don't wake up after sleep mode (there is already a bug ticket for that), but they still show as connected.

The Windows clients have a similar problem. They often simply don't use the routes, even though the routes are visible and active on the client. They work again after reconnecting and using an incognito tab, but clearing the cache every time is not an option.

We have configured an exit node to carry all traffic for some clients, but the browsers (Edge, Chrome and Safari) often simply don't use it. Some people have started reinstalling the client every time, because even netbird down followed by netbird up doesn't help. The Linux client seems to be the only one without any problems.

I really love the system and want to use it, but management is already starting to hate it, and I would really like to fix that. All clients are up to date, and everything seems to work except the connection itself: it often works, but then just stops. I have tried every tip I found in the issues, and I update as soon as a new version comes out (and read the changelog), but the problems keep appearing.

To Reproduce

Steps to reproduce the behavior:

  1. Install NetBird
  2. Use Netbird for some time
  3. Routes stop being used without changing anything

Expected behavior

The NetBird clients should use the configured routes unless a route is deactivated, and the clients should reconnect after sleep mode. If a full-tunnel VPN is configured via an exit node, all traffic should go through it without any leaks.

Are you using NetBird Cloud?

No, self-hosted.

NetBird version

Always up to date, 0.28.9

afvbozzo commented 2 months ago

We are experiencing the same issue on Windows 11 clients using the SaaS hosting. netbird status -d shows that the peers are connected, but no further WireGuard handshakes seem to be made starting about one minute after the last connection update.

Disabling Quantum Resistance on the peers seems to solve the issue.

WortmannImpleco commented 2 months ago

I have already tried it with Quantum Resistance and Network Monitor both enabled and disabled. Enabling Network Monitor has improved the experience on Windows clients, but the problem keeps happening.

For the Macs, the known bug after going into sleep mode seems to be the main problem, but all clients except the Linux systems lose their connections after some time, even though the network is stable and the client shows as connected. Sometimes it still works in an incognito tab or in a browser with a cleared cache, but that is not an option. The browser seems to forget that it should use the provided route.

hmica commented 2 months ago

I am also losing my routes. I have a VPS that checks my infrastructure, but it loses its routes three times a week.

This is what appears in the logs when it happens:

OPrXKmXCMxZSe2gzzHXXXXX=, endpoint address: 85.xx.xx.xx:51820
2024-09-06T04:53:29+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-09-06T04:53:33+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream
2024-09-06T04:59:21+02:00 INFO client/internal/peer/conn.go:362: connected to peer nav0xxxxxxxxxxxxx

Both clients are running on Linux with version 0.28.9 (SaaS).

Rob787 commented 1 month ago

Same here on Windows: network routes seem to disappear after a while. Changing something server-side in the routes (such as toggling a route between active and inactive) seems to trigger a client update, and the network routes appear again.

Rob787 commented 1 month ago

@mlsmaycon Is there any information you need that would help with troubleshooting?