netbirdio / netbird

Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
https://netbird.io
BSD 3-Clause "New" or "Revised" License

`netbird up` fails with device auth failure #1656

Open synfinatic opened 8 months ago

synfinatic commented 8 months ago

Describe the problem

Just did a netbird down followed by a netbird up on a device which was bootstrapped onto the netbird network via a setup key and it will not connect.

To Reproduce

netbird down && netbird up

Expected behavior

Connect to netbird. Don't error out with the following error:

2024-03-02T01:47:37Z WARN client/cmd/root.go:195: retrying Login to the Management service in 1.359509522s due to error rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
2024-03-02T01:47:49Z WARN client/cmd/root.go:195: retrying Login to the Management service in 2.133556171s due to error rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
Error: login backoff cycle failed: rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
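
When triaging output like the above, it can help to pull the retry delay and the underlying error out of each WARN line. The following is a minimal sketch, not part of NetBird: the function names are hypothetical, and the line format is assumed only from the log lines shown in this report.

```python
import re

# Seconds per unit for Go-style duration strings such as "1.359509522s" or "920.398536ms".
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")

# Assumed format, taken from the WARN lines in this report.
_RETRY_LINE = re.compile(
    r"retrying Login to the Management service in (\S+) due to error (.+)$"
)


def go_duration_to_seconds(text):
    """Convert a Go-style duration string to seconds."""
    parts = _DURATION_PART.findall(text)
    if not parts:
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def parse_retry_line(line):
    """Return (delay_seconds, error_text) for a retry line, or None if it is not one."""
    match = _RETRY_LINE.search(line)
    if match is None:
        return None
    return go_duration_to_seconds(match.group(1)), match.group(2).strip()
```

Feeding it each line of `netbird up` output and keeping the non-None results gives the retry schedule alongside the error that triggered each retry.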

Are you using NetBird Cloud?

Yes, using cloud.

NetBird version

0.25.7

NetBird status -d output:

netbird status -d
Daemon status: LoginFailed

Run UP command to log in with SSO (interactive login):

 netbird up

If you are running a self-hosted version and no SSO provider has been configured in your Management Server,
you can use a setup-key:

 netbird up --management-url <YOUR_MANAGEMENT_URL> --setup-key <YOUR_SETUP_KEY>

More info: https://docs.netbird.io/how-to/register-machines-using-setup-keys

synfinatic commented 8 months ago

updated to 0.26.2 and same problem:

$ netbird status
Daemon version: 0.26.2
CLI version: 0.26.2
Management: Disconnected, reason: rpc error: code = FailedPrecondition desc = failed connecting to Management Service : context deadline exceeded
Signal: Disconnected
Relays: 0/0 Available
FQDN:
NetBird IP: N/A
Interface type: N/A
Quantum resistance: false
Peers count: 0/0 Connected

$ netbird up
2024-03-02T02:00:13Z WARN client/cmd/root.go:204: retrying Login to the Management service in 920.398536ms due to error rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
2024-03-02T02:00:24Z WARN client/cmd/root.go:204: retrying Login to the Management service in 889.796141ms due to error rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded
Error: login backoff cycle failed: rpc error: code = Unknown desc = getting device authorization flow info failed with error: context deadline exceeded

synfinatic commented 8 months ago

Looks like this was a service issue with netbird.io cloud service as it is working now? Was there a status page I should have looked for system/health status?

$ netbird status
Daemon version: 0.26.2
CLI version: 0.26.2
Management: Connected
Signal: Connected
Relays: 2/2 Available
FQDN: raspi-blue.netbird.cloud
NetBird IP: 100.93.254.165/16
Interface type: Kernel
Quantum resistance: false
Peers count: 1/4 Connected

mlsmaycon commented 8 months ago

@synfinatic can you check the logs, especially the /var/log/netbird/netbird.err?

synfinatic commented 8 months ago

Sadly, there are no logs anymore since this device is running DietPi and /var/log is on a ramdisk volume which is purged on a regular basis.

mlsmaycon commented 8 months ago

Please run the agent in the foreground:

sudo netbird service stop
sudo netbird up -F -l debug -m https://your-server-url:port

synfinatic commented 8 months ago

As I stated earlier this morning, the problem seems to have resolved itself. I'm no longer able to reproduce this issue. Should it happen again, I'll be happy to provide the logs... unfortunately the ticket template didn't ask for them and I forgot about the log purging.

However, the limited output seems to indicate an issue with the netbird.io service? Can you confirm whether there was a service issue at that time? I opened the ticket while the problem was occurring.

WGandy commented 8 months ago

I just had what appears to be the same error with a client: "context deadline exceeded". This is with a self-hosted NetBird. I also see now that my management service seems to be messed up. No peers appear when I log in. Errors from docker show: http: TLS handshake error from xxx.xxx.xxx.xxx:63817: remote error: tls: unknown certificate

The certs for the management server seem to be intact, as I can log in (those are handled by Caddy), but I think there are certs for another part of this?

mlsmaycon commented 8 months ago

@WGandy is that a custom docker build?

mlsmaycon commented 8 months ago

> As I stated earlier this morning, the problem seems to have resolved itself. I'm no longer able to reproduce this issue. Should it happen again, I'll be happy to provide the logs... unfortunately the ticket template didn't ask for them and I forgot about the log purging.
>
> However, the limited output seems to indicate an issue with the netbird.io service? Can you confirm whether there was a service issue at that time? I opened the ticket while the problem was occurring.

@synfinatic, we didn't have any issues within the timeframe of your logs. The issue could be linked to latency between the client and the management service. I've shared some steps with you in https://github.com/netbirdio/netbird/issues/1618#issuecomment-1975941729 that might help us understand the issue in detail.
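
For a rough first look at latency between a client and a management endpoint, a TCP connect timing can be taken from the affected device. This is only a sketch under stated assumptions: the function names are hypothetical, and it times the TCP handshake alone, not TLS or the gRPC login that actually hit the deadline.

```python
import socket
import time


def tcp_connect_latency(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure or timeout."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return None


def sample_latency(host, port, attempts=5, timeout=5.0):
    """Take several samples; None entries mark attempts that failed or timed out."""
    return [tcp_connect_latency(host, port, timeout) for _ in range(attempts)]
```

Calling `sample_latency` with the host and port from your management URL while the problem is occurring would show whether connections are slow, failing outright, or fine at the TCP level.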

WGandy commented 8 months ago

Yes, you helped me get it set up quite a while ago. It seems that Coturn is not finding the certs, probably because the Caddy container recently renewed them. I'm wondering if perhaps we manually copied the certs to get it going when we set it up? I'm hoping to find the time to sort through it later today. Hopefully it's just a volume mapping issue.

WGandy commented 8 months ago

Just a follow up, my failure was on account of Caddy renewing certs with a different CA than it used previously. This resulted in having the cert files located at a different path. The dashboard container was able to use the new certs but the management container did not. I manually changed the cert file names and paths in the docker compose for the management and in the management.json file. If it renews again with the opposite provider then I'll need to manually change it again. But, I think that this will be automated in a future version of Netbird.
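
Until that is automated, one way to avoid hand-editing paths after each renewal is to look the certificate up regardless of which CA directory it landed in. The sketch below assumes Caddy's on-disk layout of `certificates/<issuer directory>/<domain>/<domain>.crt` with a matching `.key` file; that layout, the function name, and the example paths are assumptions to verify against your own Caddy data volume.

```python
from pathlib import Path


def newest_caddy_cert(cert_root, domain):
    """Return (crt_path, key_path) for the most recently written cert, or None.

    cert_root is the assumed 'certificates' directory inside Caddy's data volume.
    """
    root = Path(cert_root)
    # One sub-directory per issuing CA is assumed, so match any issuer.
    candidates = [
        crt
        for crt in root.glob(f"*/{domain}/{domain}.crt")
        if crt.with_suffix(".key").exists()
    ]
    if not candidates:
        return None
    crt = max(candidates, key=lambda path: path.stat().st_mtime)
    return crt, crt.with_suffix(".key")
```

The returned pair could then be copied or symlinked to a fixed location that the management container and management.json point at, so a change of CA no longer changes the path they read.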

juniormarangao commented 1 month ago

Hi!

I am facing this issue: I can't connect any client. I installed using the Advanced guide, with Authentik and Nginx Proxy Manager. I can log in, the peers page shows, and I can create management keys, but I cannot connect.

When I debug, it shows the following:

2024-09-16T12:33:03-03:00 ERRO client/internal/login.go:105: failed while getting Management Service public key: failed while getting Management Service public key
2024-09-16T12:33:03-03:00 WARN client/cmd/root.go:234: retrying Login to the Management service in 1.259689188s due to error failed while getting Management Service public key
2024-09-16T12:33:05-03:00 DEBG client/internal/login.go:93: connecting to the Management service https://vpn.example.com:443
2024-09-16T12:33:05-03:00 DEBG client/internal/login.go:63: connected to the Management service https://vpn.example.com:443
2024-09-16T12:33:05-03:00 ERRO management/client/grpc.go:287: failed while getting Management Service public key: rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/html"

The setup is 3 VMs:

  1. Nginx Proxy Manager
  2. Authentik
  3. Netbird

All are behind one public IP but on the same internal network, and all can reach each other. In NPM, the host pointing to the NetBird VM has ports 80, 443, and 33073 configured, with gRPC, etc.

The other required ports are forwarded directly to the NetBird VM.

Is there something that I missed?
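
One detail in the debug output above is worth isolating: a gRPC server answers with an `application/grpc` content type, so a `502 (Bad Gateway)` with `text/html` suggests the reply came from a proxy in front of the management service rather than from the service itself. The helper below is a hypothetical sketch that extracts those two fields from the error text; its name and the message format are assumed only from the log line in this comment.

```python
import re

_STATUS = re.compile(r"unexpected HTTP status code received from server: (\d{3})")
_CONTENT_TYPE = re.compile(r'received unexpected content-type "([^"]+)"')


def describe_grpc_http_error(message):
    """Extract the HTTP status and content type from a gRPC transport error string."""
    status = _STATUS.search(message)
    ctype = _CONTENT_TYPE.search(message)
    return {
        "http_status": int(status.group(1)) if status else None,
        "content_type": ctype.group(1) if ctype else None,
        # gRPC responses carry a content type starting with application/grpc.
        "looks_like_grpc": bool(ctype) and ctype.group(1).startswith("application/grpc"),
    }
```

If `looks_like_grpc` is false and the status is a 5xx, the proxy's handling of the gRPC route to the management port is a reasonable place to start checking.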