Open synfinatic opened 9 months ago
Hi, could you run some tests, please? While you are connected to management can you run the following commands and send me the output:
sudo ls -l /var/lib/netbird
sudo cat /var/lib/netbird/manager
sudo ls -l /var/lib/netbird/resolv.conf
sudo ls -l /etc/resolv*
Thanks!
root@raspi-blue:~# ls -l /var/lib/netbird
total 8
-rw-r--r-- 1 root root 19 Feb 28 01:33 manager
-rw-r--r-- 1 root root 79 Feb 28 01:33 resolv.conf
root@raspi-blue:~#
root@raspi-blue:~#
root@raspi-blue:~# cat /var/lib/netbird/manager
file,100.93.254.165root@raspi-blue:~#
root@raspi-blue:~#
root@raspi-blue:~# ls -l /var/lib/netbird/resolv.conf
-rw-r--r-- 1 root root 79 Feb 28 01:33 /var/lib/netbird/resolv.conf
root@raspi-blue:~#
root@raspi-blue:~#
root@raspi-blue:~# ls -l /etc/resolv*
-rw-r--r-- 1 root root 217 Feb 28 01:33 /etc/resolv.conf
-rw-r--r-- 1 root root 79 Feb 28 01:25 /etc/resolv.conf.original.netbird
I'm currently having the exact same error on Linux machines when I try to use a setup key. Windows PCs can log in with SSO, but it takes a while and a couple of reconnects.
I'm running the self-hosted version, where everything but the reverse proxy runs in Docker containers.
We use the same URL and port for both the Management and Admin URLs. Is that a problem?
I use nginx as the reverse proxy with the following configuration:
upstream dashboard {
server 127.0.0.1:8180;
keepalive 10;
}
upstream signal {
server 127.0.0.1:8100;
}
upstream api {
server 127.0.0.1:8443;
}
upstream management {
server 127.0.0.1:8443;
}
server {
listen 80;
server_name _;
# 301 redirect to HTTPS
location / {
return 301 https://$host$request_uri;
}
}
server {
# HTTPS server config
listen 443 ssl http2;
server_name _;
access_log /var/log/nginx/access.log;
client_header_timeout 1d;
client_body_timeout 1d;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Scheme $scheme;
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-Host $host;
# Proxy dashboard
location / {
proxy_pass http://dashboard;
}
# Proxy Signal
location /signalexchange.SignalExchange/ {
grpc_pass grpc://signal;
#grpc_ssl_verify off;
grpc_read_timeout 1d;
grpc_send_timeout 1d;
grpc_socket_keepalive on;
}
# Proxy Management http endpoint
location /api {
proxy_pass http://api;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Scheme $scheme;
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-Host $host;
proxy_http_version 1.1;
}
# Proxy Management grpc endpoint
location /management.ManagementService/ {
grpc_pass grpc://management;
#grpc_ssl_verify off;
grpc_read_timeout 1d;
grpc_send_timeout 1d;
grpc_socket_keepalive on;
}
ssl_certificate /etc/nginx/certs/cert.crt;
ssl_certificate_key /etc/nginx/certs/cert.key;
}
I doubt that it matters, but my device is also using a setup key and not SSO.
Mine does not come up at all (new client), so it's not exactly the same scenario, but I think it has something to do with the Linux client and setup keys.
Windows clients seem to be working normally, but they take a long time to connect, which I'm also investigating.
Hello @synfinatic @teoder, the context deadline error indicates a timeout when communicating with the management service. The client has a timeout of 5 seconds, which will be increased to 10 seconds in the next release (0.26.3).
Can you send the output from:
curl -o /dev/null -s -w "Time Connect: %{time_connect}s\nTime Start Transfer: %{time_starttransfer}s\nTotal Time: %{time_total}s\n" https://api.netbird.io/api/users
@teoder, replace https://api.netbird.io with your self-hosted management URL.
Also, can you please send us the logs from the daemon process? See https://docs.netbird.io/how-to/troubleshooting-client#getting-client-logs for reference.
$ curl -o /dev/null -s -w "Time Connect: %{time_connect}s\nTime Start Transfer: %{time_starttransfer}s\nTotal Time: %{time_total}s\n" https://api.netbird.io/api/users
Time Connect: 0.084451s
Time Start Transfer: 0.426194s
Total Time: 0.426519s
Hi @mlsmaycon, I have output from both Windows and Linux. The Windows client can connect with SSO but takes a while. The Linux client does not connect, and I don't get much output from the logs.
I haven't tried using SSO on Linux, only the setup key, since the machine is supposed to be used only as a routing peer.
From Windows Client (WSL):
# curl -o /dev/null -s -w "Time Connect: %{time_connect}s\nTime Start Transfer: %{time_starttransfer}s\nTotal Time: %{time_total}s\n" https:/secret.url.nu/api/users
Time Connect: 0.041514s
Time Start Transfer: 0.000000s
Total Time: 0.064417s
From non-working Linux box:
# curl -o /dev/null -s -w "Time Connect: %{time_connect}s\nTime Start Transfer: %{time_starttransfer}s\nTotal Time: %{time_total}s\n" https:/secret.url.nu/api/users
Time Connect: 0.015209s
Time Start Transfer: 0.000000s
Total Time: 0.067473s
The Linux client logs show the following. I tried netbird up, netbird login, and just starting the service:
2024-03-05T11:58:48+01:00 INFO client/cmd/service_controller.go:24: starting Netbird service
2024-03-05T11:58:48+01:00 INFO client/cmd/service_controller.go:64: started daemon server: /var/run/netbird.sock
2024-03-05T11:58:48+01:00 INFO client/internal/connect.go:96: starting NetBird client version 0.26.2
2024-03-05T11:58:48+01:00 DEBG client/internal/connect.go:157: connecting to the Management service secret.url.nu:443
2024-03-05T11:58:53+01:00 ERRO management/client/grpc.go:64: failed creating connection to Management Service context deadline exceeded
2024-03-05T11:58:55+01:00 DEBG client/internal/connect.go:157: connecting to the Management service secret.url.nu:443
2024-03-05T11:59:00+01:00 ERRO management/client/grpc.go:64: failed creating connection to Management Service context deadline exceeded
2024-03-05T11:59:03+01:00 DEBG client/internal/connect.go:157: connecting to the Management service secret.url.nu:443
2024-03-05T11:59:08+01:00 ERRO management/client/grpc.go:64: failed creating connection to Management Service context deadline exceeded
2024-03-05T11:59:09+01:00 DEBG client/internal/connect.go:157: connecting to the Management service secret.url.nu:443
2024-03-05T11:59:14+01:00 ERRO management/client/grpc.go:64: failed creating connection to Management Service context deadline exceeded
2024-03-05T11:59:19+01:00 DEBG client/internal/connect.go:157: connecting to the Management service secret.url.nu:443
2024-03-05T11:59:24+01:00 ERRO management/client/grpc.go:64: failed creating connection to Management Service context deadline exceeded
2024-03-05T11:59:30+01:00 DEBG client/internal/connect.go:157: connecting to the Management service secret.url.nu:443
2024-03-05T11:59:35+01:00 ERRO management/client/grpc.go:64: failed creating connection to Management Service context deadline exceeded
2024-03-05T11:59:36+01:00 DEBG client/internal/login.go:93: connecting to the Management service https://secret.url.nu:443
2024-03-05T11:59:41+01:00 ERRO management/client/grpc.go:64: failed creating connection to Management Service context deadline exceeded
2024-03-05T11:59:41+01:00 ERRO client/internal/login.go:96: failed connecting to the Management service https://secret.url.nu:443 context deadline exceeded
2024-03-05T11:59:41+01:00 ERRO client/server/server.go:139: failed login: context deadline exceeded
2024-03-05T11:59:41+01:00 DEBG client/internal/login.go:93: connecting to the Management service https://secret.url.nu:443
2024-03-05T11:59:46+01:00 ERRO management/client/grpc.go:64: failed creating connection to Management Service context deadline exceeded
2024-03-05T11:59:46+01:00 ERRO client/internal/login.go:96: failed connecting to the Management service https://secret.url.nu:443 context deadline exceeded
2024-03-05T11:59:46+01:00 ERRO client/server/server.go:139: failed login: context deadline exceeded
2024-03-05T11:59:48+01:00 DEBG client/internal/login.go:93: connecting to the Management service https://secret.url.nu:443
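To confirm that every attempt in a log like the one above fails exactly at the client timeout, the gap between each connection attempt and its failure can be measured. The following is a minimal sketch, not part of NetBird: the function name attempt_durations is hypothetical, it assumes the timestamp layout shown in the log lines above, and it ignores attempts that span midnight.

```shell
#!/bin/sh
# Hypothetical helper: print the number of seconds between each DEBG
# "connecting to the Management service" line and the next ERRO
# "failed creating connection" line. Reads the files given as arguments,
# or stdin when no file is given.
attempt_durations() {
    awk '
        # Convert the HH:MM:SS part of an RFC 3339 timestamp to seconds of day.
        function secs(ts,   p) {
            split(substr(ts, 12, 8), p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        # Start of an attempt (DEBG only, so ERRO lines with similar text are skipped).
        $2 == "DEBG" && /connecting to the Management service/ {
            start = secs($1); pending = 1; next
        }
        # First failure after a pending attempt closes it.
        $2 == "ERRO" && /failed creating connection to Management Service/ && pending {
            print secs($1) - start; pending = 0
        }
    ' "$@"
}
```

It could be run as attempt_durations /var/log/netbird/client.log, where the log path is an assumption about a default Linux install. A column of identical values equal to the timeout suggests the connection never completes, rather than a slow but working link.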
@mlsmaycon
So I think I've narrowed my problem down to the /api part of the nginx reverse proxy: that section is not getting any requests in the access log.
Do you think this could be related to me using the same FQDN for the API and Management sections? I could always move the API to another FQDN, such as api.domain.nu.
No, the management service is the API. Often we see some configs missing grpc_pass parameters for the management protocol.
Can you share your nginx configuration?
@mlsmaycon Here it is:
upstream dashboard {
server 127.0.0.1:8180;
keepalive 10;
}
upstream signal {
server 127.0.0.1:8100;
}
upstream management {
server 127.0.0.1:8380;
}
server {
listen 80;
server_name test.url.com;
# 301 redirect to HTTPS
location / {
return 301 https://$host$request_uri;
}
}
server {
# HTTPS server config
listen 443 ssl http2;
server_name test.url.com;
access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;
# This is necessary so that grpc connections do not get closed early
# see https://stackoverflow.com/a/67805465
client_header_timeout 1d;
client_body_timeout 1d;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Scheme $scheme;
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-Host $host;
grpc_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# Proxy dashboard
location / {
proxy_pass http://dashboard;
}
# Proxy Signal
location /signalexchange.SignalExchange/ {
access_log /var/log/nginx/signal_logging.log upstream_logging;
error_log /var/log/nginx/signal_error_logging.log;
grpc_pass grpc://signal;
grpc_read_timeout 1d;
grpc_send_timeout 1d;
grpc_socket_keepalive on;
}
# Proxy Management http endpoint
location /api {
proxy_pass http://management;
}
# Proxy Management grpc endpoint
location /management.ManagementService/ {
grpc_pass grpc://management;
grpc_read_timeout 1d;
grpc_send_timeout 1d;
grpc_socket_keepalive on;
}
ssl_certificate /etc/nginx/certs/cert.crt;
ssl_certificate_key /etc/nginx/certs/cert.key;
}
So, in my case the problem was caused by an untrusted Let's Encrypt root certificate on the Linux host.
I added the certificates to /etc/ssl/certs/ca-certificates.crt and could then connect without issues.
I ran the client in the foreground with
sudo bash -c 'GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info netbird up -F -l debug'
and could see that there was a certificate issue.
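A quicker way to spot this kind of trust problem, before turning on gRPC debug logging, is to let OpenSSL validate the server's chain against the host's trust store. This is a rough sketch under assumptions: chain_verified is a hypothetical helper name, and it assumes the NetBird client and the openssl tool consult the same system CA bundle on the affected host.

```shell
#!/bin/sh
# Hypothetical helper: reads `openssl s_client` output on stdin and succeeds
# only if OpenSSL reports that certificate chain verification passed.
chain_verified() {
    grep -q 'Verify return code: 0 (ok)'
}

# Example (replace the host with your management URL; not run here):
#   openssl s_client -connect secret.url.nu:443 -servername secret.url.nu \
#       </dev/null 2>/dev/null | chain_verified \
#       && echo "chain trusted" || echo "chain NOT trusted"
```

A result such as "unable to get local issuer certificate" in the raw s_client output points at a missing root or intermediate certificate on the client host rather than at the nginx configuration.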
Describe the problem
Ran netbird down followed by netbird up on a RasPi running Linux/Debian 12. The up command failed with the errors:
I diagnosed the root cause for this as being netbird up modified the /etc/resolv.conf file, but netbird down did not restore the original list of nameserver entries. Basically, the NetBird DNS server is not available when NetBird is down and so DNS resolution is failing. Manually editing the file and commenting out the line reading nameserver 100.93.254.165 fixed the issue.
To Reproduce
See above.
Expected behavior
netbird up succeeds
Are you using NetBird Cloud?
Yes.
NetBird version
0.25.7
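The stale entry described in this report can be checked for without opening the file by hand. The sketch below is illustrative only: has_nameserver is a hypothetical helper, and the IP address is the one from this particular report, so substitute the address shown in your own /var/lib/netbird/manager file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the resolv.conf-style file in $1 has an
# active (not commented-out) "nameserver" line for the IP given in $2.
has_nameserver() {
    awk -v ip="$2" '
        $1 == "nameserver" && $2 == ip { found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example, running has_nameserver /etc/resolv.conf 100.93.254.165 after netbird down and getting a zero exit status means the NetBird resolver is still listed. The listing earlier in the thread shows a backup at /etc/resolv.conf.original.netbird; whether copying it back is safe depends on how DNS is managed on the host, so treat that as something to verify rather than a guaranteed fix.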