Open m3v4 opened 3 months ago
Hi @m3v4,
The warning might be a bit misleading. Without knowing why the first attempt failed, the client seems able to authenticate on retry, connect to management successfully, and receive information about the other peers in the network, so that's good so far.
The issue seems to occur during connection establishment. Is this machine in a physically different location than the working ones, or maybe behind a different firewall? The status output shows issues connecting to TURN, which means the peer can only connect to other peers via P2P, because relay (the fallback) is not possible. If P2P is not possible (which can happen for multiple reasons) and the peer cannot fall back to relay, connections can get stuck in the connecting state.
Hi @pascal-fischer, thanks for the very quick response. That is right, the machine is in a physically different location. The 3 that are working fine are in the same physical location, and one other that is someplace else was working yesterday, but today it does not...
All 4 machines are in a group with policy allowing all protocols bidirectionally like so:
group <=> group
I've tried removing all peers from the group and adding them again, but nothing changed.
What else can I do?
EDIT: we have deleted that other, formerly working peer and reauthorised it with SSO, and now it works again. I have tried the same method on the problematic peer, but this time the issue persists. So again we have 3 local peers and 1 remote peer working fine, and one remote peer not working.
I am pretty sure this is unrelated to the configuration within NetBird itself. It is related to the setup of the self-hosted management, specifically the TURN and STUN servers, in combination with the physical network of the peer that does not work. You need to make sure that both STUN and TURN are reachable and working from that location.
Relays: [stun:net.anon-ZTM28.domain:3478] is Available [turn:net.anon-ZTM28.domain:3478?transport=udp] is Unavailable, reason: allocate: attribute not found
If either one of them is not reachable this will cause issues.
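As a rough way to check the STUN half of this from the affected location, here is a minimal Python sketch that sends a single STUN Binding Request over UDP and decodes the XOR-MAPPED-ADDRESS in the reply. It is only a sketch under assumptions: the function names are made up for illustration, it handles IPv4 only, and it does not test TURN at all, because a TURN Allocate needs credentials and more protocol handling.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by the STUN spec
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    """Return a 20-byte STUN Binding Request header with no attributes."""
    if len(transaction_id) != 12:
        raise ValueError("transaction id must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(response: bytes, transaction_id: bytes):
    """Return (ip, port) from a Binding Success response, or None."""
    if len(response) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if response[8:20] != transaction_id:
        return None
    pos, end = 20, min(len(response), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        # value[1] == 0x01 means the address family is IPv4
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def stun_probe(host: str, port: int = 3478, timeout: float = 3.0):
    """Send one Binding Request over UDP; return the reflexive address or None."""
    tid = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_binding_request(tid), (host, port))
            data, _ = sock.recvfrom(2048)
        except OSError:  # covers timeouts, DNS failures and unreachable hosts
            return None
    return parse_xor_mapped_address(data, tid)
```

Calling `stun_probe("net.example.domain")` (a placeholder hostname) from the problematic machine should return the public IP and port the server sees; `None` suggests that UDP 3478 is blocked or the server did not answer in time.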
You said that the other machines are in the same physical location, this means they are most likely connected P2P? So there might even be a general issue with the TURN server.
EDIT: Regarding the peer that was previously not working but works after reauthentication: can you send the netbird status -d output from that peer?
sure, here you go:
Peers detail:
 laptop-dell-mariusza.netbird.selfhosted:
  NetBird IP: 100.103.44.19
  Public key: YvVW8g9sDDcUNhigOOW2SlIZBHj5Lj//mfMP2WAgzkg=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.0:51820/198.51.100.1:51820
  Last connection update: 20 minutes, 47 seconds ago
  Last WireGuard handshake: 41 seconds ago
  Transfer status (received/sent) 2.5 KiB/3.2 KiB
  Quantum resistance: false
  Routes: -
  Latency: 14.9644ms

 desktop-n07vu1e.netbird.selfhosted:
  NetBird IP: 100.103.50.131
  Public key: E8Y1I9u7F8ntuKE6QU6RNPwiFXrkBadNN1HO2djDPBc=
  Status: Disconnected
  -- detail --
  Connection type: P2P
  Direct: false
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.0:51820/198.51.100.1:51820
  Last connection update: Now
  Last WireGuard handshake: 41 seconds ago
  Transfer status (received/sent) 2.5 KiB/3.2 KiB
  Quantum resistance: false
  Routes: -
  Latency: 0s

 desktop-t16jv8o.netbird.selfhosted:
  NetBird IP: 100.103.117.3
  Public key: 4KaK7VcDiimhmCe67feJQZhLGKKKLju5Vs4vxCmxN2U=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.0:51820/198.51.100.1:56033
  Last connection update: 20 minutes, 47 seconds ago
  Last WireGuard handshake: 54 seconds ago
  Transfer status (received/sent) 3.3 KiB/2.7 KiB
  Quantum resistance: false
  Routes: -
  Latency: 15.2351ms

 desktop-rglgsc3.netbird.selfhosted:
  NetBird IP: 100.103.228.235
  Public key: /hJXb8Z7N3//Dmtbx40u2iUa1aGWHgkX1pDjAeClXlA=
  Status: Disconnected
  -- detail --
  Connection type: P2P
  Direct: false
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.0:51820/198.51.100.1:51820
  Last connection update: -
  Last WireGuard handshake: 41 seconds ago
  Transfer status (received/sent) 2.5 KiB/3.2 KiB
  Quantum resistance: false
  Routes: -
  Latency: 0s

OS: windows/amd64
Daemon version: 0.27.10
CLI version: 0.27.10
Management: Connected to https://net.anon-JBpli.domain:33073
Signal: Connected to http://net.anon-JBpli.domain:10000
Relays:
  [stun:net.anon-JBpli.domain:3478] is Available
  [turn:net.anon-JBpli.domain:3478?transport=udp] is Unavailable, reason: allocate: attribute not found
Nameservers:
FQDN: desktop-3o2mcrk.netbird.selfhosted
NetBird IP: 100.103.61.103/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 2/4 Connected
EDIT: I have now realised that TURN is unavailable on all of our peers, or at least on the ones that I have checked.
Ah, perfect! So here you have the same result. It also shows TURN as unavailable:
[turn:net.anon-JBpli.domain:3478?transport=udp] is Unavailable, reason: allocate: attribute not found
And even though it is in a different physical location, it still manages to establish a P2P connection; that's why it is working.
So the issue lies with your TURN server setup in general. Check the TURN server's logs to see whether it is a general issue with the server or a configuration issue.
Thanks @pascal-fischer, our admin is on it and testing. Could it be as simple as a closed UDP port range 49152-65535?
Yes, this could be the reason.
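Since the same TURN error shows up on more than one peer, it can help to pull the relay state out of each peer's status output automatically. Below is a small Python sketch that parses the Relays entries in the format quoted above. The names `Relay` and `parse_relays` are made up for illustration, and it assumes either one relay entry per line or the Relays entries alone on a single line, as in the quoted line.

```python
import re
from typing import List, NamedTuple, Optional

# Matches entries such as:
#   [turn:host:3478?transport=udp] is Unavailable, reason: allocate: attribute not found
RELAY_RE = re.compile(
    r"\[(?P<uri>[^\]]+)\]\s+is\s+(?P<state>Available|Unavailable)"
    r"(?:,\s*reason:\s*(?P<reason>.*?))?(?=\s*\[|\s*$)",
    re.MULTILINE,
)


class Relay(NamedTuple):
    uri: str
    available: bool
    reason: Optional[str]  # None when the status gives no reason


def parse_relays(status_text: str) -> List[Relay]:
    """Extract every relay entry found in the given status text."""
    return [
        Relay(m["uri"], m["state"] == "Available", m["reason"])
        for m in RELAY_RE.finditer(status_text)
    ]


def unavailable_relays(status_text: str) -> List[Relay]:
    """Return only the relays reported as Unavailable."""
    return [r for r in parse_relays(status_text) if not r.available]
```

Feeding it the status output collected from several peers and printing `unavailable_relays(...)` for each one shows at a glance whether the TURN failure is specific to one location or general to the server.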
In the end it was the coturn config (the ports were open, so it was not a firewall issue). I prepared the config for it (turnserver.conf) manually and used tools like trickle-ice to test whether it was working correctly, until it did :).
In our self-hosted implementation we are experiencing problems with a single client (out of a few dozen). So far this problem has not been reproduced in our infrastructure, but we are struggling to resolve this one case.
OS: Windows 10 Pro 10.0.19045 x64
Client version: latest (0.27.10 AMD x64)
Installed with elevated user rights (main Administrator account). CLI command used for installation:
msiexec /i netbird.msi /quiet /l netbird.log
netbird.log:
After the successful installation I ran netbird up with the URL parameters; here is the debug log bundle:
And status:
Status on the other machines in the same group shows "Peers count" as "2/3" connected, meaning that this single machine doesn't connect properly. It also cannot access any of the other machines.
In our policies we have port 3389 open and that kind of traffic allowed inside the aforementioned group. This one PC is unable to access our server, though.
Previously we used OpenVPN and WireGuard on all of the aforementioned machines, but only this one has problems. I have tried to find any remaining "tun/tap" adapters, but none were identified, not even hidden ones in Device Manager. I have also activated the Administrator account and installed using that, but also no joy. We have dozens of other computers in other groups with the exact same policies and all seems fine elsewhere; just this one PC is making a fuss about the change of VPN platform.
What else can I try and diagnose?