Previously working direct peer connections sometimes fall back to being relayed, possibly after waking from sleep. When it happens, netbird down then netbird up is enough to get the peer connections to be direct again.
To Reproduce
While I can't reliably reproduce it yet, logging seems to show it happening when waking from sleep. There may be a minimum sleep period before it happens, and it might not happen on every wake.
Expected behavior
Direct peer connections should be reestablished.
Are you using NetBird Cloud?
Yes
NetBird version
0.28.9 on Debian Bookworm
NetBird status -dA output:
Note the two relayed peers. After bringing netbird down and up again they go back to being direct.
$ netbird status -dA
Peers detail:
<snip>.netbird.cloud:
NetBird IP: 100.101.136.51
Public key: 9+IH6A3QhPDQcdiJFxhfYMa3ZD8vBTEh//E4C6VcyEE=
Status: Disconnected
-- detail --
Connection type: Relayed
Direct: false
ICE candidate (Local/Remote): relay/prflx
ICE candidate endpoints (Local/Remote): 198.51.100.0:49764/198.51.100.1:1169
Last connection update: -
Last WireGuard handshake: 2 minutes ago
Transfer status (received/sent) 540 B/756 B
Quantum resistance: false
Routes: -
Latency: 0s
<snip>.netbird.cloud:
NetBird IP: 100.101.163.14
Public key: uMGErvJAyflc7PkSVttAXkZB5Ga09qFKJ4ThQTL+5VE=
Status: Connected
-- detail --
Connection type: Relayed
Direct: false
ICE candidate (Local/Remote): relay/prflx
ICE candidate endpoints (Local/Remote): 198.51.100.0:49764/198.51.100.1:1169
Last connection update: 2 minutes ago
Last WireGuard handshake: 2 minutes ago
Transfer status (received/sent) 540 B/756 B
Quantum resistance: false
Routes: -
Latency: 99.422659ms
<snip>.netbird.cloud:
NetBird IP: 100.101.172.78
Public key: mVcZod0gjevW72OTOh02sqTi+iE1kS5NGVIqDwnmQnE=
Status: Connected
-- detail --
Connection type: Relayed
Direct: false
ICE candidate (Local/Remote): relay/prflx
ICE candidate endpoints (Local/Remote): 198.51.100.0:29708/198.51.100.1:1169
Last connection update: 2 minutes ago
Last WireGuard handshake: 1 minute, 44 seconds ago
Transfer status (received/sent) 3.3 KiB/3.7 KiB
Quantum resistance: false
Routes: -
Latency: 100.003558ms
<snip>.netbird.cloud:
NetBird IP: 100.101.236.244
Public key: vpxBKuF5DzcwtBjZoBDxsdBr/dQy/ptnhXSFMTPDwnE=
Status: Disconnected
-- detail --
Connection type:
Direct: false
ICE candidate (Local/Remote): -/-
ICE candidate endpoints (Local/Remote): -/-
Last connection update: -
Last WireGuard handshake: -
Transfer status (received/sent) 0 B/0 B
Quantum resistance: false
Routes: -
Latency: 0s
OS: linux/amd64
Daemon version: 0.28.9
CLI version: 0.28.9
Management: Connected to https://api.netbird.io:443
Signal: Connected to https://signal.netbird.io:443
Relays:
[stun:stun.netbird.io:5555] is Available
[turns:turn.netbird.io:443?transport=tcp] is Available
Nameservers:
FQDN: <snip>.netbird.cloud
NetBird IP: 100.101.121.214/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 2/4 Connected
Screenshots
n/a
Additional context
I'm trying to diagnose an issue where peer connections are made via a relay server when a direct connection should be possible using stun. Running netbird down then netbird up seems to be enough to get direct peer connections established again when it happens.
Netbird is running inside a Debian Bookworm VM, and I think the issues occurs when the host machine wakes from sleep and peer connections need reestablishing.
I noticed that there doesn't appear to be any mention of stun.netbird.io in the debug log when one of these relayed connections is made, almost as if the stun server isn't being used when it happens. For example a successful direct connection shows something like:
I've attached a sanitized debug client.log file that hopefully show whats happening. The two relevant remote peers are uMGErvJAyflc7PkSVttAXkZB5Ga09qFKJ4ThQTL and uMGErvJAyflc7PkSVttAXkZB5Ga09qFKJ4ThQTL+5VE=. The previously quoted timestamps should hopefully show the difference between a successful (15:52:55) direct connection, and a relayed one (15:37:09) being used instead to the same peer.
The relay log entry at 15:37 above was after waking from sleep, and the direct connection at 15:52 happened after running netbird down then netbird up.
I've also got a netbird.out debug logfile with the PIONS_LOG_DEBUG=all env var set that probably shows some useful info, though haven't attached it yet as its unsanitized. I'll try to snip out and sanitize the relevant time periods and will attach them later below, but briefly of note is the following when the 15:37 post-wake connection attempt happened:
ice WARNING: 2024/09/01 15:37:07 Failed get server reflexive address udp4 stun:stun.netbird.io:5555: failed to send STUN packet: got an error while checking route, err: no route found for 3.105.133.255
There are also a couple of dozen more 'Failed to send packet' errors such as the ones below that only appear on that connection attempt, and not on the successful direct one. The errors appear nowhere else in the logfile apart from connection attempts immediately after the system wakes from sleep:
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.12
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for <snip>
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 3.105.133.255
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for <snip>
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for <snip>
It's almost as if access to the stun server or network connectivity is initially unavailable on waking, causing the relay to be used instead.
I don't think there's any otherwise noteworthy networking stuff going on between peers. They're on devices behind standard residential internet connections without any CGNAT. When it works, stun seems to be enough to get direct connections working (at least until the machine goes to sleep).
From what I can make out by reading the log files, Netbird doesn't try reconnecting once the relayed connection is established. I'm not sure if this is currently a thing, but perhaps occasionally trying to 'upgrade' to a direct connection from a relayed one might be worth a feature request in future. While it might not be the cause of this issue, it could possibly work around it.
Here's the sanitized ICE debug logs showing the difference between the relayed peer connections after waking and the direct peer connections after running netbird down then netbird up
ICE debug logging for relayed connection following wake
```
ice INFO: 2024/09/01 15:29:38 Setting new connection state: Disconnected
ice INFO: 2024/09/01 15:29:38 Setting new connection state: Disconnected
turnc DEBUG: 15:36:55.194664 allocation.go:77: Send refresh request (dontWait=true)
turnc DEBUG: 15:36:55.194787 allocation.go:77: Send refresh request (dontWait=true)
turnc DEBUG: 15:36:55.308408 allocation.go:84: Refresh request sent
turnc DEBUG: 15:36:55.308444 allocation.go:84: Refresh request sent
ice WARNING: 2024/09/01 15:36:55 Failed to read from candidate udp4 relay 198.51.100.2:52512 related 192.168.1.4:35602: read udp 198.51.100.2:52512: use of closed network connection
ice WARNING: 2024/09/01 15:36:55 Failed to read from candidate udp4 relay 198.51.100.3:23223 related 192.168.1.4:36280: read udp 198.51.100.3:23223: use of closed network connection
ice INFO: 2024/09/01 15:36:55 Setting new connection state: Closed
ice INFO: 2024/09/01 15:36:55 Setting new connection state: Closed
turnc DEBUG: 15:36:55.352509 client.go:177: Failed to read: read tcp4 192.168.1.4:36280->198.51.100.3:443: use of closed network connection. Exiting loop
turnc DEBUG: 15:36:55.368160 client.go:177: Failed to read: read tcp4 192.168.1.4:35602->198.51.100.2:443: use of closed network connection. Exiting loop
ice INFO: 2024/09/01 15:37:00 Setting new connection state: Closed
ice DEBUG: 15:37:07.164992 agent.go:399: Started agent: isControlling? true, remoteUfrag: "", remotePwd: ""
ice INFO: 2024/09/01 15:37:07 Setting new connection state: Checking
ice WARNING: 2024/09/01 15:37:07 Failed to ping without candidate pairs. Connection is not possible yet.
ice WARNING: 2024/09/01 15:37:07 Failed to ping without candidate pairs. Connection is not possible yet.
ice WARNING: 2024/09/01 15:37:07 Failed get server reflexive address udp4 turns:turn.netbird.io:443?transport=tcp: failed to send STUN packet: got an error while checking route, err: no route found for 198.51.100.3
ice WARNING: 2024/09/01 15:37:07 Failed get server reflexive address udp4 stun:stun.netbird.io:5555: failed to send STUN packet: got an error while checking route, err: no route found for 198.51.100.2
ice DEBUG: 15:37:07.309001 gather.go:403: Failed to resolve STUN host: stun.netbird.io:5555: lookup stun.netbird.io on 127.0.0.53:53: no such host
ice DEBUG: 15:37:07.317823 gather.go:403: Failed to resolve STUN host: turn.netbird.io:443: lookup turn.netbird.io on 127.0.0.53:53: no such host
ice WARNING: 2024/09/01 15:37:07 Failed to ping without candidate pairs. Connection is not possible yet.
turnc DEBUG: 15:37:07.410620 client.go:119: Resolved TURN server turn.netbird.io:443 to 198.51.100.3:443
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.12
turnc DEBUG: 15:37:07.499174 udp_conn.go:66: Initial lifetime: 600 seconds
turnc DEBUG: 15:37:07.499213 udp_conn.go:81: Started refresh allocation timer
turnc DEBUG: 15:37:07.499220 udp_conn.go:84: Started refresh permission timer
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.12
turnc DEBUG: 15:37:07.588601 udp_conn.go:438: Channel binding successful: 192.168.1.12:51820 16384
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.12
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
turnc DEBUG: 15:37:07.685394 udp_conn.go:438: Channel binding successful: 198.51.100.0:51820 16385
ice DEBUG: 15:37:07.700722 agent.go:399: Started agent: isControlling? true, remoteUfrag: "", remotePwd: ""
ice INFO: 2024/09/01 15:37:07 Setting new connection state: Checking
ice WARNING: 2024/09/01 15:37:07 Failed to ping without candidate pairs. Connection is not possible yet.
ice WARNING: 2024/09/01 15:37:07 Failed to ping without candidate pairs. Connection is not possible yet.
ice WARNING: 2024/09/01 15:37:07 Failed get server reflexive address udp4 stun:stun.netbird.io:5555: failed to send STUN packet: got an error while checking route, err: no route found for 198.51.100.2
ice WARNING: 2024/09/01 15:37:07 Failed get server reflexive address udp4 turns:turn.netbird.io:443?transport=tcp: failed to send STUN packet: got an error while checking route, err: no route found for 198.51.100.3
ice DEBUG: 15:37:07.706227 gather.go:403: Failed to resolve STUN host: stun.netbird.io:5555: lookup stun.netbird.io on 127.0.0.53:53: no such host
ice DEBUG: 15:37:07.706332 gather.go:403: Failed to resolve STUN host: turn.netbird.io:443: lookup turn.netbird.io on 127.0.0.53:53: no such host
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.12
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
turnc DEBUG: 15:37:07.795878 client.go:119: Resolved TURN server turn.netbird.io:443 to 198.51.100.3:443
turnc DEBUG: 15:37:07.824046 udp_conn.go:438: Channel binding successful: 198.51.100.2:41887 16386
ice DEBUG: 15:37:07.866923 agent.go:1138: Adding a new peer-reflexive candidate: 198.51.100.0:1166
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.12
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:07 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
turnc DEBUG: 15:37:07.886302 udp_conn.go:66: Initial lifetime: 600 seconds
turnc DEBUG: 15:37:07.886328 udp_conn.go:81: Started refresh allocation timer
turnc DEBUG: 15:37:07.886333 udp_conn.go:84: Started refresh permission timer
ice WARNING: 2024/09/01 15:37:07 Failed to ping without candidate pairs. Connection is not possible yet.
turnc DEBUG: 15:37:07.911582 udp_conn.go:438: Channel binding successful: 198.51.100.0:1166 16387
ice WARNING: 2024/09/01 15:37:07 Discard message from (udp4 prflx 198.51.100.0:1166 related :0), unknown TransactionID 0x68fad0b96128d35803e714e6
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.12
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
turnc DEBUG: 15:37:08.115789 udp_conn.go:438: Channel binding successful: 192.168.1.11:51820 16384
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
turnc DEBUG: 15:37:08.175488 udp_conn.go:438: Channel binding successful: 198.51.100.0:51820 16385
turnc DEBUG: 15:37:08.175853 udp_conn.go:438: Channel binding successful: 198.51.100.0:1166 16386
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.12
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
turnc DEBUG: 15:37:08.321803 udp_conn.go:438: Channel binding successful: 198.51.100.2:55475 16387
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.12
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice DEBUG: 15:37:08.811376 agent.go:1138: Adding a new peer-reflexive candidate: 198.51.100.0:1167
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
turnc DEBUG: 15:37:08.856768 udp_conn.go:438: Channel binding successful: 198.51.100.0:1167 16388
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:08 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice WARNING: 2024/09/01 15:37:08 Discard message from (udp4 prflx 198.51.100.0:1167 related :0), unknown TransactionID 0xfb5f477aa23a474d73ea08c0
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 192.168.1.11
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Setting new connection state: Connected
ice WARNING: 2024/09/01 15:37:09 Discard message from (udp4 prflx 198.51.100.0:1166 related :0), unknown TransactionID 0xfe75ac1cb04aedc8df03cf31
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.2
ice INFO: 2024/09/01 15:37:09 Failed to send packet: got an error while checking route, err: no route found for 198.51.100.0
ice INFO: 2024/09/01 15:37:09 Setting new connection state: Connected
```
ICE debug logging for direct connection
```
ice INFO: 2024/09/01 15:52:33 Setting new connection state: Closed
ice WARNING: 2024/09/01 15:52:33 Discard message from (udp4 prflx 198.51.100.0:1166 related :0), unknown TransactionID 0x1732122ebfa5d55ef62fac0b
ice WARNING: 2024/09/01 15:52:34 Discard message from (udp4 prflx 198.51.100.0:1167 related :0), unknown TransactionID 0x313944abbeee6e642cdf58c9
ice WARNING: 2024/09/01 15:52:37 Discard message from (udp4 prflx 198.51.100.0:1166 related :0), unknown TransactionID 0x01e2ff2404c868aa4646d2a2
ice INFO: 2024/09/01 15:52:38 Setting new connection state: Closed
ice INFO: 2024/09/01 15:52:38 Setting new connection state: Closed
turnc DEBUG: 15:52:38.288924 allocation.go:77: Send refresh request (dontWait=true)
turnc DEBUG: 15:52:38.288965 allocation.go:77: Send refresh request (dontWait=true)
turnc DEBUG: 15:52:38.288990 allocation.go:84: Refresh request sent
turnc DEBUG: 15:52:38.288966 allocation.go:84: Refresh request sent
ice WARNING: 2024/09/01 15:52:38 Failed to read from candidate udp4 relay 198.51.100.3:15455 related 192.168.1.4:50208: read udp 198.51.100.3:15455: use of closed network connection
turnc DEBUG: 15:52:38.289846 client.go:177: Failed to read: read tcp4 192.168.1.4:50208->198.51.100.3:443: use of closed network connection. Exiting loop
ice INFO: 2024/09/01 15:52:38 Setting new connection state: Closed
ice WARNING: 2024/09/01 15:52:38 Failed to read from candidate udp4 relay 198.51.100.3:56187 related 192.168.1.4:50222: read udp 198.51.100.3:56187: use of closed network connection
turnc DEBUG: 15:52:38.290288 client.go:177: Failed to read: read tcp4 192.168.1.4:50222->198.51.100.3:443: use of closed network connection. Exiting loop
ice INFO: 2024/09/01 15:52:38 Setting new connection state: Closed
ice WARNING: 2024/09/01 15:52:49 UDPMuxDefault should not listening on unspecified address, use NewMultiUDPMuxFromPort instead
ice DEBUG: 15:52:54.138817 agent.go:399: Started agent: isControlling? true, remoteUfrag: "", remotePwd: ""
ice INFO: 2024/09/01 15:52:54 Setting new connection state: Checking
ice WARNING: 2024/09/01 15:52:54 Failed to ping without candidate pairs. Connection is not possible yet.
ice WARNING: 2024/09/01 15:52:54 Failed to ping without candidate pairs. Connection is not possible yet.
ice DEBUG: 15:52:54.166560 gather.go:403: Failed to resolve STUN host: turn.netbird.io:443: lookup turn.netbird.io on 127.0.0.53:53: no such host
ice DEBUG: 15:52:54.203709 gather.go:403: Failed to resolve STUN host: stun.netbird.io:5555: lookup stun.netbird.io on 127.0.0.53:53: no such host
ice WARNING: 2024/09/01 15:52:54 Failed to ping without candidate pairs. Connection is not possible yet.
ice DEBUG: 15:52:54.330135 agent.go:820: Ignore duplicate candidate: udp4 srflx 198.51.100.1:51820 related 0.0.0.0:51820
turnc DEBUG: 15:52:54.388945 client.go:119: Resolved TURN server turn.netbird.io:443 to 198.51.100.2:443
ice WARNING: 2024/09/01 15:52:54 Failed to ping without candidate pairs. Connection is not possible yet.
turnc DEBUG: 15:52:54.494364 udp_conn.go:66: Initial lifetime: 600 seconds
turnc DEBUG: 15:52:54.494393 udp_conn.go:81: Started refresh allocation timer
turnc DEBUG: 15:52:54.494401 udp_conn.go:84: Started refresh permission timer
turnc DEBUG: 15:52:54.597427 udp_conn.go:438: Channel binding successful: 192.168.1.11:51820 16384
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xc4e172e6ea590fbdf661a688
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xc4e172e6ea590fbdf661a688
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xf5a606a2d3645f18019019ff
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xc4e172e6ea590fbdf661a688
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xf5a606a2d3645f18019019ff
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xf5a606a2d3645f18019019ff
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xaaa9ab85158cabe3660605b2
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xaaa9ab85158cabe3660605b2
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xaaa9ab85158cabe3660605b2
turnc DEBUG: 15:52:54.675908 udp_conn.go:438: Channel binding successful: 198.51.100.0:1167 16385
turnc DEBUG: 15:52:54.676034 udp_conn.go:438: Channel binding successful: 198.51.100.0:51820 16386
ice DEBUG: 15:52:54.723518 agent.go:399: Started agent: isControlling? true, remoteUfrag: "", remotePwd: ""
ice INFO: 2024/09/01 15:52:54 Setting new connection state: Checking
ice DEBUG: 15:52:54.727267 gather.go:403: Failed to resolve STUN host: turn.netbird.io:443: lookup turn.netbird.io on 127.0.0.53:53: no such host
ice DEBUG: 15:52:54.727289 gather.go:403: Failed to resolve STUN host: stun.netbird.io:5555: lookup stun.netbird.io on 127.0.0.53:53: no such host
ice DEBUG: 15:52:54.727972 agent.go:820: Ignore duplicate candidate: udp4 srflx 198.51.100.1:51820 related 0.0.0.0:51820
ice WARNING: 2024/09/01 15:52:54 Failed to ping without candidate pairs. Connection is not possible yet.
ice WARNING: 2024/09/01 15:52:54 Failed to ping without candidate pairs. Connection is not possible yet.
ice INFO: 2024/09/01 15:52:54 Setting new connection state: Connected
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xf662c588701d95cd6ff54f73
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xf662c588701d95cd6ff54f73
ice WARNING: 2024/09/01 15:52:54 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xf662c588701d95cd6ff54f73
turnc DEBUG: 15:52:54.824371 client.go:119: Resolved TURN server turn.netbird.io:443 to 198.51.100.2:443
turnc DEBUG: 15:52:54.920397 udp_conn.go:66: Initial lifetime: 600 seconds
turnc DEBUG: 15:52:54.920441 udp_conn.go:81: Started refresh allocation timer
turnc DEBUG: 15:52:54.920449 udp_conn.go:84: Started refresh permission timer
ice WARNING: 2024/09/01 15:52:54 Failed to ping without candidate pairs. Connection is not possible yet.
turnc DEBUG: 15:52:55.148054 udp_conn.go:438: Channel binding successful: 192.168.1.12:51820 16384
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0x97e036d84f3751cbc5d7d998
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0x97e036d84f3751cbc5d7d998
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0x1c167f85d30bd845d3ab9cc6
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0x97e036d84f3751cbc5d7d998
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0x1c167f85d30bd845d3ab9cc6
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0x1c167f85d30bd845d3ab9cc6
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0xce746e51cc630dd404b543eb
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0xce746e51cc630dd404b543eb
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0xce746e51cc630dd404b543eb
turnc DEBUG: 15:52:55.218448 udp_conn.go:438: Channel binding successful: 198.51.100.0:1166 16385
turnc DEBUG: 15:52:55.220838 udp_conn.go:438: Channel binding successful: 198.51.100.0:51820 16386
ice INFO: 2024/09/01 15:52:55 Setting new connection state: Connected
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0xc1d737bf84b33e7ecd779602
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0xc1d737bf84b33e7ecd779602
ice WARNING: 2024/09/01 15:52:55 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0xc1d737bf84b33e7ecd779602
ice WARNING: 2024/09/01 15:52:58 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xb9f786fe60d29c4534568aa0
ice WARNING: 2024/09/01 15:52:58 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xb9f786fe60d29c4534568aa0
ice WARNING: 2024/09/01 15:52:58 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0xb9f786fe60d29c4534568aa0
ice WARNING: 2024/09/01 15:52:59 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0x0ba4fff7c3c559d7c8178121
ice WARNING: 2024/09/01 15:52:59 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0x0ba4fff7c3c559d7c8178121
ice WARNING: 2024/09/01 15:52:59 Discard message from (udp4 srflx 198.51.100.0:1166 related 0.0.0.0:51820), unknown TransactionID 0x0ba4fff7c3c559d7c8178121
ice WARNING: 2024/09/01 15:53:02 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0x9f114461a655ea4dffaeffda
ice WARNING: 2024/09/01 15:53:02 Discard message from (udp4 srflx 198.51.100.0:1167 related 0.0.0.0:51820), unknown TransactionID 0x9f114461a655ea4dffaeffda
```
Describe the problem
Previously working direct peer connections sometimes fall back to being relayed, possibly after waking from sleep. When it happens,
netbird down
thennetbird up
is enough to get the peer connections to be direct again.To Reproduce
While I can't reliably reproduce it yet, logging seems to show it happening when waking from sleep. There may be a minimum sleep period before it happens, and it might not happen on every wake.
Expected behavior
Direct peer connections should be reestablished.
Are you using NetBird Cloud?
Yes
NetBird version
0.28.9
on Debian BookwormNetBird status -dA output: Note the two relayed peers. After bringing netbird down and up again they go back to being direct.
Screenshots
n/a
Additional context
I'm trying to diagnose an issue where peer connections are made via a relay server when a direct connection should be possible using stun. Running
netbird down
thennetbird up
seems to be enough to get direct peer connections established again when it happens.Netbird is running inside a Debian Bookworm VM, and I think the issues occurs when the host machine wakes from sleep and peer connections need reestablishing.
I noticed that there doesn't appear to be any mention of
stun.netbird.io
in the debug log when one of these relayed connections is made, almost as if the stun server isn't being used when it happens. For example a successful direct connection shows something like:Whereas there's no mention of
stun.netbird.io
when a connection between the same peers ends up being relayed:I've attached a sanitized debug client.log file that hopefully show whats happening. The two relevant remote peers are
uMGErvJAyflc7PkSVttAXkZB5Ga09qFKJ4ThQTL
anduMGErvJAyflc7PkSVttAXkZB5Ga09qFKJ4ThQTL+5VE=
. The previously quoted timestamps should hopefully show the difference between a successful (15:52:55
) direct connection, and a relayed one (15:37:09
) being used instead to the same peer.The relay log entry at
15:37
above was after waking from sleep, and the direct connection at15:52
happened after runningnetbird down
thennetbird up
.I've also got a
netbird.out
debug logfile with thePIONS_LOG_DEBUG=all
env var set that probably shows some useful info, though haven't attached it yet as its unsanitized. I'll try to snip out and sanitize the relevant time periods and will attach them later below, but briefly of note is the following when the15:37
post-wake connection attempt happened:There are also a couple of dozen more 'Failed to send packet' errors such as the ones below that only appear on that connection attempt, and not on the successful direct one. The errors appear nowhere else in the logfile apart from connection attempts immediately after the system wakes from sleep:
It's almost as if access to the stun server or network connectivity is initially unavailable on waking, causing the relay to be used instead.
I don't think there's any otherwise noteworthy networking stuff going on between peers. They're on devices behind standard residential internet connections without any CGNAT. When it works, stun seems to be enough to get direct connections working (at least until the machine goes to sleep).
From what I can make out by reading the log files, Netbird doesn't try reconnecting once the relayed connection is established. I'm not sure if this is currently a thing, but perhaps occasionally trying to 'upgrade' to a direct connection from a relayed one might be worth a feature request in future. While it might not be the cause of this issue, it could possibly work around it.