Open Cyberes opened 2 months ago
Also worth noting that this issue doesn't occur after a normal reboot.
On the surface this looks like a nebula IP reuse issue but there's so much redacted that it's hard to understand exactly what is happening.
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
<host's external IP>
ever change or is it always the same?index=0
?Both Host A and Host B appear to be attempting to initiate a tunnel with the other but neither logs show a matching handshake despite Host A responding. This means we should see logs on Host A responding to handshakes initiated by Host B.
To help can you do the following:
Great, I've sent you an email.
And there aren't multiple processes running. It's controlled by systemd.
I was wondering if you found anything noteworthy in the logs? This isn't a big problem but I did want to bring this problem to your attention if it was worth looking at.
What version of
nebula
are you using? (nebula -version
)1.8.2
What operating system are you using?
Linux
Describe the Bug
One of my servers has been unstable and will sometimes become unresponsive, requiring a hard reset (this is host A,
172.0.3.109
). When it comes back up, it's unable to establish a connection to my node that is a nginx reverse proxy (this is host B,172.0.2.120
). If I manually restart the Nebula service on host A, it fixes the issue and host A can connect to host B.Logs from affected hosts
Lighthouse
``` Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="Host A
``` Apr 22 12:12:25 host-a nebula[811]: time="2024-04-22T12:12:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Tunnel status" certName=host-b localIndex=804429705 remoteIndex=3094684430 tunnelCheck="map[method:active state:dead]" v> Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="sendNoMetrics failed to find HostInfo" certName=host-b error="unable to find host with relay" localIndex=2480024129 rela> Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Taking new handshake" certName=host-b localIndex=2480024129 remoteIndex=108412485 vpnIp=172.0.2.120 Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" certName=host-b fingerprint=...> Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="sendNoMetrics failed to find HostInfo" certName=host-b error="unable to find host with relay" localIndex=817659693 relay> Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:23 host-a nebula[811]: time="2024-04-22T12:10:23-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:23 host-a nebula[811]: time="2024-04-22T12:10:23-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:23 host-a nebula[811]: time="2024-04-22T12:10:23-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Tunnel status" certName=host-b localIndex=1362300847 remoteIndex=2069131475 tunnelCheck="map[method:active state:dead]" > Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="sendNoMetrics failed to find HostInfo" certName=host-b error="unable to find host with relay" localIndex=2903481744 rela> Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Taking new handshake" certName=host-b localIndex=2903481744 remoteIndex=3824716788 vpnIp=172.0.2.120 Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" certName=host-b fingerprint=...> Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...> Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120 ``` This pattern repeats.Host B
``` Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Attempt to relay through hosts" localIndex=1813799298 relays="[172.0.0.2 172.0.0.3 172.0.0.2 172.0.0.3]" remoteIndex=0 vpnIp=1> Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.3.109 Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.3.109 Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.3.109 Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.3.109 Apr 22 12:06:26 host-b nebula[111]: time="2024-04-22T12:06:26-00:00" level=info msg="Handshake timed out" durationNs=6152868332 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1813799298 localIndex=1813799> Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3495069400 localIndex=3495069400 remoteIndex=0 u> Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Attempt to relay through hosts" localIndex=3495069400 relays="[172.0.0.2 172.0.0.3 172.0.0.2 172.0.0.3]" remoteIndex=0 vpnIp=1> Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Send handshake via relay" localIndex=3495069400 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.3.109 Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Send handshake via relay" localIndex=3495069400 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.3.109 ``` This pattern repeatsConfig files from affected hosts
Host A
```yaml firewall: conntrack: default_timeout: 10m max_connections: 100000 tcp_timeout: 12m udp_timeout: 3m inbound: - host: any port: any proto: icmp - host: any port: any proto: any outbound: - host: any port: any proto: any lighthouse: am_lighthouse: false hosts: - 172.0.0.2 - 172.0.0.3 interval: 60 listen: batch: 128 host: 0.0.0.0 port: 0 read_buffer: 10485760 write_buffer: 10485760 logging: format: text level: info pki: ca: | cert: | key: | punchy: delay: 1s punch: true punch_back: true respond: true relay: am_relay: false relays: - 172.0.0.2 - 172.0.0.3 use_relays: true static_host_map: 172.0.0.2: - ....:4242 172.0.0.3: - ....:4242 tun: dev: nebula1 disabled: false drop_local_broadcast: false drop_multicast: false routes: null tx_queue: 5000 unsafe_routes: null ```Host B
```yaml firewall: conntrack: default_timeout: 10m max_connections: 100000 tcp_timeout: 12m udp_timeout: 3m inbound: - host: any port: any proto: icmp - host: any port: any proto: any outbound: - host: any port: any proto: any lighthouse: am_lighthouse: false hosts: - 172.0.0.2 - 172.0.0.3 interval: 60 listen: batch: 128 host: 0.0.0.0 port: 0 read_buffer: 10485760 write_buffer: 10485760 logging: format: text level: info pki: ca: | cert: | key: | preferred_ranges: - 10.0.0.0/24 punchy: delay: 1s punch: true punch_back: true respond: true relay: am_relay: false relays: - 172.0.0.2 - 172.0.0.3 use_relays: true static_host_map: 172.0.0.2: - ....:4242 172.0.0.3: - ....:4242 tun: dev: nebula1 disabled: false drop_local_broadcast: false drop_multicast: false routes: null tx_queue: 5000 unsafe_routes: null ```Lighthouse
```yaml firewall: conntrack: default_timeout: 10m max_connections: 100000 tcp_timeout: 12m udp_timeout: 3m inbound: - host: any port: any proto: icmp - host: any port: any proto: any outbound: - host: any port: any proto: any lighthouse: am_lighthouse: true interval: 60 listen: batch: 128 host: 0.0.0.0 port: 4242 logging: format: text level: debug pki: ca: | cert: | key: | punchy: delay: 1s punch: true punch_back: true respond: true relay: am_relay: true use_relays: false stats: host: ...:2003 interval: 10s lighthouse_metrics: true message_metrics: true prefix: nebula.cloud_proxy protocol: tcp type: graphite tun: dev: nebula1 disabled: false drop_local_broadcast: false drop_multicast: false routes: null tx_queue: 5000 unsafe_routes: null ```