Open hseshadr opened 1 year ago
@hseshadr is ICMP enabled on AWS? This is required for health checks to complete as normal.
Bump this.
Same problem.
Server (gate1) installed on Digital Ocean. Two nodes: home (nad9) and office (zenbook). Netclient installed on all three nodes. All nodes are Ubuntu without any firewall.
Everything works perfectly right after running netclient register: ping and SSH work between any two nodes. After a while (minutes or hours), everything stops working. Neither ping nor SSH works between any pair of machines.
In dashboard everything is green and healthy.
However, I have manually configured another connection (wg0) with Wireguard that has been working without problems for months.
This is very annoying and frustrating because I come home expecting to be able to access the office and the connection is impossible.
Output of wg show on the server:
root@gate1:~# wg show
interface: wg0
public key: xxxxxxxxxF77quZwe72qx02awvtla0H9TdMhhnv5+00=
private key: (hidden)
listening port: 51820
peer: xxxxxxxxxsuCdmraF71fSHC3kfjAa3Gba85Zicr4PVo=
endpoint: xxx.xxx.xxx.xxx:34153
allowed ips: 10.1.0.51/32
latest handshake: 20 seconds ago
transfer: 37.36 MiB received, 10.11 MiB sent
... 5 more peers manually configured
interface: netmaker
public key: xxxxxxxxx+Zo8HkIwgDqg6pErzoMEjWAlpQ3jMJwawA=
private key: (hidden)
listening port: 51821
peer: xxxxxxxxxhFOEuxLKP9+wJdS+Xn8gapYkDhFGImQF1Q=
endpoint: xxx.xxx.xxx.xxx:5353
allowed ips: 10.11.12.1/32
latest handshake: 21 seconds ago
transfer: 24.55 KiB received, 6.29 KiB sent
persistent keepalive: every 20 seconds
peer: xxxxxxxxxElX3QTMloWhi/LiPyqlOWt5vxoMxlwn2lc=
endpoint: 127.0.0.1:37695
allowed ips: 10.11.12.2/32
transfer: 202.49 KiB received, 127.89 KiB sent
persistent keepalive: every 20 seconds
root@gate1:~# cat /etc/hosts | grep netmaker
10.11.12.3 gate1.konstack #netmaker
10.11.12.1 nad9.konstack #netmaker
10.11.12.2 zenbook.konstack #netmaker
Output of wg show from office (zenbook):
zenbook|~|⇒ sudo wg show
interface: wg0
public key: xxxxxxxxx0VqJrAL9pJNll/gFY/ZDDWJrj4Il5UU+RQ=
private key: (hidden)
listening port: 55646
peer: xxxxxxxxxF77quZwe72qx02awvtla0H9TdMhhnv5+00=
endpoint: 206.xxx.xxx.xxx:51820
allowed ips: 10.1.0.0/24, xxx.xxx.70.190/32, xxx.xxx.141.210/32, xxx.xxx.223.60/32, xxx.xxx.223.62/32
latest handshake: 1 minute, 58 seconds ago
transfer: 138.31 KiB received, 356.04 KiB sent
persistent keepalive: every 25 seconds
interface: netmaker
public key: xxxxxxxxxElX3QTMloWhi/LiPyqlOWt5vxoMxlwn2lc=
private key: (hidden)
listening port: 51821
peer: xxxxxxxxxhFOEuxLKP9+wJdS+Xn8gapYkDhFGImQF1Q=
endpoint: 127.0.0.1:38254
allowed ips: 10.11.12.1/32
latest handshake: 14 hours, 41 minutes, 5 seconds ago
transfer: 31.06 KiB received, 1.44 MiB sent
persistent keepalive: every 20 seconds
peer: xxxxxxxxx+Zo8HkIwgDqg6pErzoMEjWAlpQ3jMJwawA=
endpoint: 127.0.0.1:46629
allowed ips: 10.11.12.3/32
latest handshake: 14 hours, 41 minutes, 23 seconds ago
transfer: 216 B received, 1.26 MiB sent
persistent keepalive: every 20 seconds
zenbook|~|⇒ cat /etc/hosts
10.11.12.3 gate1.konstack #netmaker
zenbook|~|⇒ tracepath gate1.konstack
1?: [LOCALHOST] pmtu 1420
1: no reply
However, with the manually configured network (wg0), I have no problem:
zenbook|~|⇒ ssh root@10.1.0.1
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 4.15.0-159-generic x86_64)
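The zenbook output above shows handshakes on the netmaker interface that are more than 14 hours old, while wg0 handshakes are recent. A minimal sketch for spotting this automatically, assuming the tab-separated output of `wg show <interface> latest-handshakes`; the function name `stale_peers` and the 180-second threshold are illustrative choices, not something netclient provides:

```shell
#!/bin/sh
# stale_peers NOW MAX_AGE   (hypothetical helper, not part of wg or netclient)
# Reads lines of "<public-key><TAB><unix-timestamp>" on stdin, the format
# printed by `wg show <interface> latest-handshakes`, and prints every peer
# whose last handshake is older than MAX_AGE seconds, or that has never
# completed a handshake (wg reports timestamp 0 in that case).
stale_peers() {
    awk -v now="$1" -v max="$2" '
        $2 == 0           { print $1, "never"; next }
        (now - $2) > max  { print $1, (now - $2) "s" }
    '
}

# Example (needs root; "netmaker" is the interface name seen in this thread,
# 180 is an assumed threshold):
#   wg show netmaker latest-handshakes | stale_peers "$(date +%s)" 180
```

Passing the current time as an argument keeps the function easy to check against saved output.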
+1 here. An active ping left running between two nodes loses the connection after a minute or two.
My workaround:
sudo crontab -e
... append to last line ...
*/1 * * * * netclient join -A
I've also tried changing the MTU of the cni0 network to 1500 (default for other interfaces) and 1280 (the minimum suggested on the troubleshooting guide)
sudo ifconfig cni0 mtu 1500
netclient join -A
...or...
sudo ifconfig cni0 mtu 1280 # this works much better though
netclient join -A
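The cron workaround above re-joins every minute whether or not the tunnel is healthy. A gentler variant, sketched here under the assumption that a failed ping to a peer is a good enough signal, only runs the recovery command when a probe fails. The function name `recover_if_down` and the `<peer-ip>` placeholder are illustrative; `netclient join -A` is simply the command quoted in this comment.

```shell
#!/bin/sh
# recover_if_down PROBE RECOVER   (hypothetical helper)
# Runs the PROBE command; only if it exits non-zero, runs the RECOVER command.
recover_if_down() {
    if sh -c "$1" >/dev/null 2>&1; then
        echo "probe ok, nothing to do"
    else
        echo "probe failed, running recovery"
        sh -c "$2"
    fi
}

# Example, to be called from a script that cron runs once a minute
# (replace <peer-ip> with the WireGuard address of another node):
#   recover_if_down 'ping -c 3 -W 2 <peer-ip>' 'netclient join -A'
```

On systems without ifconfig, `ip link set dev cni0 mtu 1280` is the iproute2 equivalent of the MTU command shown above.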
Which OS are these nodes running? Can you paste some logs here? journalctl -fu netclient
sure, logs from journalctl
Feb 23 20:42:05 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:42:05 adding addresses to netmaker interface
Feb 23 20:42:08 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:42:08 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:42:10 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:42:10 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:43:03 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:03 closed endpoint detection
Feb 23 20:43:03 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:03 checkin routine closed
Feb 23 20:43:04 bsas01 netclient[3391654]: completed pull for server 5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io
Feb 23 20:43:05 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:05 adding addresses to netmaker interface
Feb 23 20:43:05 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:05 initialized endpoint detection on port 51821
Feb 23 20:43:09 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:09 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:43:10 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:10 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:44:02 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:02 closed endpoint detection
Feb 23 20:44:02 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:02 checkin routine closed
Feb 23 20:44:04 bsas01 netclient[3391654]: completed pull for server 5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io
Feb 23 20:44:04 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:04 adding addresses to netmaker interface
Feb 23 20:44:04 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:04 initialized endpoint detection on port 51821
Feb 23 20:44:08 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:08 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:44:10 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:10 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Linux distro:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
What's weird to me is that the connection gets lost suddenly, but I can't find any debug info anywhere. The ping command stalls without reporting a timeout or any other error; it just hangs.
Thanks for looking this through!!
Can you increase the verbosity for this host to 3 in the UI? That way we can get more detailed logs.
Interesting... these are the logs by using verbosity level 3:
Feb 27 22:52:09 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:09 adding addresses to netmaker interface
Feb 27 22:52:09 bsas01 netclient[4047997]: {"time":"2024-02-27T22:52:09.983285786Z","level":"INFO","source":"wireguard_linux.go 117}","msg":"adding address","address":"10.143.215.2","network":"10.143.215.0/24"}
Feb 27 22:52:10 bsas01 netclient[4047997]: {"time":"2024-02-27T22:52:10.20742628Z","level":"INFO","source":"daemon.go 328}","msg":"subscribing to host updates for","host":"803a4c30-6566-4065-91b5-8c085b74dc19","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:52:10 bsas01 netclient[4047997]: {"time":"2024-02-27T22:52:10.26468102Z","level":"INFO","source":"mqhandlers.go 114}","msg":"processing peer update for server","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:52:10 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:10 Deleting rules table: 5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io egress
Feb 27 22:52:10 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:10 checkin with server(s)
Feb 27 22:52:10 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:10 Interface is a bridge network: docker0
Feb 27 22:52:10 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:10 Interface is a bridge network: cni0
Feb 27 22:52:12 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:12 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 27 22:52:14 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:14 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 27 22:52:56 bsas01 netclient[4047997]: {"time":"2024-02-27T22:52:56.37888809Z","level":"INFO","source":"mqhandlers.go 114}","msg":"processing peer update for server","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:52:56 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:56 Deleting rules table: 5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io egress
Feb 27 22:53:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:53:08 checkin with server(s)
Feb 27 22:53:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:53:08 Interface is a bridge network: docker0
Feb 27 22:53:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:53:08 Interface is a bridge network: cni0
Feb 27 22:54:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:54:08 checkin with server(s)
Feb 27 22:54:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:54:08 Interface is a bridge network: docker0
Feb 27 22:54:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:54:08 Interface is a bridge network: cni0
Feb 27 22:55:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:55:08 checkin with server(s)
Feb 27 22:55:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:55:08 Interface is a bridge network: docker0
Feb 27 22:55:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:55:08 Interface is a bridge network: cni0
Feb 27 22:56:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:56:08 checkin with server(s)
Feb 27 22:56:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:56:08 Interface is a bridge network: docker0
Feb 27 22:56:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:56:08 Interface is a bridge network: cni0
Feb 27 22:57:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:08 checkin with server(s)
Feb 27 22:57:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:08 Interface is a bridge network: docker0
Feb 27 22:57:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:08 Interface is a bridge network: cni0
Feb 27 22:57:56 bsas01 netclient[4047997]: {"time":"2024-02-27T22:57:56.445461144Z","level":"INFO","source":"mqhandlers.go 114}","msg":"processing peer update for server","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:57:56 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:56 Deleting rules table: 5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io egress
Feb 27 22:58:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:58:08 checkin with server(s)
Feb 27 22:58:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:58:08 Interface is a bridge network: docker0
Feb 27 22:58:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:58:08 Interface is a bridge network: cni0
While actively monitoring the logs between two hosts (one in the US on DigitalOcean and a bare-metal one in Argentina, latency around 170 ms), the ping is stable up to this line:
Feb 27 22:57:56 bsas01 netclient[4047997]: {"time":"2024-02-27T22:57:56.445461144Z","level":"INFO","source":"mqhandlers.go 114}","msg":"processing peer update for server","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:57:56 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:56 Deleting rules table: 5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io egress
Just after that line, the ping hangs without any error or anything, just hangs. I do see a lot of these:
checkin with server(s)
Interface is a bridge network: docker0
Interface is a bridge network: cni0
But the ping is still hung and the connection between the two hosts is lost.
Also, this is the ping after re-joining when it un-hangs automatically but with weird latency metrics:
PING 10.143.215.1 (10.143.215.1) 56(84) bytes of data.
64 bytes from 10.143.215.1: icmp_seq=5 ttl=64 time=4952 ms
64 bytes from 10.143.215.1: icmp_seq=6 ttl=64 time=3925 ms
64 bytes from 10.143.215.1: icmp_seq=7 ttl=64 time=2901 ms
64 bytes from 10.143.215.1: icmp_seq=8 ttl=64 time=1881 ms
64 bytes from 10.143.215.1: icmp_seq=9 ttl=64 time=856 ms
64 bytes from 10.143.215.1: icmp_seq=10 ttl=64 time=172 ms
64 bytes from 10.143.215.1: icmp_seq=11 ttl=64 time=191 ms
64 bytes from 10.143.215.1: icmp_seq=12 ttl=64 time=174 ms
64 bytes from 10.143.215.1: icmp_seq=13 ttl=64 time=173 ms
64 bytes from 10.143.215.1: icmp_seq=14 ttl=64 time=172 ms
64 bytes from 10.143.215.1: icmp_seq=15 ttl=64 time=173 ms
64 bytes from 10.143.215.1: icmp_seq=16 ttl=64 time=172 ms
64 bytes from 10.143.215.1: icmp_seq=17 ttl=64 time=171 ms
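One possible reading of the latency figures above: the times for icmp_seq 5 to 9 drop by roughly one second per probe, which is what you would expect if probes sent one second apart were held somewhere and all answered at nearly the same moment once the tunnel recovered. The sketch below tests that reading by estimating when each reply arrived. It assumes ping's default one-second interval and sequence numbers starting at 1; the function name `reply_arrivals` is illustrative.

```shell
#!/bin/sh
# reply_arrivals   (hypothetical helper)
# Reads `ping` output on stdin and prints, for each reply line, the sequence
# number and the estimated arrival time in ms after the first probe was sent.
# Assumes one probe per second, so probe N is sent at (N - 1) * 1000 ms.
reply_arrivals() {
    sed -n 's/.*icmp_seq=\([0-9][0-9]*\) .*time=\([0-9.]*\) ms.*/\1 \2/p' |
        awk '{ printf "seq=%d arrived_at=%dms\n", $1, ($1 - 1) * 1000 + $2 }'
}

# Example:
#   ping -c 20 <peer-ip> | reply_arrivals
```

Applied to the output above, icmp_seq 5 through 9 all land between about 8856 ms and 8952 ms, while icmp_seq 10 arrives at about 9172 ms with a normal round trip. That fits the queued-then-released explanation, though it does not say where the packets were held.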
Which version of Netmaker are you using? On the interface, what endpoint do you see for the peer you are pinging? Is it a public or a private IP?
I'm using the latest version of netclient suggested by the UI: v0.22.0. I'm pinging between the two private addresses (IPv4), also taken from the UI. I'm not on a self-hosted Netmaker server but on the cloud/SaaS one.
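For anyone answering the endpoint question on their own nodes: earlier in this thread, `wg show` lists peers with 127.0.0.1 endpoints and the journal shows an endpoint of 10.42.0.0:51821 on a host that also has a cni0 bridge. Whether those addresses cause the drop is not established here. The sketch below only labels endpoints so they are easier to report. It assumes the tab-separated output of `wg show <interface> endpoints`; the function name `classify_endpoints` is illustrative, and only IPv4 ranges are recognised.

```shell
#!/bin/sh
# classify_endpoints   (hypothetical helper)
# Reads "<public-key><TAB><ip:port>" lines on stdin, the format printed by
# `wg show <interface> endpoints`, and appends a label to each line.
classify_endpoints() {
    while read -r key endpoint; do
        ip=${endpoint%:*}                      # strip the trailing :port
        case "$ip" in
            "(none)")                               kind=unset ;;
            \[*)                                    kind=ipv6 ;;
            127.*)                                  kind=loopback ;;
            10.*|192.168.*)                         kind=private ;;
            172.1[6-9].*|172.2[0-9].*|172.3[01].*)  kind=private ;;
            *)                                      kind=public ;;  # anything else, e.g. CGNAT, is not distinguished
        esac
        echo "$key $endpoint $kind"
    done
}

# Example (needs root):
#   wg show netmaker endpoints | classify_endpoints
```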
I created a netmaker server in AWS using lightsail. Followed the instructions: https://itnext.io/how-to-deploy-a-wireguard-vpn-for-aws-remote-access-with-netmaker-a3b8d0f59af2
I deployed Netmaker server 0.16.3. I then created two nodes, one on each laptop in my home network, using netclient. Everything worked fine at first: I was able to ping every node at its WireGuard-assigned IP address (e.g. node1 - 10.1.20.1, node2 - 10.1.20.2). However, after a few minutes the dashboard shows the nodes as orange (warning: node has connectivity issues), and eventually this turns red (error). The actual error is never shown, and wg show works as expected. I have checked my network several times and everything is OK with it. I noticed that issue #141 describes something very similar, but I never saw a resolution to that problem. Netmaker seems promising and looks like the best solution for full VPN meshes. I am also evaluating competing systems like lightsail and nebula. It would be great if Netmaker worked for this simple case.