gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

Simple Home Network Setup does not work after some time #1826

Open hseshadr opened 1 year ago

hseshadr commented 1 year ago

I created a netmaker server in AWS using Lightsail, following these instructions: https://itnext.io/how-to-deploy-a-wireguard-vpn-for-aws-remote-access-with-netmaker-a3b8d0f59af2

I deployed netmaker server 0.16.3. I then created two instances, one on each laptop in my home network, using netclient. Everything worked fine: I was able to ping every instance using the WireGuard-assigned IP addresses (e.g. node1 - 10.1.20.1, node2 - 10.1.20.2). However, after a few minutes the dashboard shows the nodes as orange (warning: node has connectivity issues), and eventually this turns red (error). The actual error is never shown, and wg show works as expected. I have checked my network several times and everything is OK with it. I noticed issue #141 is something very similar, but I never saw a resolution to that problem. Netmaker seems promising and the best solution for full VPN meshes. I am also evaluating competing systems like lightsail and nebula. It would be great if netmaker would work for this simple case.

afeiszli commented 1 year ago

@hseshadr is ICMP enabled on AWS? This is required for health checks to complete as normal.

manelio commented 1 year ago

Bump this.

Same problem.

Server (gate1) installed on Digital Ocean. Two nodes: home (nad9) and office (zenbook). Netclient installed on all three nodes. All nodes are Ubuntu without any firewall.

Everything works perfectly right after running netclient register. Ping and SSH work perfectly between any two nodes. After a while (minutes or hours), everything stops working. Neither ping nor SSH works between any pair of machines.

In dashboard everything is green and healthy.

However, I have manually configured another connection (wg0) with Wireguard that has been working without problems for months.

This is very annoying and frustrating because I come home expecting to be able to access the office and the connection is impossible.

Output of wg show on the server:

root@gate1:~# wg show
interface: wg0
  public key: xxxxxxxxxF77quZwe72qx02awvtla0H9TdMhhnv5+00=
  private key: (hidden)
  listening port: 51820

peer: xxxxxxxxxsuCdmraF71fSHC3kfjAa3Gba85Zicr4PVo=
  endpoint: xxx.xxx.xxx.xxx:34153
  allowed ips: 10.1.0.51/32
  latest handshake: 20 seconds ago
  transfer: 37.36 MiB received, 10.11 MiB sent

... 5 more peers manually configured

interface: netmaker
  public key: xxxxxxxxx+Zo8HkIwgDqg6pErzoMEjWAlpQ3jMJwawA=
  private key: (hidden)
  listening port: 51821

peer: xxxxxxxxxhFOEuxLKP9+wJdS+Xn8gapYkDhFGImQF1Q=
  endpoint: xxx.xxx.xxx.xxx:5353
  allowed ips: 10.11.12.1/32
  latest handshake: 21 seconds ago
  transfer: 24.55 KiB received, 6.29 KiB sent
  persistent keepalive: every 20 seconds

peer: xxxxxxxxxElX3QTMloWhi/LiPyqlOWt5vxoMxlwn2lc=
  endpoint: 127.0.0.1:37695
  allowed ips: 10.11.12.2/32
  transfer: 202.49 KiB received, 127.89 KiB sent
  persistent keepalive: every 20 seconds

root@gate1:~# cat /etc/hosts | grep netmaker
10.11.12.3       gate1.konstack #netmaker
10.11.12.1       nad9.konstack #netmaker
10.11.12.2       zenbook.konstack #netmaker

Output of wg show from office (zenbook):

zenbook|~|⇒ sudo wg show

interface: wg0
  public key: xxxxxxxxx0VqJrAL9pJNll/gFY/ZDDWJrj4Il5UU+RQ=
  private key: (hidden)
  listening port: 55646

peer: xxxxxxxxxF77quZwe72qx02awvtla0H9TdMhhnv5+00=
  endpoint: 206.xxx.xxx.xxx:51820
  allowed ips: 10.1.0.0/24, xxx.xxx.70.190/32, xxx.xxx.141.210/32, xxx.xxx.223.60/32, xxx.xxx.223.62/32
  latest handshake: 1 minute, 58 seconds ago
  transfer: 138.31 KiB received, 356.04 KiB sent
  persistent keepalive: every 25 seconds

interface: netmaker
  public key: xxxxxxxxxElX3QTMloWhi/LiPyqlOWt5vxoMxlwn2lc=
  private key: (hidden)
  listening port: 51821

peer: xxxxxxxxxhFOEuxLKP9+wJdS+Xn8gapYkDhFGImQF1Q=
  endpoint: 127.0.0.1:38254
  allowed ips: 10.11.12.1/32
  latest handshake: 14 hours, 41 minutes, 5 seconds ago
  transfer: 31.06 KiB received, 1.44 MiB sent
  persistent keepalive: every 20 seconds

peer: xxxxxxxxx+Zo8HkIwgDqg6pErzoMEjWAlpQ3jMJwawA=
  endpoint: 127.0.0.1:46629
  allowed ips: 10.11.12.3/32
  latest handshake: 14 hours, 41 minutes, 23 seconds ago
  transfer: 216 B received, 1.26 MiB sent
  persistent keepalive: every 20 seconds

zenbook|~|⇒ cat /etc/hosts
10.11.12.3       gate1.konstack #netmaker

zenbook|~|⇒ tracepath gate1.konstack
 1?: [LOCALHOST]                      pmtu 1420
 1:  no reply

However, with the manually configured network (wg0), I have no problem:

zenbook|~|⇒ ssh root@10.1.0.1
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 4.15.0-159-generic x86_64)
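
The 14-hour-old handshakes on the netmaker interface above are the clearest sign that the tunnel is dead even while the dashboard shows green. A minimal sketch for spotting that automatically, assuming the interface is named netmaker as in the output above. The function name stale_peers and the 180-second threshold are illustrative choices, not something netclient provides:

```shell
#!/bin/sh
# stale_peers NOW MAX_AGE
# Reads "public-key<TAB>unix-timestamp" lines on stdin, the format printed by
# `wg show <interface> latest-handshakes`, and prints every peer whose last
# handshake is older than MAX_AGE seconds (or that never completed one).
stale_peers() {
  now="$1"
  max_age="$2"
  awk -v now="$now" -v max="$max_age" '
    $2 == 0        { print $1 " never"; next }   # 0 means no handshake yet
    now - $2 > max { print $1 " " (now - $2) }   # age in seconds
  '
}

# On a live host (needs root), something like:
#   wg show netmaker latest-handshakes | stale_peers "$(date +%s)" 180
```

With a 20-second persistent keepalive, a healthy peer should normally show a handshake only a few minutes old at most, so any peer listed here is worth a closer look.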

julpod commented 8 months ago

+1 here. An active ping left running between two nodes loses the connection after a minute or two.

My workaround:

sudo crontab -e 
... append to last line ...
*/1 * * * * netclient join -A

I've also tried changing the MTU of the cni0 network to 1500 (the default for other interfaces) and to 1280 (the minimum suggested in the troubleshooting guide):

sudo ifconfig cni0 mtu 1500
netclient join -A
...or...
sudo ifconfig cni0 mtu 1280 # this works much better though
netclient join -A
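
When trying different MTU values like the ones above, it can help to check which packet size actually gets through before and after the change. A small sketch, assuming Linux iputils ping (whose -M do option forbids fragmentation) and IPv4 (20-byte IP header plus 8-byte ICMP header). The function name icmp_payload_for_mtu is made up for illustration:

```shell
#!/bin/sh
# icmp_payload_for_mtu MTU
# Prints the largest ICMP echo payload that fits in one unfragmented IPv4
# packet of the given MTU: MTU - 20 (IPv4 header) - 8 (ICMP header).
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Example, run by hand against a peer's netmaker address:
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 1280)" 10.143.215.1
# If this succeeds but the same command with a larger MTU value fails,
# the path MTU lies somewhere between the two values.
```

On systems without ifconfig, the equivalent iproute2 command for setting the MTU is ip link set dev cni0 mtu 1280.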

abhishek9686 commented 8 months ago

On which OS are these nodes running? Can you paste some logs here? journalctl -fu netclient

julpod commented 8 months ago

Sure, logs from journalctl:

Feb 23 20:42:05 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:42:05 adding addresses to netmaker interface
Feb 23 20:42:08 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:42:08 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:42:10 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:42:10 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:43:03 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:03 closed endpoint detection
Feb 23 20:43:03 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:03 checkin routine closed
Feb 23 20:43:04 bsas01 netclient[3391654]: completed pull for server 5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io
Feb 23 20:43:05 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:05 adding addresses to netmaker interface
Feb 23 20:43:05 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:05 initialized endpoint detection on port 51821
Feb 23 20:43:09 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:09 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:43:10 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:43:10 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:44:02 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:02 closed endpoint detection
Feb 23 20:44:02 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:02 checkin routine closed
Feb 23 20:44:04 bsas01 netclient[3391654]: completed pull for server 5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io
Feb 23 20:44:04 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:04 adding addresses to netmaker interface
Feb 23 20:44:04 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:04 initialized endpoint detection on port 51821
Feb 23 20:44:08 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:08 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 23 20:44:10 bsas01 netclient[3391654]: [netclient] 2024-02-23 20:44:10 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821

Linux distro:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

What's weird to me is that the connection gets lost suddenly but I can't find any debug info anywhere. The ping command dies without reporting a timeout or anything else; it just hangs.

Thanks for looking into this!!

abhishek9686 commented 8 months ago

Can you increase the verbosity for this host to 3 in the UI? That way we can get more detailed logs.

julpod commented 8 months ago

Interesting... these are the logs with verbosity level 3:

Feb 27 22:52:09 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:09 adding addresses to netmaker interface
Feb 27 22:52:09 bsas01 netclient[4047997]: {"time":"2024-02-27T22:52:09.983285786Z","level":"INFO","source":"wireguard_linux.go 117}","msg":"adding address","address":"10.143.215.2","network":"10.143.215.0/24"}
Feb 27 22:52:10 bsas01 netclient[4047997]: {"time":"2024-02-27T22:52:10.20742628Z","level":"INFO","source":"daemon.go 328}","msg":"subscribing to host updates for","host":"803a4c30-6566-4065-91b5-8c085b74dc19","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:52:10 bsas01 netclient[4047997]: {"time":"2024-02-27T22:52:10.26468102Z","level":"INFO","source":"mqhandlers.go 114}","msg":"processing peer update for server","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:52:10 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:10 Deleting rules table:  5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io egress
Feb 27 22:52:10 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:10 checkin with server(s)
Feb 27 22:52:10 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:10 Interface is a bridge network: docker0
Feb 27 22:52:10 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:10 Interface is a bridge network: cni0
Feb 27 22:52:12 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:12 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 27 22:52:14 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:14 determined new endpoint for peer 7Fde7avsXsQIAxEIKlW4dfO+zhgo2FdoaMLBXABBhyc= - 10.42.0.0:51821
Feb 27 22:52:56 bsas01 netclient[4047997]: {"time":"2024-02-27T22:52:56.37888809Z","level":"INFO","source":"mqhandlers.go 114}","msg":"processing peer update for server","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:52:56 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:52:56 Deleting rules table:  5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io egress
Feb 27 22:53:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:53:08 checkin with server(s)
Feb 27 22:53:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:53:08 Interface is a bridge network: docker0
Feb 27 22:53:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:53:08 Interface is a bridge network: cni0
Feb 27 22:54:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:54:08 checkin with server(s)
Feb 27 22:54:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:54:08 Interface is a bridge network: docker0
Feb 27 22:54:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:54:08 Interface is a bridge network: cni0
Feb 27 22:55:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:55:08 checkin with server(s)
Feb 27 22:55:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:55:08 Interface is a bridge network: docker0
Feb 27 22:55:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:55:08 Interface is a bridge network: cni0
Feb 27 22:56:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:56:08 checkin with server(s)
Feb 27 22:56:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:56:08 Interface is a bridge network: docker0
Feb 27 22:56:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:56:08 Interface is a bridge network: cni0
Feb 27 22:57:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:08 checkin with server(s)
Feb 27 22:57:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:08 Interface is a bridge network: docker0
Feb 27 22:57:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:08 Interface is a bridge network: cni0
Feb 27 22:57:56 bsas01 netclient[4047997]: {"time":"2024-02-27T22:57:56.445461144Z","level":"INFO","source":"mqhandlers.go 114}","msg":"processing peer update for server","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:57:56 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:56 Deleting rules table:  5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io egress
Feb 27 22:58:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:58:08 checkin with server(s)
Feb 27 22:58:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:58:08 Interface is a bridge network: docker0
Feb 27 22:58:08 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:58:08 Interface is a bridge network: cni0

While actively monitoring the logs during a ping between two hosts (one in the US on DO and a bare-metal one in Argentina, latency around 170 ms), the ping is stable up to these lines:

Feb 27 22:57:56 bsas01 netclient[4047997]: {"time":"2024-02-27T22:57:56.445461144Z","level":"INFO","source":"mqhandlers.go 114}","msg":"processing peer update for server","server":"5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io"}
Feb 27 22:57:56 bsas01 netclient[4047997]: [netclient] 2024-02-27 22:57:56 Deleting rules table:  5b805622-879c-44da-9401-f887ba0fe9a9.app.prod.netmaker.io egress

Just after those lines, the ping hangs without any error. After that I do see a lot of these:

checkin with server(s)
Interface is a bridge network: docker0
Interface is a bridge network: cni0

But the ping is still hung and the network between both hosts is lost.

Also, this is the ping after re-joining, when it un-hangs automatically but with weird latency metrics:

PING 10.143.215.1 (10.143.215.1) 56(84) bytes of data.
64 bytes from 10.143.215.1: icmp_seq=5 ttl=64 time=4952 ms
64 bytes from 10.143.215.1: icmp_seq=6 ttl=64 time=3925 ms
64 bytes from 10.143.215.1: icmp_seq=7 ttl=64 time=2901 ms
64 bytes from 10.143.215.1: icmp_seq=8 ttl=64 time=1881 ms
64 bytes from 10.143.215.1: icmp_seq=9 ttl=64 time=856 ms
64 bytes from 10.143.215.1: icmp_seq=10 ttl=64 time=172 ms
64 bytes from 10.143.215.1: icmp_seq=11 ttl=64 time=191 ms
64 bytes from 10.143.215.1: icmp_seq=12 ttl=64 time=174 ms
64 bytes from 10.143.215.1: icmp_seq=13 ttl=64 time=173 ms
64 bytes from 10.143.215.1: icmp_seq=14 ttl=64 time=172 ms
64 bytes from 10.143.215.1: icmp_seq=15 ttl=64 time=173 ms
64 bytes from 10.143.215.1: icmp_seq=16 ttl=64 time=172 ms
64 bytes from 10.143.215.1: icmp_seq=17 ttl=64 time=171 ms
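
One way to read the decreasing latencies above: if ping was sending one request per second (its default interval, which is an assumption here), then adding each sequence number to its round-trip time gives roughly when the reply arrived. A small sketch of that arithmetic. The function name arrival_times is hypothetical:

```shell
#!/bin/sh
# arrival_times [INTERVAL]
# Reads ping reply lines on stdin and prints "seq arrival", where arrival is
# the estimated time in seconds (relative to the nominal start of the
# sequence) at which the reply came back: seq * INTERVAL + rtt.
# INTERVAL defaults to 1 second.
arrival_times() {
  awk -v interval="${1:-1}" '
    /icmp_seq=/ {
      seq = ""; ms = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^icmp_seq=/) seq = substr($i, 10)  # text after "icmp_seq="
        if ($i ~ /^time=/)     ms  = substr($i, 6)   # text after "time="
      }
      if (seq != "" && ms != "")
        printf "%d %.3f\n", seq, seq * interval + ms / 1000
    }
  '
}

# Usage: paste the ping output into a file and run
#   arrival_times 1 < ping.txt
```

For the output above, seq 5 through 9 all come out between roughly 9.86 and 9.95 seconds, while seq 10 arrives at about 10.17. That would be consistent with the first requests being held back while the tunnel was down and their replies coming back in a burst once it recovered, rather than with the link itself being slow.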

abhishek9686 commented 8 months ago

Which version of netmaker are you using? On the interface for the peer you are pinging, what endpoint do you see? Is it a public or a private IP?
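
For reference, a quick way to answer the public-versus-private question for an endpoint string as printed by wg show or in the netclient logs (for example 10.42.0.0:51821 from the logs earlier in this thread). This is an IPv4-only sketch. The function name classify_endpoint and the 203.0.113.7 sample address are illustrative:

```shell
#!/bin/sh
# classify_endpoint HOST:PORT
# Prints "private" for RFC 1918 addresses, "loopback" for 127.0.0.0/8,
# and "other" for everything else (usually a public address, but ranges
# such as CGNAT or link-local are not distinguished here).
classify_endpoint() {
  ip="${1%:*}"   # drop the trailing :port
  case "$ip" in
    10.*|192.168.*)                        echo private ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo private ;;
    127.*)                                 echo loopback ;;
    *)                                     echo other ;;
  esac
}

classify_endpoint 10.42.0.0:51821     # endpoint seen in the netclient logs
classify_endpoint 127.0.0.1:37695     # endpoint seen in the wg show output
classify_endpoint 203.0.113.7:51821   # hypothetical documentation address
```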

julpod commented 8 months ago

I'm using the latest version of netclient suggested by the UI: v0.22.0. I'm pinging between the two Private Addresses (IPv4), also taken from the UI. I'm not on a self-hosted netmaker server but on the cloud/SaaS one.