gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

Help: Can't reach nodes after upgrading to v0.8.0 #299

Closed rheicide closed 3 years ago

rheicide commented 3 years ago

Yesterday I upgraded Netmaker to v0.8.0 and now I can't reach any node, even though they can check in just fine—their statuses are "Healthy" in the dashboard.

$ /etc/netclient/netclient checkin
2021/09/25 10:57:28 [netclient] running checkin for all networks
2021/09/25 10:57:32 [netclient] checked in successfully for Home

When I do a netclient pull on the Netmaker server, it shows the following:

$ /etc/netclient/netclient pull -n Home
2021/09/25 10:48:53 error running command: /sbin/ip -4 route add x.x.x.x/x dev nm-Home
2021/09/25 10:48:53 RTNETLINK answers: No such device
2021/09/25 10:48:53 error running command: /sbin/ip -4 route add x.x.x.x/x dev nm-Home
2021/09/25 10:48:53 RTNETLINK answers: No such device
2021/09/25 10:48:54 [netclient] reset network and peer configs
2021/09/25 10:48:54 [netclient] success

But ifconfig shows the interface is there:

nm-Home   Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          UP POINTOPOINT RUNNING NOARP  MTU:0  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:180 (180.0 B)  TX bytes:316 (316.0 B)

The network's address range is 10.20.30.0/24, and there are 2 nodes: 10.20.30.1 (the server, ingress) and 10.20.30.3 (a remote Linux server, egress with 2 IP ranges). The Netmaker server can't even ping itself at 10.20.30.1.

Everything was working fine with v0.7 (set up with docker-compose). I tried different things, including deleting the interface with ip link delete dev nm-Home and then netclient pull, but nothing worked.

Could someone please help 🙏🏾 ?

mattkasun commented 3 years ago

Did you update the netclient instances to v0.8?

rheicide commented 3 years ago

Yes, I did:

$ /etc/netclient/netclient --version
Netclient CLI version v0.8.0
$ /etc/netclient/netclient pull -n Home
2021/09/25 20:35:52 error running command: /sbin/ip -4 route add 208.85.40.0/21 dev nm-Home
2021/09/25 20:35:52 RTNETLINK answers: No such device
2021/09/25 20:35:52 error running command: /sbin/ip -4 route add 119.116.160.0/21 dev nm-Home
2021/09/25 20:35:52 RTNETLINK answers: No such device
2021/09/25 20:35:53 [netclient] reset network and peer configs
2021/09/25 20:35:53 [netclient] success

pcfriek1987 commented 3 years ago

What does your ip route show? Does the network show up there at all? It seems the interface does not even have an IP address.
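A minimal sketch of the two checks asked about here, in case it helps anyone scripting them. The helper names has_route and has_ipv4 are hypothetical (they are not part of netclient), and both simply read the output of the standard ip command on stdin, so they make no assumptions about how Netmaker configures the interface.

```shell
#!/bin/sh
# Hypothetical helpers, not part of netclient. Both read command output on stdin.

# has_route CIDR
# Succeeds if the `ip route` output on stdin has a route whose
# destination (first field) is exactly CIDR.
has_route() {
    awk -v cidr="$1" '$1 == cidr { found = 1 } END { exit !found }'
}

# has_ipv4
# Succeeds if the `ip -4 -o addr show dev IFACE` output on stdin
# lists at least one inet address (third field is "inet").
has_ipv4() {
    awk '$3 == "inet" { found = 1 } END { exit !found }'
}

# Example usage on the affected host (interface and range taken from this thread):
#   ip route | has_route 10.20.30.0/24 || echo "no route for 10.20.30.0/24"
#   ip -4 -o addr show dev nm-Home | has_ipv4 || echo "nm-Home has no IPv4 address"
```

If both checks fail while the interface itself exists, that would match the symptoms described above: the device is present but carries neither an address nor a route for the network.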

rheicide commented 3 years ago

@pcfriek1987 Right, ip route shows nothing about 10.20.30.0/24. I didn't check what it looked like before the upgrade from v0.7.3 to v0.8.0, but it was working fine back then, so maybe v0.8.0 somehow failed to configure the routes in my case.

$ ip route
default via 192.168.1.1 dev bond0  src 192.168.1.205
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
172.19.0.0/16 dev docker-39a0c8b4  proto kernel  scope link  src 172.19.0.1
192.168.1.0/24 dev bond0  proto kernel  scope link  src 192.168.1.205

mattkasun commented 3 years ago

If I were you, I would delete the affected node(s) (easiest from the UI) and then rejoin the network:

sudo /etc/netclient/netclient join --name <name of your node> -t <access token>

rheicide commented 3 years ago

@mattkasun Thanks but that didn't work either. I had to delete the network, recreate it and rejoin it from every node.

afeiszli commented 2 years ago

FYI, as of 0.8.1 this will no longer occur. The issue was due to missing fields between client versions. The server will now add default fields for clients after an upgrade if a field is missing from their config.