gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

Client and server cannot ping each other when deployed on the same host #1688

Closed afirix closed 1 year ago

afirix commented 1 year ago

What happened?

I am having trouble running the Netmaker server (in a container) and a client node (on the host OS) on the same machine. They cannot ping each other, but according to the second paragraph of the documentation at https://docs.netmaker.org/troubleshoot.html#server this should be possible.

Some details about what I've been doing:

  1. Installed Netmaker (0.16.2) following the public documentation. I used the Docker Compose method and deployed it on a VPS.
  2. Created a new network called 'testnet1' with the IP range 10.20.30.0/24. The server was assigned the IP 10.20.30.254.
  3. Installed netclient (0.16.1, but also tried 0.16.2) on the same VPS and ran sudo netclient join -t <mytoken> --port 51840. Got a new node with the IP 10.20.30.1. Now I have two nodes deployed on the same host.
  4. Ran sudo netclient join -t <mytoken> on another server (which sits within my LAN). Got one more node with the IP 10.20.30.2.
  5. Tried to validate connectivity in the mesh with pings in all possible directions. Here is what I am getting:
    • From 10.20.30.1:
      • ping to 10.20.30.2 succeeds
      • ping to 10.20.30.254 fails with the message ping: sendmsg: Required key not available
    • From 10.20.30.2:
      • pings to both 10.20.30.1 and 10.20.30.254 succeed
    • From 10.20.30.254:
      • ping to 10.20.30.1 fails with the message ping: sendto: Destination address required
      • ping to 10.20.30.2 succeeds

So, 10.20.30.1 and 10.20.30.254 cannot ping each other. Ports 51821-51830 and 51840 are open both in iptables and in the VPS web console.

I tried running sudo wg showconf nm-testnet1 on all nodes and here is what I am seeing (public IPs and private keys redacted):

On 10.20.30.1:

[Interface]
ListenPort = 51840
PrivateKey = <redacted>

[Peer]
PublicKey = I9zImybbBw9TyHQy92ePYMwoYNPtstFbI1xKICMG0iI=
AllowedIPs = 10.20.30.2/32
Endpoint = XXX.YYY.ZZZ.74:54206
PersistentKeepalive = 20

On 10.20.30.2:

[Interface]
ListenPort = 54206
PrivateKey = <redacted>

[Peer]
PublicKey = 0wuGST3vFYBR9z4xTCIf4q2xmLJDU5Ee1WboEQ0dx3U=
AllowedIPs = 10.20.30.254/32
Endpoint = AAA.BBB.CCC.208:51821
PersistentKeepalive = 20

[Peer]
PublicKey = QQiiv2TtBvRv6vvv8yLzl3oJJVnIqrQwm9+umptYJzE=
AllowedIPs = 10.20.30.1/32
Endpoint = AAA.BBB.CCC.208:51840
PersistentKeepalive = 20

On 10.20.30.254:

[Interface]
ListenPort = 51821
PrivateKey = <redacted>

[Peer]
PublicKey = I9zImybbBw9TyHQy92ePYMwoYNPtstFbI1xKICMG0iI=
AllowedIPs = 10.20.30.2/32
Endpoint = XXX.YYY.ZZZ.74:54206
PersistentKeepalive = 20

[Peer]
PublicKey = QQiiv2TtBvRv6vvv8yLzl3oJJVnIqrQwm9+umptYJzE=
AllowedIPs = 10.20.30.1/32
PersistentKeepalive = 20

What looks wrong to me is that the config on 10.20.30.1 does not have a peer entry for 10.20.30.254.
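A missing peer entry like this can be spotted by comparing the AllowedIPs in a node's wg showconf output against the full list of mesh addresses. Below is a minimal Python sketch of that check, assuming the config text has already been captured into a string. The helper names allowed_ips and missing_peers are made up for illustration and are not part of Netmaker or WireGuard.

```python
def allowed_ips(showconf_text):
    """Collect AllowedIPs addresses (without prefix length) from `wg showconf` output."""
    ips = set()
    for line in showconf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "AllowedIPs":
            # A peer may list several comma-separated CIDRs.
            for cidr in value.split(","):
                ips.add(cidr.strip().split("/")[0])
    return ips


def missing_peers(own_ip, mesh_ips, showconf_text):
    """Return the mesh addresses this node has no peer entry for."""
    expected = set(mesh_ips) - {own_ip}
    return sorted(expected - allowed_ips(showconf_text))


# Config reported on 10.20.30.1 (keys and endpoint omitted for brevity).
CONF_NODE_1 = """
[Interface]
ListenPort = 51840

[Peer]
AllowedIPs = 10.20.30.2/32
PersistentKeepalive = 20
"""

MESH = ["10.20.30.1", "10.20.30.2", "10.20.30.254"]
print(missing_peers("10.20.30.1", MESH, CONF_NODE_1))  # ['10.20.30.254']
```

Running the same check against the configs from 10.20.30.2 and 10.20.30.254 should report nothing missing, since both list every other node under AllowedIPs.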

What I also tried:

All the problematic observations are limited to the nodes deployed on the VPS. The client node on the other server inside my LAN looks perfectly correct. I wonder whether I misconfigured something, but I have exhausted my troubleshooting options and am not sure where else to look. Any help is appreciated.

Version

v0.16.1

What OS are you using?

Linux

Relevant log output

No response


afirix commented 1 year ago

I might have gotten to the bottom of it myself. I checked the output of the call to /api/nodes/testnet1 that is made when client nodes join the network, and the peers attribute did not contain an entry for 10.20.30.254 when called from 10.20.30.1. So the client node 10.20.30.1 never learns of the existence of the server node 10.20.30.254.

I debugged it further, and it looks like the server node is getting filtered out of the list of peers here. Indeed, both nodes have the same endpoint (as they are on the same physical host) and the server node's LocalAddress is not set, so the logic falls through to that continue statement and does not append the server peer to the output list.
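To make the behavior described above easier to follow, here is a rough Python paraphrase of the filtering. It is not the actual Netmaker code (which is written in Go). The Node class, its field names, and the peers_for function are hypothetical and model only what is stated here: a peer sharing the requesting node's endpoint is kept only if it has a local address.

```python
from dataclasses import dataclass


@dataclass
class Node:
    address: str             # mesh address, e.g. 10.20.30.1
    endpoint: str            # public IP of the host (redacted placeholders here)
    local_address: str = ""  # empty when not set, as for the server node


def peers_for(node, all_nodes):
    """Build (mesh address, reachable address) pairs for `node` (hypothetical sketch)."""
    peers = []
    for peer in all_nodes:
        if peer.address == node.address:
            continue  # a node is never its own peer
        if peer.endpoint == node.endpoint:
            if not peer.local_address:
                # Same host endpoint and no local address: the peer is skipped.
                continue
            peers.append((peer.address, peer.local_address))
        else:
            peers.append((peer.address, peer.endpoint))
    return peers


client = Node("10.20.30.1", "AAA.BBB.CCC.208")
lan_node = Node("10.20.30.2", "XXX.YYY.ZZZ.74")
server = Node("10.20.30.254", "AAA.BBB.CCC.208")  # local_address not set

nodes = [client, lan_node, server]
print([addr for addr, _ in peers_for(client, nodes)])    # ['10.20.30.2']
print([addr for addr, _ in peers_for(lan_node, nodes)])  # ['10.20.30.1', '10.20.30.254']
```

Under these assumptions the client on the shared host never receives the server as a peer, while the LAN node sees both, which matches the wg showconf output above.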

I have tried a couple things to work around:

It would be great to hear from the Netmaker team regarding this.

mneirynck commented 1 year ago

Same issue here, following!

mattkasun commented 1 year ago

PR #1692 fixes

alcroito commented 1 year ago

I can confirm the described scenario works with 0.17.1.