gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

[Bug]: multiple nodes (mesh clients) behind a single firewall/router cannot reach each other #2804

Open saket424 opened 8 months ago

saket424 commented 8 months ago

What happened?

From the Troubleshooting FAQ

Can I connect multiple nodes (mesh clients) behind a single firewall/router? As of v0.18.0, netmaker now uses a stun server (Session Traversal Utilities for NAT). This provides a tool for communications protocols to detect and traverse NATs that are located in the path between two endpoints.

I tried setting up an overlay mesh network of 2 nodes behind a single firewall/router.

10.20.30.1 is the netmaker server
10.20.30.2 is nodeA
10.20.30.3 is nodeB

nodeA can ping server
nodeB can ping server

nodeA cannot ping nodeB over the netmaker interface, even though they are on the same LAN (and there is no peer isolation in play).

Version

v0.22.0

What OS are you using?

Linux

Relevant log output

We reproduced this issue on multiple instances, so we cannot fault a specific openwrt router.

abhishek9686 commented 8 months ago

Have you set node A and node B as static hosts? Usually clients reach each other over local endpoints, which are detected automatically when possible, unless the host is set as static.

saket424 commented 8 months ago

No. The addresses for the nodes are being assigned sequentially by the netmaker server, akin to DHCP.

saket424 commented 8 months ago

We installed the mesh network overlay from defined/nebula, and that doesn't appear to have this issue for hosts behind the same openwrt NAT. I am puzzled that this is even an issue, because two hosts behind the same NAT should be the simplest candidate pair to discover.

saket424 commented 8 months ago

Looks like there are others reporting similar issues. I certainly am not using static hosts. Are there assumptions being made about hairpinning on the parent router that these two nodes are behind? 'wg show' has the public IP address of the parent router.

https://www.reddit.com/r/netmaker/comments/16wimqs/peer_to_peer_latency/

https://github.com/gravitl/netmaker/issues/1713

saket424 commented 7 months ago

@abhishek9686 If the two nodes are behind a single NAT, they are able to ping each other. If the two nodes are behind a double NAT, they are not able to ping each other.

In the first case, wg show has the local IP; in the second case, wg show has the external IP address (retrieved from STUN).

Does that give you any ideas of how to fix the issue?

Just as another data point, nebula/defined don't have this issue even when double NATted. Looking forward to hearing your insights on this matter.
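
One way to check which of the two cases a host is in is to classify the endpoints reported by `wg show <interface> endpoints` as private (RFC 1918) or not. The sketch below is illustrative only: the function name, sample keys, and sample addresses are made up, and it assumes the whitespace-separated "public key, endpoint" lines that this subcommand prints. A private endpoint suggests the peer is being reached over a LAN address; it does not prove that both hosts sit on the same LAN.

```python
import ipaddress

# RFC 1918 ranges, checked explicitly so the result does not depend on
# how the ipaddress module treats documentation or other special ranges.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify_endpoints(wg_output: str) -> dict:
    """Map each peer public key to 'local', 'public', 'none', or 'unknown'.

    `wg_output` is assumed to be the text printed by
    `wg show <interface> endpoints` (hypothetical helper, not part of netmaker).
    """
    result = {}
    for line in wg_output.strip().splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that is not "key endpoint"
        key, endpoint = parts
        if endpoint == "(none)":
            result[key] = "none"
            continue
        # Drop the port, and the brackets around an IPv6 literal.
        host = endpoint.rsplit(":", 1)[0].strip("[]")
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            result[key] = "unknown"
            continue
        is_local = any(addr in net for net in RFC1918)
        result[key] = "local" if is_local else "public"
    return result


if __name__ == "__main__":
    # Made-up sample data in the shape of `wg show <interface> endpoints`.
    sample = (
        "keyA=\t192.168.1.10:51821\n"
        "keyB=\t198.51.100.20:51821\n"
        "keyC=\t(none)\n"
    )
    for peer, kind in classify_endpoints(sample).items():
        print(peer, kind)
```

To use it against a live host, the output of `wg show <interface> endpoints` (run as root, with the netmaker interface name substituted) can be passed in as the string argument.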

abhishek9686 commented 7 months ago

Those two nodes behind a double NAT, are they on the same local network?

saket424 commented 7 months ago

Yes, on the same local network that is double NATted. It should be simple for you to replicate.

abhishek9686 commented 7 months ago

If they are on the same local network, endpoint detection on netclient will set the endpoint to their local address. It first checks whether it is able to ping the other peer over the local address.
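
The behaviour described here (probe the peer's local address first, and use it as the endpoint if it answers) can be sketched roughly as follows. This is not netclient's actual code: the function name, its parameters, and the injected reachability probe are all hypothetical, and the fallback to the public endpoint is an assumption about what happens when no local address responds.

```python
from typing import Callable, Iterable


def choose_endpoint(
    local_candidates: Iterable[str],
    public_endpoint: str,
    is_reachable: Callable[[str], bool],
) -> str:
    """Return the first local candidate that answers the probe.

    Falls back to the public (STUN-discovered) endpoint when none of the
    local candidates is reachable. All names here are illustrative.
    """
    for candidate in local_candidates:
        if is_reachable(candidate):
            return candidate
    return public_endpoint


if __name__ == "__main__":
    # Fake probe standing in for a ping over the LAN (assumption).
    reachable = {"192.168.1.11:51821"}
    picked = choose_endpoint(
        ["192.168.1.11:51821"],
        "198.51.100.20:51821",
        lambda ep: ep in reachable,
    )
    print(picked)
```

Under this model, seeing the external address in `wg show` would mean either that the local probe failed or that no local candidate was offered for the peer.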

saket424 commented 7 months ago

@abhishek9686 I am suggesting you try to reproduce this issue, since I am reporting that the endpoint address is not the local address when you move from single NAT to double NAT.

yabinma commented 4 months ago

@saket424, I ran a test locally, but the issue was not reproduced. Here is the topology used in the test; please check whether it matches the double NAT in your description:

(image)

(image)

The endpoint is updated to the local address in the screenshot above, and the nodes are able to communicate with each other.

The test was done in v0.24.1. Have you re-tested the scenario in the new version?