gravitl / netmaker

Netmaker makes networks with WireGuard. Netmaker automates fast, secure, and distributed virtual networks.
https://netmaker.io

Relay functionality not working #486

Closed by ethanfowler 2 years ago

ethanfowler commented 2 years ago

Hi, me again. I seem to be finding all the problems. I'm using the standard AWS EC2 Docker setup, I promise!

I had a node that couldn't ping others, despite it showing as healthy. This made me suspect NAT issues, so I set the netmaker node up as a relay for all nodes via the dashboard. This resulted in many issues, e.g. on the netmaker machine:

$ ping 10.20.32.7
PING 10.20.32.7 (10.20.32.7) 56(84) bytes of data.
From 10.20.32.1 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: Required key not available

Other nodes were also unable to ping each other or the netmaker machine. So I removed the relay config from netmaker, at which point netmaker's status went from HEALTHY to WARNING on the dashboard, and the netmaker machine no longer had an nm-<network-name> virtual interface, even after restarting docker, removing the containers, and rebooting the whole machine.

So I reconfigured it as a relay (same result), then removed the relay config again, and this is what I now get when trying to ping nodes from netmaker:

$ ping 10.20.32.7
PING 10.20.32.7 (10.20.32.7) 56(84) bytes of data.
From 10.20.32.1 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: Required key not available
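
For context on the error above: WireGuard returns `Required key not available` (ENOKEY) when the destination address does not match the allowed IPs of any peer on the interface, so it has no key to encrypt with. Below is a minimal sketch for checking that, assuming you paste in the text output of `wg show <interface> allowed-ips` (one peer per line: public key, then the CIDRs). The function name `covering_peers` and the sample keys are hypothetical and only for illustration.

```python
import ipaddress
from typing import List


def covering_peers(allowed_ips_output: str, destination: str) -> List[str]:
    """Return the public keys of peers whose allowed IPs contain `destination`."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for line in allowed_ips_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        public_key, cidrs = fields[0], fields[1:]
        for cidr in cidrs:
            if cidr == "(none)":
                continue  # peer has no allowed IPs configured
            network = ipaddress.ip_network(cidr, strict=False)
            if dest.version == network.version and dest in network:
                matches.append(public_key)
                break
    return matches


# Hypothetical sample output; real keys are base64 strings.
SAMPLE = (
    "PEERKEY_A=\t10.20.32.2/32\n"
    "PEERKEY_B=\t10.20.32.3/32 10.20.32.7/32\n"
    "PEERKEY_C=\t(none)\n"
)

print(covering_peers(SAMPLE, "10.20.32.7"))
print(covering_peers(SAMPLE, "10.20.32.9"))
```

An empty result for 10.20.32.7 on the netmaker machine would be consistent with the error shown, though it would not by itself explain why the entry is missing.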

ethanfowler commented 2 years ago

OK, it seems that setting the netmaker node as a relay for all machines was foolish. I've remade the network and set it to act as a relay only for the troublesome machine/NAT. Other nodes are happy, and the troublesome node can now ping the server again. The remaining issue is purely one of relay functionality: the troublesome node and the others can all ping the server, but here is a tshark trace on the server of a normal node (.2) trying to ping the troublesome one (.3):

    1 0.000000000   10.20.32.2 → 10.20.32.3   ICMP 84 Echo (ping) request  id=0x001e, seq=85/21760, ttl=64
    2 1.023757598   10.20.32.2 → 10.20.32.3   ICMP 84 Echo (ping) request  id=0x001e, seq=86/22016, ttl=64
    3 2.048098042   10.20.32.2 → 10.20.32.3   ICMP 84 Echo (ping) request  id=0x001e, seq=87/22272, ttl=64
    4 3.072095780   10.20.32.2 → 10.20.32.3   ICMP 84 Echo (ping) request  id=0x001e, seq=88/22528, ttl=64

and the other way around:

    1 0.000000000   10.20.32.3 → 10.20.32.2   ICMP 84 Echo (ping) request  id=0x5b2b, seq=1/256, ttl=64
    2 1.004107754   10.20.32.3 → 10.20.32.2   ICMP 84 Echo (ping) request  id=0x5b2b, seq=2/512, ttl=64
    3 2.028096826   10.20.32.3 → 10.20.32.2   ICMP 84 Echo (ping) request  id=0x5b2b, seq=3/768, ttl=64
    4 3.052135384   10.20.32.3 → 10.20.32.2   ICMP 84 Echo (ping) request  id=0x5b2b, seq=4/1024, ttl=64

So pings from either node reach the server, but neither machine shows any incoming requests in Wireshark. It seems like the netmaker relay is simply not forwarding the packets?
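
To make the one-way pattern in these traces easier to spot, here is a rough sketch that counts echo requests and replies per requester/target pair from tshark's default text output. It assumes the line layout shown above; the names `echo_summary` and `unanswered` are hypothetical helpers, not part of tshark or netmaker.

```python
import re
from collections import Counter

# Matches lines such as:
#   1 0.000000000   10.20.32.2 → 10.20.32.3   ICMP 84 Echo (ping) request ...
LINE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\s+(?:→|->)\s+(?P<dst>\d+\.\d+\.\d+\.\d+)"
    r"\s+ICMP\s+\d+\s+Echo \(ping\) (?P<kind>request|reply)"
)


def echo_summary(capture_text):
    """Count echo requests and replies, keyed by (requester, target, kind)."""
    counts = Counter()
    for line in capture_text.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        if m["kind"] == "request":
            counts[(m["src"], m["dst"], "request")] += 1
        else:
            # A reply travels target -> requester, so flip the addresses.
            counts[(m["dst"], m["src"], "reply")] += 1
    return counts


def unanswered(capture_text):
    """Return (requester, target) pairs with requests but no replies."""
    counts = echo_summary(capture_text)
    pairs = sorted({(a, b) for (a, b, kind) in counts if kind == "request"})
    return [(a, b) for (a, b) in pairs if counts[(a, b, "reply")] == 0]


CAPTURE = """\
    1 0.000000000   10.20.32.2 → 10.20.32.3   ICMP 84 Echo (ping) request  id=0x001e, seq=85/21760, ttl=64
    2 1.023757598   10.20.32.2 → 10.20.32.3   ICMP 84 Echo (ping) request  id=0x001e, seq=86/22016, ttl=64
"""

print(unanswered(CAPTURE))
```

This only summarizes what the capture already shows (requests with no replies); it cannot tell whether the relay dropped the packets or the target never answered.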

ethanfowler commented 2 years ago

Additional info. On server:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         ip-172-31-0-1.e 0.0.0.0         UG    100    0        0 eth0
10.20.30.0      0.0.0.0         255.255.255.0   U     0      0        0 nm-<name1>
10.20.31.0      0.0.0.0         255.255.255.0   U     0      0        0 nm-<name2>
10.20.32.0      0.0.0.0         255.255.255.0   U     0      0        0 nm-<name3 - problem network>
10.20.33.0      0.0.0.0         255.255.255.0   U     0      0        0 nm-<name4>
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.18.0.0      0.0.0.0         255.255.0.0     U     0      0        0 br-65e435dc9f33
172.31.0.0      0.0.0.0         255.255.240.0   U     0      0        0 eth0
ip-172-31-0-1.e 0.0.0.0         255.255.255.255 UH    100    0        0 eth0

$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

afeiszli commented 2 years ago

@ethanfowler are you still encountering this issue? I think it was caused by initially setting all peers (including the server) as relayed. We've removed that option from the UI in 0.9.2, so it should be OK now.

afeiszli commented 2 years ago

Relay should be working as of 0.9.4. Please let us know if you're still experiencing this issue and we can re-open.