sudomesh / bugs

report sudomesh bugs or other issues here if you don't know where to put them

ping from homenode > extender node > ... > wan does not succeed #29

Open jhpoelen opened 6 years ago

jhpoelen commented 6 years ago

Using the sudomesh testbed:

On the dolphin node:

# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
^C
--- 8.8.8.8 ping statistics ---
69 packets transmitted, 0 packets received, 100% packet loss

Note that 69 packets were transmitted with 100% packet loss.

Note, however, that on dolphin's extender node:

# tcpdump -i br-open | grep ICMP
listening on br-open, link-type EN10MB (Ethernet), capture size 65535 bytes
23:34:52.172113 IP google-public-dns-a.google.com > 100.65.42.1: ICMP echo reply, id 21619, seq 192, length 64
23:34:53.177926 IP google-public-dns-a.google.com > 100.65.42.1: ICMP echo reply, id 21619, seq 193, length 64
23:34:54.173353 IP google-public-dns-a.google.com > 100.65.42.1: ICMP echo reply, id 21619, seq 194, length 64

So the echo replies are reaching the extender node, but for some reason they are not being routed back to the home node.
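To confirm that the replies in a longer capture belong to dolphin's ping (same ICMP id, climbing seq numbers), the tcpdump lines can be parsed. This is a minimal Python sketch; the helper name reply_seqs and its regex are assumptions based only on the three-line capture above, not a general tcpdump parser.

```python
import re

# Matches tcpdump lines shaped like the capture above (an assumed format, not a
# general tcpdump grammar): "... > DST: ICMP echo reply, id N, seq M, length L"
ECHO_REPLY = re.compile(
    r"> (?P<dst>\S+): ICMP echo reply, id (?P<id>\d+), seq (?P<seq>\d+)"
)


def reply_seqs(lines, dst, icmp_id):
    """Return seq numbers of echo replies addressed to dst with the given ICMP id."""
    seqs = []
    for line in lines:
        m = ECHO_REPLY.search(line)
        if m and m.group("dst") == dst and int(m.group("id")) == icmp_id:
            seqs.append(int(m.group("seq")))
    return seqs


capture = [
    "23:34:52.172113 IP google-public-dns-a.google.com > 100.65.42.1: ICMP echo reply, id 21619, seq 192, length 64",
    "23:34:53.177926 IP google-public-dns-a.google.com > 100.65.42.1: ICMP echo reply, id 21619, seq 193, length 64",
    "23:34:54.173353 IP google-public-dns-a.google.com > 100.65.42.1: ICMP echo reply, id 21619, seq 194, length 64",
]

print(reply_seqs(capture, "100.65.42.1", 21619))  # [192, 193, 194]
```

If the seq numbers seen on the extender keep climbing while dolphin's ping reports 0 packets received, the replies are being lost somewhere between the extender and the home node.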

Juul commented 6 years ago

Alright, so either the routing table on dolphin's extender node is messed up or the firewall rules on dolphin are. If you insert an ACCEPT rule at the top of the INPUT chain on dolphin and the ping still doesn't go through, then it probably wasn't a firewall issue:

iptables -I INPUT -j ACCEPT

Of course, it could be a NAT issue, but that seems less likely. If running that command doesn't make the ping go through, then I'd look at the routing table on dolphin's extender node and possibly also at its firewall rules in the FORWARD chain.
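This test relies on iptables evaluating a chain top to bottom, with the first matching rule deciding the packet's fate and the chain policy applying if nothing matches. The toy Python model below sketches that; it ignores connection tracking, NAT and jump targets, and the existing chain contents and DROP policy are hypothetical. Once an unconditional ACCEPT sits in position 1, nothing later in INPUT can drop the reply, so a ping that still fails points away from the INPUT chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One iptables-style rule: a target plus an optional -i (input interface) match."""
    target: str                     # "ACCEPT" or "DROP"
    in_iface: Optional[str] = None  # None means "any interface", like omitting -i

    def matches(self, pkt: dict) -> bool:
        return self.in_iface is None or pkt["in_iface"] == self.in_iface


def verdict(chain: list, policy: str, pkt: dict) -> str:
    """First matching rule wins; if nothing matches, the chain policy applies."""
    for rule in chain:
        if rule.matches(pkt):
            return rule.target
    return policy


# Hypothetical INPUT chain on dolphin: nothing in it matches the echo reply
# (which the thread later finds arrives on eth0.10), and the policy is DROP.
input_chain = [Rule("ACCEPT", in_iface="lo")]
echo_reply = {"in_iface": "eth0.10"}
before = verdict(input_chain, "DROP", echo_reply)

# iptables -I INPUT -j ACCEPT: an unconditional ACCEPT at position 1.
input_chain.insert(0, Rule("ACCEPT"))
after = verdict(input_chain, "DROP", echo_reply)

print(before, after)  # DROP ACCEPT
```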

jhpoelen commented 6 years ago

I should probably have mentioned that pings to mesh IP addresses are fast.

jhpoelen commented 6 years ago

e.g.,

# ping 100.64.0.42
PING 100.64.0.42 (100.64.0.42): 56 data bytes
64 bytes from 100.64.0.42: seq=0 ttl=60 time=34.268 ms
64 bytes from 100.64.0.42: seq=1 ttl=60 time=33.718 ms

bennlich commented 6 years ago

@Juul nice test! I verified that adding the permissive iptables rule to dolphin fixes the pings. So it seems like the iptables rules on dolphin are broken.

One thing to note: even with the permissive rule, sometimes the first ~10 pings don't get replies. For now I'm assuming this is because there's a weak link somewhere in the chain of routers, but I'm including the note here in case it points to something more specific. Yeesh!

bennlich commented 6 years ago

It also doesn't seem to matter whether the rule is appended or inserted: iptables -A INPUT -j ACCEPT works just as well. Surprisingly (to me), narrowing the rule down to the expected interface does not work. For example, iptables -I INPUT -i eth0.1 -j ACCEPT does not help.

bennlich commented 6 years ago

Aha! How weird. The packets leave on interface eth0.1, and they return on interface eth0.10. Is that expected?
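Both observations fit first-match evaluation: if none of dolphin's existing INPUT rules match the reply, appending and inserting an unconditional ACCEPT end the same way, while a rule restricted to -i eth0.1 never matches a packet that arrives on eth0.10. A compact, simplified Python model (the existing rule list and the DROP policy are assumptions, not dolphin's real configuration):

```python
# Hypothetical rules as (target, in_iface) pairs; in_iface None means "any interface".
def verdict(chain, policy, in_iface):
    """First match wins, else the chain policy (simplified iptables model)."""
    for target, iface in chain:
        if iface is None or iface == in_iface:
            return target
    return policy


existing = [("ACCEPT", "lo")]  # hypothetical rules that don't match the reply
reply_iface = "eth0.10"        # where the echo reply actually arrives

results = {
    "-A INPUT -j ACCEPT": verdict(existing + [("ACCEPT", None)], "DROP", reply_iface),
    "-I INPUT -j ACCEPT": verdict([("ACCEPT", None)] + existing, "DROP", reply_iface),
    "-I INPUT -i eth0.1 -j ACCEPT": verdict([("ACCEPT", "eth0.1")] + existing, "DROP", reply_iface),
}

for cmd, v in results.items():
    print(f"iptables {cmd} -> {v}")
# iptables -A INPUT -j ACCEPT -> ACCEPT
# iptables -I INPUT -j ACCEPT -> ACCEPT
# iptables -I INPUT -i eth0.1 -j ACCEPT -> DROP
```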

Juul commented 6 years ago

On Fri, Apr 13, 2018 at 7:32 PM, Benny Lichtner notifications@github.com wrote:

> Aha! How weird. The packets leave on interface eth0.1, and they return on interface eth0.10. Is that expected?

Short answer: Yes.

Long answer:

The ICMP echo request is destined for the Internet. The only (or best) path to the Internet is via the extender node. The home node knows this because babeld received an announcement of a default route over the eth0.1 interface (from the extender node) and then added a route to the routing table saying that the default route is via eth0.1.

The request packet is thus sent over eth0.1
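The home node's choice of eth0.1 is an ordinary longest-prefix-match lookup: 8.8.8.8 matches only the default route babeld installed, so the request leaves on eth0.1. A minimal Python sketch using the standard ipaddress module; the routing-table contents are assumptions for illustration, including the 100.65.42.0/26 prefix (inferred from dolphin's address 100.65.42.1) and the hypothetical local interface name br-open on the home node.

```python
import ipaddress

# Hypothetical slice of dolphin's (the home node's) routing table: (prefix, interface).
ROUTES = [
    (ipaddress.ip_network("100.65.42.0/26"), "br-open"),  # assumed local /26
    (ipaddress.ip_network("0.0.0.0/0"), "eth0.1"),        # default route learned via babeld
]


def out_iface(dst: str) -> str:
    """Longest-prefix match: among routes containing dst, the most specific wins."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, iface) for net, iface in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    _, iface = max(matches, key=lambda m: m[0].prefixlen)
    return iface


print(out_iface("8.8.8.8"))      # eth0.1: only the default route matches
print(out_iface("100.65.42.5"))  # br-open: the /26 is more specific than /0
```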

Then later a response comes back to the extender node destined for the home node.

The extender node will have received a babeld announcement of a route to the home node's /26 subnet via eth0.1, and that is all as expected. BUT the extender node and home node communicate over a single Ethernet cable with VLAN tagging enabled, and they share two VLANs on that one cable: eth0.1 and eth0.10. The eth0.1 VLAN is dedicated to communication on the mesh (ad-hoc mode) network, and on this network all communication is layer 3 (IP), since you can't bridge ad-hoc networks. The eth0.10 VLAN is for the public (master/AP mode) network; all extender nodes share that one VLAN for the public network and use layer 2 (Ethernet), so on that VLAN there is no routing at all between a home node and its extender nodes.

When the extender node has to decide how to get a packet to the home node, it will first check whether the home node is on the same subnet as itself (which it is). Then, before consulting its routing table at all, it will check its ARP table. Since the extender and home node are bridged (layer 2) on eth0.10, the extender node will have an ARP entry for the home node on eth0.10, so it simply sends the packet out over that interface.

But then why didn't the home node do the same ARP lookup and bypass routing completely, you ask? Because the ping request it sent was not destined for the extender node. It was destined for an Internet address such as 8.8.8.8, which is not on the same subnet as the home node.
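The difference between the two nodes comes down to the same-subnet check. The Python sketch below follows the description above in simplified form and is not the kernel's actual lookup logic; the 100.65.42.0/26 subnet and the ARP table contents are assumptions based on the addresses in this thread.

```python
import ipaddress
from typing import Dict, Optional


def direct_iface(dst: str, local_subnet: str, arp_table: Dict[str, str]) -> Optional[str]:
    """Simplified model of the check described above.

    Returns the interface to send on directly (layer 2) when dst is on our own
    subnet and we have an ARP entry for it; returns None when the packet has
    to go through the routing table instead.
    """
    if ipaddress.ip_address(dst) not in ipaddress.ip_network(local_subnet):
        return None  # off-subnet: must be routed (e.g. via the default route)
    return arp_table.get(dst)  # None if there is no ARP entry yet


SUBNET = "100.65.42.0/26"  # assumed /26 containing dolphin's 100.65.42.1

# Extender delivering the echo reply to the home node: on-subnet, ARP entry on eth0.10.
extender_arp = {"100.65.42.1": "eth0.10"}
print(direct_iface("100.65.42.1", SUBNET, extender_arp))  # eth0.10

# Home node sending the echo request to 8.8.8.8: off-subnet, so no ARP shortcut.
print(direct_iface("8.8.8.8", SUBNET, {}))  # None
```

In this model an on-subnet destination with no ARP entry also returns None, meaning the decision falls back to the routing table.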


bennlich commented 6 years ago

@juul wow, incredible breakdown! Thanks for this! It really helps me understand the VLANs and interfaces on the home nodes. @jhpoelen @eenblam if node whisperers had to do exercises, maybe this would be one :-P

@juul based on your description, it sounds like all traffic destined for a home node, passing through its extender, will arrive on the eth0.10 interface. Meanwhile, traffic destined to a home node by way of a wirelessly meshing home node will arrive on its eth0.1 interface. Does this sound right?

Back to the bug at hand: it appears to be a home node firewall bug, since adding a permissive rule fixed the ping. But it doesn't look like any relevant changes have been made recently to https://github.com/sudomesh/makenode/commits/master/configs/ar71xx/home_nodes/templates/files/etc/init.d/meshrouting. Could this be an old bug? Or is there another place to look for firewall mischief?

Juul commented 6 years ago

> it sounds like all traffic destined for a home node, passing through its extender, will arrive on the eth0.10 interface. Meanwhile, traffic destined to a home node by way of a wirelessly meshing home node will arrive on its eth0.1 interface. Does this sound right?

Yep. Until we get private networks working over extenders (which requires a small patch to wpa_supplicant that I've been meaning to write), this will remain the case.

eenblam commented 6 years ago

@bennlich agreed. This has been very enlightening. Thanks @Juul !