libremesh / lime-packages

LibreMesh packages configuring OpenWrt for wireless mesh networking
https://libremesh.org/
GNU Affero General Public License v3.0

batman-in-batman loops in certain configurations #1032

Open pony1k opened 12 months ago

pony1k commented 12 months ago

Since https://github.com/libremesh/lime-packages/pull/726, ethernet interfaces are by default configured as batadv hardifs (on top of an 802.1ad VLAN) while also being in the bridge. This makes it possible to produce batman-in-batman loops if there is a node that does not have batadv configured on ethernet (as was the case before https://github.com/libremesh/lime-packages/pull/726). To reproduce, connect via ethernet the LAN ports of a node with no batadv-on-ethernet and of two other nodes. Additionally, the node with no batadv-on-ethernet must be connected via wifi to the same batadv mesh as one of the other two.

Here is how that works:

Note that node C is only there so that node B sends the broadcast frames out via ethernet. There may be other ways to create this kind of loop. Here is what such a frame looks like after a few cycles:

# tcpdump -nexxi eth0 ether proto 0x4305

15:53:54.841350 02:db:d6:e9:07:cb > ff:ff:ff:ff:ff:ff, ethertype 802.1Q-QinQ (0x88a8), length 214: vlan 67, p 0, ethertype 0x4305, 
        0x0000:  ffff ffff ffff 02db d6e9 07cb 88a8 0043  ...............C
        0x0010:  4305 010f 2e00 0007 27e0 0258 474e 8890  C.......'..XGN..
        0x0020:  ffff ffff ffff 02db d6e9 07cb 88a8 0043  ...............C
        0x0030:  4305 010f 2e00 0007 27da 0258 474e 8890  C.......'..XGN..
        0x0040:  ffff ffff ffff 02db d6e9 07cb 88a8 0043  ...............C
        0x0050:  4305 010f 2e00 0007 27d4 0258 474e 8890  C.......'..XGN..
        0x0060:  ffff ffff ffff 02db d6e9 07cb 88a8 0043  ...............C
        0x0070:  4305 010f 2e00 0007 27cc 0258 474e 8890  C.......'..XGN..
        0x0080:  ffff ffff ffff 02db d6e9 07cb 88a8 0043  ...............C
        0x0090:  4305 000f 3200 8a4b 3e30 02ab 46e9 07cb  C...2..K>0..F...
        0x00a0:  02ab 46e9 07cb 00ff 002c 0401 001c xxxx  ..F......,....
        0x00b0:  xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx  
        0x00c0:  xxxx xxxx xxxx xxxx xxxx 0602 0004 0700            ......
        0x00d0:  0000 0201 0000                           ......
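
The nesting in this dump can be checked mechanically. The following is a minimal Python sketch, not part of lime-packages or batman-adv, that walks the layers of the captured frame. The names (`walk_batadv_layers`, `SAMPLE`) are made up for illustration. The packet type values (0x01 for broadcast, 0x00 for IV OGM) and the 14-byte broadcast header length are assumptions read off the dump itself, where each layer spans 32 bytes: 14 bytes of Ethernet header, a 4-byte VLAN tag and 14 bytes of batadv broadcast header.

```python
import struct

ETH_P_8021Q = 0x8100
ETH_P_8021AD = 0x88A8
ETH_P_BATMAN = 0x4305
BATADV_IV_OGM = 0x00   # assumed from the innermost layer of the dump
BATADV_BCAST = 0x01    # assumed from the outer layers of the dump
BCAST_HDR_LEN = 14     # type, version, ttl, reserved, seqno (4), orig (6)


def walk_batadv_layers(frame: bytes):
    """Return a list of (packet_type, vlan_ids) tuples, outermost layer first."""
    layers = []
    offset = 0
    while offset + 14 <= len(frame):
        # Ethernet header: dst (6), src (6), ethertype (2)
        (ethertype,) = struct.unpack_from("!H", frame, offset + 12)
        offset += 14
        vlan_ids = []
        # Skip any stacked VLAN tags (802.1ad or 802.1Q)
        while ethertype in (ETH_P_8021AD, ETH_P_8021Q) and offset + 4 <= len(frame):
            tci, ethertype = struct.unpack_from("!HH", frame, offset)
            vlan_ids.append(tci & 0x0FFF)
            offset += 4
        if ethertype != ETH_P_BATMAN or offset >= len(frame):
            break
        packet_type = frame[offset]
        layers.append((packet_type, vlan_ids))
        if packet_type != BATADV_BCAST:
            break  # in this sketch only broadcast packets wrap another frame
        offset += BCAST_HDR_LEN
    return layers


# First 0xa0 bytes of the dump above (the part that is not redacted).
ROW_ETH = "ffffffffffff02dbd6e907cb88a80043"
SAMPLE = bytes.fromhex(
    ROW_ETH + "4305010f2e00000727e00258474e8890"
    + ROW_ETH + "4305010f2e00000727da0258474e8890"
    + ROW_ETH + "4305010f2e00000727d40258474e8890"
    + ROW_ETH + "4305010f2e00000727cc0258474e8890"
    + ROW_ETH + "4305000f32008a4b3e3002ab46e907cb"
)

for depth, (ptype, vids) in enumerate(walk_batadv_layers(SAMPLE)):
    kind = {BATADV_BCAST: "BCAST", BATADV_IV_OGM: "IV_OGM"}.get(ptype, hex(ptype))
    print(depth, kind, vids)
```

On the sample bytes this reports four broadcast layers around one IV OGM, each tagged with VID 67.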

The original frame starts at 0x0080. It is a BATADV_IV_OGM frame tagged with VID 67 (88a8 0043). That frame is wrapped in several layers of BATADV_BCAST frames and VLAN tags. Here, node A has 02:58:47:4e:88:90, and 02:db:d6:e9:07:cb is the address of eth0_67 on node B.

This should only happen if batman is configured on top of 802.1ad VLANs. It should not happen with 802.1q VLANs or without a VLAN. This is because batman normally drops batman frames on its soft interface, and it understands 802.1q tags. It does not, however, understand 802.1ad tags. So we are successfully hiding the batman frames from batman by putting an 802.1ad tag on them. See here: soft-interface.c#L217-L233 and here: soft-interface.c#L437-L451

While these loops should not occur in the default configuration (unless there is a bug that leads to batadv not being configured on the LAN port of DSA devices that have only a single LAN port), when they do occur the cause is not at all obvious. So I propose to somehow prevent this from happening. One easy way to do that might be to add a bridge filter rule that filters on the batadv ethertype.
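
To illustrate why the 802.1ad tag hides the frame, here is a simplified Python model of the check described above: batman frames are dropped on the soft interface, 802.1q tags are looked through, and anything else (including 802.1ad) passes. This is a sketch of the described behaviour only, not a transcription of soft-interface.c. The function name `soft_iface_would_drop` is hypothetical, and treating truncated frames as dropped is an assumption of this model.

```python
import struct

ETH_P_8021Q = 0x8100
ETH_P_8021AD = 0x88A8
ETH_P_BATMAN = 0x4305


def soft_iface_would_drop(frame: bytes) -> bool:
    """Simplified model: True if the frame is recognised as batman-in-batman."""
    if len(frame) < 14:
        return True  # assumption: too short to carry an Ethernet header
    (proto,) = struct.unpack_from("!H", frame, 12)
    if proto == ETH_P_BATMAN:
        return True  # plain batman frame on the soft interface
    if proto == ETH_P_8021Q:
        if len(frame) < 18:
            return True  # assumption: truncated VLAN header
        (inner,) = struct.unpack_from("!H", frame, 16)
        return inner == ETH_P_BATMAN  # batman frame behind an 802.1q tag
    # Any other ethertype, including 802.1ad (0x88a8), is not inspected further
    return False


# Addresses and VID 67 taken from the dump; payload is the start of the OGM.
MACS = bytes.fromhex("ffffffffffff02dbd6e907cb")
PAYLOAD = bytes.fromhex("000f3200")
untagged = MACS + b"\x43\x05" + PAYLOAD
dot1q = MACS + b"\x81\x00\x00\x43\x43\x05" + PAYLOAD
dot1ad = MACS + b"\x88\xa8\x00\x43\x43\x05" + PAYLOAD

print(soft_iface_would_drop(untagged),
      soft_iface_would_drop(dot1q),
      soft_iface_would_drop(dot1ad))
# prints: True True False
```

In this model the 802.1ad-tagged frame is the only one that gets through, which matches the loop seen in the capture. A filter meant to stop the loop would therefore need to look behind the 802.1ad tag as well.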

ilario commented 11 months ago

Wow, great research work!!! The ideal solution could also be to make sure this situation never occurs :) But the firewall rule sounds like a meaningful alternative solution to me! @G10h4ck @spiccinini @javierbrk @selankon opinions?