libremesh / lime-packages

LibreMesh packages configuring OpenWrt for wireless mesh networking
https://libremesh.org/
GNU Affero General Public License v3.0

Detect LAN-switch-LAN connection situations #1118

Closed: ilario closed this issue 2 weeks ago

ilario commented 4 months ago

Within the cable purpose autodetection GSoC project by Nemael, we have the option to configure an interface once we detect a specific situation.

When two LibreMesh routers are connected to a switch (belonging to another router or to a third LibreMesh router), batman-adv has problems, and a specific configuration has to be applied to those LAN ports for the network to work, like this.

The question here is how to detect if a LAN port is connected to a network with LibreMesh data on it (either directly to a LibreMesh node or through a switch).

ilario commented 4 months ago

The error messages look like this:

[  121.472686] batman_adv: bat0: Possible loop on VLAN -1 detected which can't be handled by BLA - please check your network setup!

[  117.539621] br-lan: received packet on bat0 with own address as source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[  117.555507] br-lan: received packet on bat0 with own address as source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[  117.566445] br-lan: received packet on bat0 with own address as source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[  118.340415] mt7530-mdio mdio-bus:1f: port 1 failed to delete dc:9f:db:37:28:a9 vid 0 from fdb: -2
[  122.441546] net_ratelimit: 1011 callbacks suppressed

and the effect is an unstable network.
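
As a rough illustration of how these symptoms could be spotted after the fact, here is a minimal Python sketch that scans kernel log text for the two message types quoted above. This is not the detection method asked for in this issue (it only notices the problem once the loop already exists), and it is not part of lime-packages: the function name `find_loop_symptoms`, the returned dictionary layout, and the choice of Python are all assumptions made for illustration.

```python
import re
from collections import Counter

# Patterns derived from the kernel messages quoted in this issue.
BLA_LOOP = re.compile(
    r"batman_adv: (\S+): Possible loop on VLAN (-?\d+) detected"
)
OWN_ADDR = re.compile(
    r"(\S+): received packet on (\S+) with own address as source address "
    r"\(addr:([0-9a-fA-F:]{17}), vlan:(\d+)\)"
)


def find_loop_symptoms(log_text):
    """Hypothetical helper: report loop symptoms found in kernel log text.

    Returns a dict with:
      "bla_loops":   list of (batman interface, VLAN id) tuples
      "own_address": Counter keyed by (bridge, port, MAC address)
    """
    # Collapse all whitespace so messages wrapped over two lines still match.
    flat = " ".join(log_text.split())
    bla_loops = [(m.group(1), int(m.group(2))) for m in BLA_LOOP.finditer(flat)]
    own_address = Counter(
        (m.group(1), m.group(2), m.group(3)) for m in OWN_ADDR.finditer(flat)
    )
    return {"bla_loops": bla_loops, "own_address": own_address}


# Sample input copied from the messages above, including the line wrapping.
SAMPLE = """
[  121.472686] batman_adv: bat0: Possible loop on VLAN -1 detected
which can't be handled by BLA - please check your network setup!
[  117.539621] br-lan: received packet on bat0 with own address as
source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[  117.555507] br-lan: received packet on bat0 with own address as
source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[  117.566445] br-lan: received packet on bat0 with own address as
source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
"""

if __name__ == "__main__":
    print(find_loop_symptoms(SAMPLE))
```

On a router the input would presumably come from the kernel log (for example the output of `dmesg`); the sample string here just reuses the lines pasted in this comment.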

ilario commented 4 months ago

The error messages are the same as the ones observed in https://github.com/libremesh/lime-packages/issues/189, but in that case there was (also?) another problem, which has been solved in https://github.com/libremesh/lime-packages/pull/726

ilario commented 4 months ago

More info in this email: https://lists.autistici.org/message/20240714.140352.58fe57b2.en.html

Once the situation is detected, a possible solution would be to add a prominent warning to the lime-config output suggesting how to fix it manually (the detection script would run during lime-config execution). The same warning should appear in lime-app, with some buttons for applying the fix. Applying the right configuration automatically is risky: the configured LAN port will no longer be in br-lan, so a user connected as a client to that port (or group of ports, as with eth0.1 on the TP-Link WDR3600, which includes all 4 LAN ports) could lose access to the router.

ilario commented 3 months ago

@pony1k did not observe any issue when connecting two DSA-supported LibreMesh routers to a dumb switch. The batman-adv instances on the two routers were also detecting each other via the cable (which did not happen in my tests). More info at: https://lists.autistici.org/message/20240726.150840.dcc0e028.en.html

ilario commented 2 weeks ago

The switch between nodes should not make the situation worse than #1121, so we can continue the discussion there.