Open spiccinini opened 5 years ago
+1 (I added the release-blocker label as this is a non-retrocompatible change, it has to be sorted out before next release) See also discussion on https://lists.libremesh.org/pipermail/lime-dev/2019-August/001144.html and unconfirmed report on https://forum.openwrt.org/t/mediatek-and-vlan-802-1ad-on-ethernet/42346
I agree with removing the unneeded VLANs!
It would also solve the problem with MediaTek hardware switches.
So, not needed VLANs are Babeld, BMX, OLSR ones, right? But we would keep the BATMAN-adv one (if I remember correctly, @G10h4ck said that Batman needs to be on an interface by its own, and we use the VLAN=%N1 for breaking the Batman domains in different areas of the networks), right?
So, not needed VLANs are Babeld, BMX, OLSR ones, right?
Yes. We can start with babel protocol and see how it goes.
But we would keep the BATMAN-adv one (if I remember correctly, @G10h4ck said that Batman needs to be on an interface by its own, and we use the VLAN=%N1 for breaking the Batman domains in different areas of the networks), right?
Yes, that's right.
vlan=0 can be used to disable the vlan. I added some fixes for this use in #593
I just noticed that Batman-adv has long implemented a VLAN mechanism on top of the bat0 interface, see here and here. Does anyone understand whether Batman-adv's VLANs can replace the classical VLANs we use? If so, the hardware switches would never recognize these packets as tagged, correct?
From here https://www.open-mesh.org/projects/batman-adv/wiki/Tweaking#VLAN-handling :
The batX mesh interface created by batman-adv also supports VLANs which enables the administrator to configure virtual networks with independent settings on top of a single mesh cloud.
which sounds like the broadcast packets from a client will be limited to a single VLAN zone, but the Batman-adv hello packets would go everywhere. Does anyone know if this is correct?
Yes, but a single VLAN zone built on top of batman-adv potentially spans the whole batman-adv mesh. In summary, VLANs on top of batman-adv don't help with this issue.
but a single VLAN zone built on top of batman-adv potentially spans the whole batman-adv mesh
Why should we use a single one? What I was thinking was to use the Batman-adv's VLANs in a very similar way as we are using VLANs: the ID would depend on the network SSID.
That would eventually split the broadcast domain, but not the topology of the L2 network, so it would have scalability problems.
Also, we don't use VLANs for batman just to improve scalability. We also use them because batman-adv (or maybe just the way the batman-adv configuration interface is implemented on OpenWrt) "monopolizes" (becomes master of) the interfaces it uses to send OGMs etc., and an interface can have only one master, so you cannot put the same interface inside a bridge and also use it for batman-adv OGMs. A VLAN we create behaves like another interface, so we put the raw interface inside the bridge and give the VLAN to batman-adv.
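To make the "one master per interface" constraint concrete, here is a toy Python model of it. This is only an illustration of the reasoning above, not how the kernel or LibreMesh implements it. The `Iface` class, the bridge name `br-lan` and the VLAN ID 29 are assumptions chosen for the example.

```python
class Iface:
    """Toy network interface that can have at most one master."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # underlying raw interface, if this is a VLAN
        self.master = None     # e.g. a bridge or a batman-adv mesh interface

    def set_master(self, master):
        # An interface that already has a master cannot get a second one.
        if self.master is not None:
            raise RuntimeError(
                f"{self.name} already has master {self.master}"
            )
        self.master = master

    def add_vlan(self, vlan_id):
        # A VLAN sub-interface is modelled as a separate interface
        # with its own, independent master slot.
        return Iface(f"{self.name}.{vlan_id}", parent=self)


eth0 = Iface("eth0")
eth0.set_master("br-lan")          # raw interface goes into the bridge

try:
    eth0.set_master("bat0")        # the same interface cannot also serve batman-adv
    conflict = False
except RuntimeError:
    conflict = True

eth0_vlan = eth0.add_vlan(29)      # hypothetical VLAN ID
eth0_vlan.set_master("bat0")       # the VLAN interface is free to join bat0
```

In this model the second `set_master` call on `eth0` fails, while the VLAN sub-interface can be handed to `bat0` without touching the bridge membership of `eth0`.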
I agree with @G10h4ck about the use of VLANs to allow interoperability with the Linux bridge on the same physical interface. I know this is a problem with some (few) switches which refuse to operate with both tagged and untagged frames on the same port (which is why LiMe uses 802.1ad...?). Almost all switches do allow setting a PVID for untagged frames (which will then arrive tagged on the CPU port). Writing auto-configuration logic for that, which detects swconfig features/restrictions of interfaces, is beyond the current capabilities though. Things will get much easier once DSA is more common (i.e. used for qca,ar9331-switch for the SoC-built-in switch on ath79 https://github.com/torvalds/linux/blob/master/drivers/net/dsa/qca/ar9331.c, qca8k for the QCA gigE switches, MT7530 on all Ralink/MediaTek with gigE, https://github.com/stroese/linux/blob/gardena-v5.5/drivers/net/dsa/mt7628-esw.c on all Ralink/MediaTek with FE, Lantiq is transitioning as well https://github.com/openwrt/openwrt/pull/3085, almost all common external Marvell, Broadcom and RealTek switch ICs are already supported in vanilla Linux). It doesn't look like it's going to happen for the 20.x release for most targets, but hopefully 20.x will be the last swconfig-based release.
Part of the reason why I thought having Batman-adv without VLAN was very important was the bug happening on Mediatek-based YouHua WR1200JS devices I reported here https://forum.openwrt.org/t/mediatek-and-vlan-802-1ad-on-ethernet/42346
But testing #726, this does not seem to be a terrible problem: the Batman-adv hello packets get mangled, with zeros and random data added, but Batman-adv does not care and accepts them anyway. Then the actual data gets routed through other interfaces (e.g. the Babeld one if 802.1q is used instead of 802.1ad, i.e. adding a suffix in the list protocols 'babeld:17:8021q' LibreMesh configuration line).
So, in order to have these devices working also with ethernet we should just take care of removing VLAN usage from Babeld, for example with #631.
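As a rough illustration of the protocol entries discussed in this thread (for example 'babeld:17:8021q', and vlan=0 to disable the VLAN), the following Python sketch splits such a string into its parts. It is a hypothetical parser written for this discussion, not LibreMesh's actual code (which is Lua). The names `ProtocolSpec` and `parse_protocol_spec` are invented here, 802.1ad is assumed to be the default VLAN type, and an entry with no VLAN field is treated as "no VLAN", which is a simplification that may not match the per-protocol defaults. Templated IDs such as %N1 are not handled.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: 802.1ad is the default VLAN type unless a suffix says otherwise.
DEFAULT_VLAN_PROTO = "8021ad"
KNOWN_VLAN_PROTOS = {"8021ad", "8021q"}


@dataclass(frozen=True)
class ProtocolSpec:
    name: str
    vlan_id: Optional[int]     # None: run on the raw interface, no VLAN
    vlan_proto: Optional[str]  # None when no VLAN is used


def parse_protocol_spec(spec: str) -> ProtocolSpec:
    """Parse 'name[:vlan_id[:vlan_proto]]' (hypothetical helper)."""
    parts = spec.split(":")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"malformed protocol spec: {spec!r}")
    name = parts[0]

    # Simplification: a missing VLAN field means "no VLAN" in this sketch.
    if len(parts) == 1:
        return ProtocolSpec(name, None, None)

    vlan_id = int(parts[1])  # raises ValueError on templates like %N1
    if vlan_id == 0:
        # vlan=0 disables the VLAN, as mentioned earlier in the thread.
        return ProtocolSpec(name, None, None)
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")

    vlan_proto = parts[2] if len(parts) == 3 else DEFAULT_VLAN_PROTO
    if vlan_proto not in KNOWN_VLAN_PROTOS:
        raise ValueError(f"unknown VLAN type: {vlan_proto!r}")
    return ProtocolSpec(name, vlan_id, vlan_proto)
```

With this sketch, `parse_protocol_spec("babeld:17:8021q")` yields VLAN 17 with type `8021q`, while `parse_protocol_spec("babeld:0")` yields an entry with no VLAN at all, which is the configuration that avoids tagging on the affected switches.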
As with kernel 5.4 we switch MT7621 to the upstream DSA driver instead of swconfig, it'd be interesting to test with an OpenWrt master snapshot and see if the issue still occurs there and, if so, report back to have it fixed. Interest should be much larger than in fixing anything in an OpenWrt-specific driver which OpenWrt itself has dropped...
Thanks! I'm going to test with snapshot code from OpenWrt downloads website :)
Update on the MediaTek 802.1ad bug: I confirmed it on an OpenWrt snapshot with DSA, and it seems that the origin is a small bug in the kernel: https://forum.openwrt.org/t/mediatek-and-vlan-802-1ad-on-ethernet/42346/9
The pre-DSA bug was different but could have a very similar origin. This means that currently VLAN 802.1ad should not work on any device with a MediaTek switch.
The scenarios we are using in the networks don't need VLANs for the protocols. Let's remove them to reduce complexity and bugs (like #580).