VLAN versus physical interface attributes

ipspace commented 1 year ago

(extracting #866 into a "what we think needs to be done" issue)

When using VLANs with mode set to bridge or irb, VLAN creates VLAN/SVI interfaces and copies attributes from the "first" physical interface (for whatever value of "first") into the VLAN interface augmenting them with the vlans.vlan attributes -- one of those things I regret I ever implemented, but that's the cost of growing "organically". At least it's discouraged in the documentation (https://netlab.tools/module/vlan/#vlan-interface-parameters)

Anyway, time to fix it. This seems to be where we want to be long-term:

Most attributes of the VLAN/SVI interface are defined in the global vlans.vlan definition.
Per-node attributes are defined in the node vlans.vlan definition.
Attributes that belong to physical link/interface can be defined on the link/interface. All other attributes would trigger an error.

First fun question: which attributes belong to physical link/interface and not to the VLAN interface. Right now the only attribute that is not internal to netlab seems to be mtu, but even that should be defined on the VLAN so it's consistent across all member interface (but that's another story).

Next: how do we handle migration from the current behavior to the new one. We could be drastic, change stuff to the way it should be done, document a breaking change and move on. That would require going to release 1.7.0 (and I have no problem with that).

Alternatively, we could prolong the pain (aka "I'll take an Advil instead of going to the dentist") and add either warnings or feature flags. Honestly, the more I think about it, the more I'd go for a breaking change considering we would break something documented as "don't ever do it!"

Thoughts? @jbemmel @ssasso @petercrocker @ddutt

jbemmel commented 1 year ago

If we could avoid creating SVI interfaces for bridge-mode VLANs, that would be a bonus

There are only a handful of attributes that make sense to define (override) at the node interface level, such as a specific IP address or an OSPF cost (overriding whatever is defined at the VLAN level, if anything). Even mtu could make sense if the user is testing out a what-if scenario.

The most problematic part of the current implementation, is the copying of physical interface attributes from any node - even one different from the one being transformed. In my mind, the (l3) SVI interface would get its attributes copied from the VLAN definition, and then in a second step any overriding parameters from the physical interface would be applied.

For mtu, the reverse direction could be added ( i.e. allow to define mtu at the VLAN level, and copy it into any physical interface that is attached to that VLAN - unless it's already defined there ). Generate a warning in case of mismatch

ipspace commented 1 year ago

If we could avoid creating SVI interfaces for bridge-mode VLANs, that would be a bonus

You should be working with different types of devices ;) Some of them need SVI-type interface even for pure L2, for example something like a Linux bridge (if you're not using VLAN-aware bridges).

There are only a handful of attributes that make sense to define (override) at the node interface level, such as a specific IP address or an OSPF cost (overriding whatever is defined at the VLAN level, if anything).

Or BFD timers, or BGP local-as or... I'm pretty sure every protocol has something that can be reasonably configured on an individual interface.

Even mtu could make sense if the user is testing out a what-if scenario.

OK.

The most problematic part of the current implementation, is the copying of physical interface attributes from any node - even one different from the one being transformed.

Not sure about that. SVI interface code only looks at the node interfaces (or I can't read my code any longer)

In my mind, the (l3) SVI interface would get its attributes copied from the VLAN definition, and then in a second step any overriding parameters from the physical interface would be applied.

Which physical interface? I agree this thing needs to be fixed, but it turns out it started (in #866) with something the documentation says you shouldn't be doing. I'm not going to change the implementation of things you shouldn't be doing, but I'm OK with preventing them from being done for the sake of sanity of everyone involved.

For mtu, the reverse direction could be added ( i.e. allow to define mtu at the VLAN level, and copy it into any physical interface that is attached to that VLAN - unless it's already defined there ). Generate a warning in case of mismatch

Yeah, I was thinking along the same lines. Now tell me what to do on the trunk ports?

jbemmel commented 1 year ago

ok, I can see that there are device models where it could make sense - just not in my world ;) Perhaps we could make it a device specific feature flag for the vlan module: "vlan.l2_svi" (default True, and I'll put it to False for srlinux/sros)

I take back the part about 'any' node - it copies from the first physical interface connected to a given vlan on the same node.

The context here, is that I'm accustomed to a different mental model for vlans - to me, vlans are just tags on ports. A trunk port is multiple different tags on a port.

Perhaps we're better off not copying anything from physical interface to SVI. If the user wants to override global attributes, they'd instantiate the VLAN on the node and define those attributes there. An exception could be made if there is a 1:1 mapping between physical port and an access vlan - then it's clear. But multiple ports on the same vlan, or a trunk port with multiple vlans, would generate errors if they have interface attributes

ipspace commented 1 year ago

ok, I can see that there are device models where it could make sense - just not in my world ;) Perhaps we could make it a device specific feature flag for the vlan module: "vlan.l2_svi" (default True, and I'll put it to False for srlinux/sros)

Or you could skip the SVI interfaces in the interface template if type == 'svi' and vlan.mode == 'bridge'? Or write a device quirks module that does the same? You seem to be the only one having an issue with this behavior, so maybe it's easier to solve it in a device-specific way instead of adding a feature flag into the core code?

The context here, is that I'm accustomed to a different mental model for vlans - to me, vlans are just tags on ports. A trunk port is multiple different tags on a port.

If you want to bridge between two ports you still need some conceptual glue saying "these tags on these ports should be bridged".

Perhaps we're better off not copying anything from physical interface to SVI.

That's the gist of this issue. Now we're in perfect agreement ;)

An exception could be made if there is a 1:1 mapping between physical port and an access vlan - then it's clear.

One thing I learned in the past few years: the fewer exceptions I allow, the cleaner the code ;) Also, things that work for one interface and fail when you add another interface might be frustrating for the end-user.

ipspace commented 1 year ago

Minimal implementation in 867856096b7668daf3f539357e32874a55154d4b. Unfortunately it's way too cumbersome to generate an error message when a lab topology uses attributes that should be specified on VLAN interface on PHY access/trunk interface.

jbemmel commented 1 year ago

For consistency, custom attributes defined on the physical interface should also be removed from the neighbor interfaces

interfaces:
- bridge_group: 2
  ifindex: 5
  ifname: svi.1
  name: VLAN core (1) -> [pe2]
  neighbors:
  - ifname: svi.1
    node: pe2
    service:    <-- custom attribute, removed from the physical interface/vlan but not the neighbor
    - id: 1
      name: epipe-1
      type: epipe
  type: svi
  virtual_interface: true
  vlan:
    mode: bridge

ipspace commented 1 year ago

For consistency, custom attributes defined on the physical interface should also be removed from the neighbor interfaces

Or maybe you should stop creating edge cases ;) I will not make the VLAN module even more complex than it already is. The documentation is pretty clear about what works and what does not, and that's it.

ipspace / netlab

VLAN versus physical interface attributes #868