ipspace / netlab

Making virtual networking labs suck less
https://netlab.tools
Other
456 stars 69 forks source link

Refactor code that avoids link.prefix assignment #1513

Closed jbemmel closed 1 week ago

jbemmel commented 2 weeks ago

Original PR was triggered by the lag module using prefix: False to prevent IP assignment to lag member links, triggering an error in subsequent code that assumed prefix was a dict

It turned out that this was the only point in the code where this was done like that - the vlan module uses prefix: {} for the same effect. The benefit of the latter, is that subsequent code testing for things like 'ipv4' in prefix doesn't error out

This PR:

  1. Normalizes the "do not assign a prefix" logic to prefix: {}
  2. Removes the empty prefix from the final topology (where it wasn't being used to begin with)
jbemmel commented 2 weeks ago

You should just swap the two terms in the original condition.

I thought link.prefix.ipv4 assumes link.prefix is a dict? What if link.prefix == True, wouldn't link.prefix.ipv4 still return false?

ipspace commented 1 week ago

You should just swap the two terms in the original condition.

I thought link.prefix.ipv4 assumes link.prefix is a dict? What if link.prefix == True, wouldn't link.prefix.ipv4 still return false?

OK, it looks like I can't explain things in plain English, so I'll try to demonstrate what I had in mind, but for that we have to start with a topology that causes a problem.

jbemmel commented 1 week ago

You should just swap the two terms in the original condition.

I thought link.prefix.ipv4 assumes link.prefix is a dict? What if link.prefix == True, wouldn't link.prefix.ipv4 still return false?

OK, it looks like I can't explain things in plain English, so I'll try to demonstrate what I had in mind, but for that we have to start with a topology that causes a problem.

---

defaults.device: linux

nodes:
  n1:
  n2:

links:
- n1:
  n2:
  prefix: False

Result:

Traceback (most recent call last):
  File "/home/jeroen/Projects/netlab/netlab", line 14, in <module>
    netsim.cli.lab_commands(__file__)
  File "/home/jeroen/Projects/netlab/netsim/cli/__init__.py", line 351, in lab_commands
    mod.run(sys.argv[arg_start:])   # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeroen/Projects/netlab/netsim/cli/create.py", line 81, in run
    augment.main.transform(topology)
  File "/home/jeroen/Projects/netlab/netsim/augment/main.py", line 109, in transform
    transform_data(topology)
  File "/home/jeroen/Projects/netlab/netsim/augment/main.py", line 82, in transform_data
    augment.links.transform(topology.links,topology.defaults,topology.nodes,topology.pools)
  File "/home/jeroen/Projects/netlab/netsim/augment/links.py", line 1185, in transform
    set_default_gateway(link,nodes)
  File "/home/jeroen/Projects/netlab/netsim/augment/links.py", line 993, in set_default_gateway
    if not 'ipv4' in link.prefix or isinstance(link.prefix.ipv4,bool):
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'bool' is not iterable

Encountered while working on adding lag support for linux devices

ipspace commented 1 week ago

Thanks for the sample topology. You were right, swapping the terms does not help (I missed the "link.prefix.ipv4" part). However, the rest of the PR makes less sense. What we need is prefix normalizing logic very early in the link processing, and then we wouldn't care what other modules do.

Will try to find the minimum set of changes that would get the job done.

ipspace commented 1 week ago

I thought that I would be able to clean up the code in "assign_link_prefix" but it turns out that it still has to do most everything it does because the data type validation does not replace values in attributes when matching on alt-types. The results are in bfd2fc2d766d1c1337298629d51c365fa5062c0a

Anyway, it's probably not a good idea to just replace "False" with "{}", at least since https://github.com/ipspace/netlab/commit/6349c8c3ae285babc86b77002cbb2cbf9f0cd8c0 which makes "False" mean "I really don't want a prefix" as opposed to an empty dict meaning "OK, I could accept some prefixes, but I would also like to add this (empty) set of parameters"

Finally, while none of us can figure out how removing meaningless (to us) "prefix" at the end of link transformation could do any damage, we have no idea what people creatively using netlab depend on, so we can't know whether changing things just because we feel like changing them might impact someone. I was on the receiving end of "let's change stuff because we feel like that" mentality of the Ansible team and it wasn't a pleasant experience.

ipspace commented 1 week ago

Totally off-topic: if you use "prefix: false" or "prefix: {}" with Linux non-LAG bonding, you might be doing it wrong. From the network perspective, the bonded uplinks look like regular access interfaces that happen to have the same IP address (and might have a different MAC address). Oh, and they must usually be connected to an access VLAN so the IP address can move from one link to the other.

jbemmel commented 1 week ago

Totally off-topic: if you use "prefix: false" or "prefix: {}" with Linux non-LAG bonding, you might be doing it wrong. From the network perspective, the bonded uplinks look like regular access interfaces that happen to have the same IP address (and might have a different MAC address). Oh, and they must usually be connected to an access VLAN so the IP address can move from one link to the other.

In the Linux non-LAG bonding case I'm thinking to add a virtual interface (on the same link) that "owns" the IP address and any other L3 properties. The physical interfaces get prefix: False. From the network perspective it may look like a regular access interface, but the internal model is slightly different.

ipspace commented 1 week ago

In the Linux non-LAG bonding case I'm thinking to add a virtual interface (on the same link) that "owns" the IP address and any other L3 properties. The physical interfaces get prefix: False. From the network perspective it may look like a regular access interface, but the internal model is slightly different.

I would use a similar approach to how we're creating the SVI interfaces -- copy all attributes from the first slave interface, and remove almost all non-phy attributes from all slaves (or whatever the proper terminology happens to be these days).