ipspace / netlab

Making virtual networking labs suck less
https://netlab.tools

Warning: don't use jammy cloud 2204 no VRF #1209

Closed benbgg closed 1 month ago

benbgg commented 1 month ago

Just finished getting an (apparently viable) jammy kvm cloud image going on Proxmox... lots of learning about cloud-init... and then while running "netlab test clab" ...

sharing so no one else has to go through this....

benbgg commented 1 month ago

Image name for the above: jammy-server-cloudimg-amd64-disk-kvm.img

ssasso commented 1 month ago

What is lsmod | grep vrf returning? Can you please try to manually load the module with modprobe vrf?

ssasso commented 1 month ago

Maybe I misunderstood your issue.

Are you saying that images from https://cloud-images.ubuntu.com/jammy/current/ are shipped with a reduced kernel with no vrf support?

In that case it should be easy to download and run a "full" kernel from the ubuntu repos.

jbemmel commented 1 month ago

Could it be that we are missing a modprobe vrf since we changed the management vrf on frr? Similar to https://github.com/ipspace/netlab/blob/dev/netsim/ansible/tasks/frr/mpls-clab.yml#L3

benbgg commented 1 month ago

1) Yes, no VRF support in that cloud image (i.e. lsmod | grep vrf gives nothing, and modprobe vrf comes back "not found").

2) OK, I've done a bit of reading on the Ubuntu forums, and it appears that the cloud images ship with kernels optimised for different clouds/platforms. Canonical appears to have decided that VRFs were superfluous for this use case.

I tried to install linux-5.15.105 generic and its modules but that seems to have made the machine go into a boot loop. Should have snapshotted :-(
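For anyone who wants to script the check from point 1 before committing to an image, here is a rough Python sketch. It only looks at the files under /lib/modules for a given kernel release (a vrf.ko file, possibly compressed, or an entry in modules.builtin) and does not try to load anything. The function name and the three return values are made up for illustration and are not part of netlab.

```python
from pathlib import Path


def vrf_module_status(release: str, modules_root: Path = Path("/lib/modules")) -> str:
    """Classify VRF support for one kernel release: 'module', 'builtin' or 'missing'.

    Hypothetical helper: it inspects files on disk only and never loads a module.
    """
    kernel_dir = modules_root / release
    if not kernel_dir.is_dir():
        return "missing"

    # Loadable module: vrf.ko, possibly compressed (vrf.ko.zst, vrf.ko.xz, ...)
    if any(kernel_dir.rglob("vrf.ko*")):
        return "module"

    # Compiled into the kernel: listed in modules.builtin as .../vrf.ko
    builtin = kernel_dir / "modules.builtin"
    if builtin.is_file():
        for line in builtin.read_text().splitlines():
            if line.strip().endswith("/vrf.ko"):
                return "builtin"

    return "missing"
```

Calling it as vrf_module_status(os.uname().release) (after import os) checks the running kernel. A result of "missing" would match the symptoms described above, assuming the image keeps its modules in the usual place.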

jbemmel commented 1 month ago

The use of a vrf is conditional on netlab_mgmt_vrf|default(False) https://github.com/ipspace/netlab/blob/dev/netsim/ansible/templates/initial/frr.j2#L52

Is netlab_mgmt_vrf set in your environment? Probably yes - see https://github.com/ipspace/netlab/blob/dev/netsim/devices/frr.yml#L17

A fix would be to try to load the vrf kernel module, and override netlab_mgmt_vrf=False if it cannot be loaded
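The fallback idea could look roughly like this in Python. This is only a sketch of the logic and not the actual netlab change (which would presumably live in the Ansible tasks); both function names are hypothetical. The loader is passed in as a parameter so the decision can be exercised without root access or a real modprobe.

```python
import subprocess
from typing import Callable


def try_load_vrf_module() -> bool:
    """Return True if `modprobe vrf` exits with status 0 (usually needs root)."""
    try:
        result = subprocess.run(
            ["modprobe", "vrf"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # modprobe itself is missing or cannot be executed
        return False
    return result.returncode == 0


def effective_mgmt_vrf(
    requested: bool,
    load_vrf: Callable[[], bool] = try_load_vrf_module,
) -> bool:
    """Keep the management VRF only if it was requested and the module loads."""
    if not requested:
        # Nothing asked for a management VRF, so do not touch the kernel at all
        return False
    return load_vrf()
```

With a stub loader, effective_mgmt_vrf(True, lambda: False) returns False, which is the "override to False" case described above.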

benbgg commented 1 month ago

Couldn't say what was set or unset - I'd just completed the install and was running netlab test clab as described in the docs. I managed to blow up the VM by trying an apt install of the generic kernel (see above). Cloud images are not proving a very happy experience, but I'll have another crack at seeding from a generic image if anyone wants me to. I'm making more headway with Debian 12 bookworm now that I've figured out how to override the external-package block, so for the moment I'll stick to that. Thanks for your help/suggestions.

ipspace commented 1 month ago

Adding management VRF for FRR seemed like a no-brainer (aka "the road to hell..."). Wrong assumptions :( To turn it off, add:

devices.frr.clab.group_vars.netlab_mgmt_vrf: False

to ~/.netlab.yml. netlab test clab should work after that.

@jbemmel already has a proof-of-concept fix, and we'll have it in the dev branch in a day or two (and in the next netlab release in a few weeks).

ipspace commented 1 month ago

> I tried to install linux-5.15.105 generic and its modules

I would usually try to install linux-generic, which includes the correct version of the package.

> but that seems to have made the machine go into a boot loop. Should have snapshotted :-(

Ouch :(