ipspace / netlab

Making virtual networking labs suck less
https://netlab.tools

[DOC] IOSXr #1035

Closed helpdeskdan closed 7 months ago

helpdeskdan commented 7 months ago

Document we need to fix

https://netlab.tools/labs/iosxr/

What's wrong

You have to run

export VIRTINSTALL_OSINFO_DISABLE_REQUIRE=1

before you run

netlab libvirt package iosxr

or the box build will not run. I'm on Ubuntu 22.04.3.

P.S. This netlab is a fantastic idea! Is there, maybe, a newbie forum where people can ask/give assistance? I can't get the config to stick in xr so vagrant fails when it tries to login. I'm still in the "trying" phase of that, not ready to file bug on that.

ipspace commented 7 months ago

That's a generic problem. Tons of devices use virt-install to create Vagrant boxes without specifying --os-variant or --osinfo.

Affected devices seem to be: Aruba CX, Cisco ASAv, Dell OS10, IOS XR, Mikrotik RouterOS7, Juniper vSRX. Other devices either don't have libvirt build recipes or use XML templates.

We could set the environment variable in netlab libvirt module and hope for the best, but something might break down the line, so it would be better to fix the build instructions. For the moment I'll fix the documentation 🤷‍♂️
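A minimal sketch of the "set the environment variable in the netlab libvirt module" option, assuming the module starts virt-install through Python's subprocess. This is not netlab's actual code: the helper name virt_install_env and the call site in the final comment are hypothetical, and only the variable name comes from this thread.

```python
import os


def virt_install_env(base_env=None):
    """Return a copy of the environment with the virt-install workaround set.

    Hypothetical helper: it never modifies os.environ, so the workaround
    only reaches the child process that is given this mapping.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Keep a value the user exported explicitly; otherwise apply the workaround.
    env.setdefault("VIRTINSTALL_OSINFO_DISABLE_REQUIRE", "1")
    return env


# Assumed call site:
#   subprocess.run(virt_install_cmd, env=virt_install_env(), check=True)
```

Scoping the variable to the child process keeps the "hope for the best" behavior away from anything else the user runs in the same shell.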

ipspace commented 7 months ago

Adding @ssasso as you own several devices in the above list

ipspace commented 7 months ago

P.S. This netlab is a fantastic idea!

Thank you!

Is there, maybe, a newbie forum where people can ask/give assistance?

You can always open a discussion in this repo. There's also a Slack channel in the networktocode Slack team, but I don't know how active that is.

I can't get the config to stick in xr so vagrant fails when it tries to login. I'm still in the "trying" phase of that, not ready to file bug on that.

Weird. I don't remember having any issues along those lines, but then I only built IOS XR box once with whatever old version I managed to get, moved on, and never looked back (I have better things to do in my life than to wait for IOS XR to boot 🤦‍♂️).

Once a VM is started with Vagrant, you can get the VM name with virsh list and connect to the console with virsh console to investigate what's going on. Hope you'll figure it out.

helpdeskdan commented 7 months ago

Agreed - you have better things to do with your time than wait for ios-xr to boot! I will see what I can do and report back. Thank you for your time!

helpdeskdan commented 7 months ago

Is there a way to set config.vm.boot_timeout in Vagrantfile without it being written over?

ipspace commented 7 months ago

Yes, you can take the system iosxr-domain.j2, copy it into the current directory and modify it. It's briefly mentioned in https://netlab.tools/customize/, the details are in https://blog.ipspace.net/2022/06/netsim-custom-vagrant-boxes.html.

One of these days I have to add links to all those blog posts to netlab documentation.

helpdeskdan commented 7 months ago

Step 1. Move to a much better box. Much more RAM, CPU, Ubuntu 22.04 - install netlab. Quick tests on Cumulus - I have OSPF neighbors - everything good! But, I need to work on... (sigh)... IOS XR. This is where things go south.

WARNING                                                                        
--os-variant/--osinfo OS name is required, but no value was                                                                                                   
set or detected.                                                                                                                                              

This is now a fatal error. Specifying an OS name is required
for modern, performant, and secure virtual machine defaults.                   

You can see a full list of possible OS name values with:                                                                                                      

   virt-install --osinfo list                      

If your Linux distro is not listed, try one of generic values
such as: linux2022, linux2020, linux2018, linux2016

If you just need to get the old behavior back, you can use:                                                                                                   

  --osinfo detect=on,require=off                                                                                                                              

Or export VIRTINSTALL_OSINFO_DISABLE_REQUIRE=1       

WARNING  VIRTINSTALL_OSINFO_DISABLE_REQUIRE set. Skipping fatal error.
WARNING  Using --osinfo generic, VM performance may suffer. Specify an accurate OS for optimal results.

Starting install...                      
ERROR    Network not found: no network with matching name 'vagrant-libvirt'
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:                
  virsh --connect qemu:///system start vm_box                     
otherwise, please restart your installation.
Error executing virt-install --connect=qemu:///system --network network=vagrant-libvirt,model=e1000 --name=vm_box --cpu host --arch=x86_64 --vcpus=2 --ram=8192 --virt-type=kvm --disk path=vm.qcow2,format=qcow2,device=disk,bus=ide --graphics none --import:
  Command '['virt-install', '--connect=qemu:///system', '--network', 'network=vagrant-libvirt,model=e1000', '--name=vm_box', '--cpu', 'host', '--arch=x86_64', '--vcpus=2', '--ram=8192', '--virt-type=kvm', '--disk', 'path=vm.qcow2,format=qcow2,device=disk,bus=ide', '--graphics', 'none', '--import']' returned non-zero exit status 1.
[FATAL]   Aborting

ipspace commented 7 months ago

Looks like 'netlab libvirt' did not create the 'vagrant-libvirt' network. That's weird, have to look into the source code.

helpdeskdan commented 7 months ago

Quite perplexed - I did not have this problem on my ancient desktop. (I had different problems, but not that one) I'd be happy to test anything I can.

ipspace commented 7 months ago

OK, I checked the source code and netlab libvirt package definitely tries to create the vagrant-libvirt network. Admittedly, those commands are not error-checked (have to fix that).
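What "error-checked" could look like, as a minimal sketch rather than the eventual netlab fix: a wrapper that raises as soon as a command exits with a non-zero status and includes whatever the command wrote to stderr. The name run_checked is hypothetical, and the demo runs a stand-in Python one-liner so the sketch works without libvirt installed.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run a command and return its stdout; raise RuntimeError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{' '.join(cmd)} failed with exit status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; the real call would be one of the network-creation
    # commands that 'netlab libvirt package' executes.
    print(run_checked([sys.executable, "-c", "print('ok')"]), end="")
```

With a wrapper like this, a failed network creation would stop the build at the failing step instead of surfacing later as "Network not found" from virt-install.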

Anyway, just to be on the safe side, I created a brand-new Ubuntu 22.04 VM, ran netlab install ubuntu libvirt on it and started creating an IOS XR box. Apart from the horrible "we have no idea what OS you're using" error the VM started, but then I killed the process.

It could be a permission problem. Did you use netlab install to install the virtualization software (kvm, libvirt, vagrant, vagrant plugin), or did you install it by yourself? If you used netlab install, did you logout after the installation (the group membership is evaluated only during the login procedure)?
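The login caveat can be checked directly. The sketch below compares the groups recorded for a user in the group database with the groups attached to the current process; anything in the first set but not the second only takes effect after a fresh login. All function names are hypothetical, and the code assumes a Linux host (it uses the Unix-only grp and pwd modules).

```python
import grp
import os
import pwd


def groups_missing_from_session(configured, session):
    """Groups the user has on paper but not in the current login session."""
    return sorted(set(configured) - set(session))


def configured_groups(user):
    """Groups that list `user` as a member, plus the user's primary group."""
    names = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
    names.add(grp.getgrgid(pwd.getpwnam(user).pw_gid).gr_name)
    return names


def session_groups():
    """Group names attached to this process (fixed when the session started)."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # numeric group ID without a name in the database
    return names


# Example use:
#   user = pwd.getpwuid(os.getuid()).pw_name
#   print(groups_missing_from_session(configured_groups(user), session_groups()))
```

If libvirt shows up in the result, logging out and back in should be enough.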

helpdeskdan commented 7 months ago

uid=1000(dans) gid=1000(dans) groups=1000(dans),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),133(lxd),134(sambashare),137(libvirt),999(docker)

That looks right, and I created a dir in tmp.... I've been running cumulus and frr just fine. Perhaps it is something specific to this box, but I am afraid I don't understand vagrant well enough to troubleshoot.

ipspace commented 7 months ago

I'm slowly running out of ideas (as in "so far I was throwing spaghetti at the wall to see if anything sticks, but now I'm running out of pasta"). Please run netlab libvirt package -vvv and post the full printout.

Also, it's not a Vagrant problem, it's a libvirt one. Vagrant is not involved until the disk image is modified with the startup configuration.

FWIW, did you use netlab install ubuntu libvirt to set things up?

helpdeskdan commented 7 months ago

Yes, many apologies, complete newbie to all this. Even I should have realized vagrant is not at all libvirt.

I did use netlab install ubuntu ansible libvirt containerlab. However, and I am sorry for not mentioning this sooner, but I am doing this in miniconda. I didn't mention that because it should work - it worked last time I did it that way.....

I suppose I should learn how to do this manually and see why libvirt isn't working.

$ netlab libvirt package -vvv iosxr xrv9k-fullk9-x.vrr-7.11.1.qcow2

=================
     WARNING
=================
This is an experimental script that does its best to build a Vagrant box for libvirt
provider out of a VM disk. It might die a horrible death and leave all sorts of garbage
behind that you'll have to clean up by hand (for example, libvirt 'vm_box' virtual machine).

It also assumes that it can wreak havoc in the current directory (although it will do its
best not to damage the original virtual disk).

Do you want to continue? [y/n]y
error: failed to get domain 'vm_box'

error: failed to get domain 'vm_box'

creating libvirt management network vagrant-libvirt
Creating a copy of xrv9k-fullk9-x.vrr-7.11.1.qcow2

====================
Starting the VM
====================
We'll start the VM from the newly-created virtual disk. When the
VM starts, execute 'netlab libvirt config iosxr' in another
window and follow the instructions.
====================

ERROR    
--os-variant/--osinfo OS name is required, but no value was
set or detected.

This is now a fatal error. Specifying an OS name is required
for modern, performant, and secure virtual machine defaults.

You can see a full list of possible OS name values with:

   virt-install --osinfo list

If your Linux distro is not listed, try one of generic values
such as: linux2022, linux2020, linux2018, linux2016

If you just need to get the old behavior back, you can use:

  --osinfo detect=on,require=off

Or export VIRTINSTALL_OSINFO_DISABLE_REQUIRE=1

Error executing virt-install --connect=qemu:///system --network network=vagrant-libvirt,model=e1000 --name=vm_box --cpu host --arch=x86_64 --vcpus=2 --ram=8192 --virt-type=kvm --disk path=vm.qcow2,format=qcow2,device=disk,bus=ide --graphics none --import:
  Command '['virt-install', '--connect=qemu:///system', '--network', 'network=vagrant-libvirt,model=e1000', '--name=vm_box', '--cpu', 'host', '--arch=x86_64', '--vcpus=2', '--ram=8192', '--virt-type=kvm', '--disk', 'path=vm.qcow2,format=qcow2,device=disk,bus=ide', '--graphics', 'none', '--import']' returned non-zero exit status 1.
[FATAL]   Aborting

ipspace commented 7 months ago

OK, I was hoping to get more debugging printouts :( Will fix the code to generate them; you'll have to clone the repo and run netlab from there.

The printout does indicate that the code that creates the management network is executed; we just don't know what's going on inside it (that's why I need those extra printouts). However, the management network is not there when netlab executes 'virt-install', which is totally weird: since Cumulus Linux works (assuming you're running it in VMs, not containers), netlab obviously creates the management network successfully before starting Vagrant.

We must be doing something that something in your setup dislikes, but I can't figure out what it might be. Will write another comment to notify you once I have the debugging printouts in place.

ipspace commented 7 months ago

Grasping at straws ;) -- can you do virsh net-list --all after netlab libvirt package fails? Because it fails it doesn't do a cleanup, so we should see the virtual networks.
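For anyone who wants to script this check, here is a small sketch that turns the table printed by virsh net-list --all into a list of dictionaries. The function name parse_net_list is hypothetical and SAMPLE is illustrative text in the layout virsh prints; in practice the input would be the captured output of the command.

```python
SAMPLE = """\
 Name              State    Autostart   Persistent
----------------------------------------------------
 default           active   yes         yes
 vagrant-libvirt   active   no          yes
"""


def parse_net_list(output):
    """Parse the text table from `virsh net-list --all` into dicts."""
    networks = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have four columns; this skips blank lines, the dashed
        # separator (one field) and the header row (first field "Name").
        if len(fields) != 4 or fields[0] == "Name":
            continue
        name, state, autostart, persistent = fields
        networks.append({
            "name": name,
            "state": state,
            "autostart": autostart == "yes",
            "persistent": persistent == "yes",
        })
    return networks


if __name__ == "__main__":
    for net in parse_net_list(SAMPLE):
        print(net["name"], net["state"])
```

Checking whether vagrant-libvirt is in the result and active answers the question without reading the table by eye.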

ipspace commented 7 months ago

Oh, another potential gotcha that might explain the difference between netlab up and netlab libvirt package. Do export LIBVIRT_DEFAULT_URI=qemu:///system and retry.
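Why the URI might matter: the virt-install command in the printouts above names qemu:///system explicitly, while a virsh command that names no URI falls back to a default, which for an unprivileged user is typically the per-user qemu:///session instance. The sketch below shows two ways a script could remove that ambiguity. Both helper names are hypothetical, and nothing here is taken from the netlab source.

```python
import os

SYSTEM_URI = "qemu:///system"


def virsh_command(*args, uri=SYSTEM_URI):
    """Build a virsh argument list that names the libvirt URI explicitly."""
    return ["virsh", "--connect", uri, *args]


def env_with_default_uri(base_env=None, uri=SYSTEM_URI):
    """Return a copy of the environment with LIBVIRT_DEFAULT_URI set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBVIRT_DEFAULT_URI"] = uri
    return env


if __name__ == "__main__":
    print(" ".join(virsh_command("net-list", "--all")))
```

Passing --connect on every call is the more explicit of the two; the environment variable is the smaller change when many commands are involved.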

ipspace commented 7 months ago

@helpdeskdan did the LIBVIRT_DEFAULT_URI environment variable help? Would love to add it to netlab libvirt in the next day or two, but it would be nice to know before that if it solved your problem or not.

helpdeskdan commented 7 months ago

Apologies, I have been ill.

$ export LIBVIRT_DEFAULT_URI=qemu:///system                        
(netlab) ╭─dans@TheReplacement /tmp/cisco 
╰─$ netlab libvirt package -vvv iosxr xrv9k-fullk9-x.vrr-7.11.1.qcow2

=================
     WARNING
=================
This is an experimental script that does its best to build a Vagrant box for libvirt
provider out of a VM disk. It might die a horrible death and leave all sorts of garbage
behind that you'll have to clean up by hand (for example, libvirt 'vm_box' virtual machine).

It also assumes that it can wreak havoc in the current directory (although it will do its
best not to damage the original virtual disk).

Do you want to continue? [y/n]y
error: failed to get domain 'vm_box'

error: failed to get domain 'vm_box'

creating libvirt management network vagrant-libvirt
Creating a copy of xrv9k-fullk9-x.vrr-7.11.1.qcow2

====================
Starting the VM
====================
We'll start the VM from the newly-created virtual disk. When the
VM starts, execute 'netlab libvirt config iosxr' in another
window and follow the instructions.
====================

ERROR    
--os-variant/--osinfo OS name is required, but no value was
set or detected.

This is now a fatal error. Specifying an OS name is required
for modern, performant, and secure virtual machine defaults.

You can see a full list of possible OS name values with:

   virt-install --osinfo list

If your Linux distro is not listed, try one of generic values
such as: linux2022, linux2020, linux2018, linux2016

If you just need to get the old behavior back, you can use:

  --osinfo detect=on,require=off

Or export VIRTINSTALL_OSINFO_DISABLE_REQUIRE=1

Error executing virt-install --connect=qemu:///system --network network=vagrant-libvirt,model=e1000 --name=vm_box --cpu host --arch=x86_64 --vcpus=2 --ram=8192 --virt-type=kvm --disk path=vm.qcow2,format=qcow2,device=disk,bus=ide --graphics none --import:
  Command '['virt-install', '--connect=qemu:///system', '--network', 'network=vagrant-libvirt,model=e1000', '--name=vm_box', '--cpu', 'host', '--arch=x86_64', '--vcpus=2', '--ram=8192', '--virt-type=kvm', '--disk', 'path=vm.qcow2,format=qcow2,device=disk,bus=ide', '--graphics', 'none', '--import']' returned non-zero exit status 1.
[FATAL]   Aborting
(netlab) ╭─dans@TheReplacement /tmp/cisco 
╰─$ virsh net-list --all
 Name              State    Autostart   Persistent
----------------------------------------------------
 default           active   yes         yes
 vagrant-libvirt   active   no          yes

Yet, it works:

$ cat topology.yml                                      
defaults:                                                                     
  device: cumulus                 

module: [ ospf ]                  

nodes: [ s1, s2, s3 ]                                                         
links: [ s1-s2, s2-s3, s1-s2-s3 ] 
(netlab) ╭─dans@TheReplacement ~/test 
╰─$ netlab up      

yada yada... netlab connect s1

s1# show ip ospf neighbor 

Neighbor ID     Pri State           Dead Time Address         Interface                        RXmtL RqstL DBsmL
10.0.0.2          1 Full/DROther      36.790s 10.1.0.2        swp1:10.1.0.1                        0     0     0
10.0.0.2          1 Full/Backup       36.790s 172.16.0.2      swp2:172.16.0.1                      0     0     0
10.0.0.3          1 Full/DR           36.766s 172.16.0.3      swp2:172.16.0.1                      1     0     0

ipspace commented 7 months ago

Apologies, I have been ill.

So sorry to hear that :( Hope you're getting better, and I apologize for bothering you!

ipspace commented 7 months ago

I tried to find something that would work on both Ubuntu 20.04 and 22.04. Unfortunately, the virt-install used in Ubuntu 20.04 does not accept the --osinfo parameter at all, while the 22.04 version made it (almost) mandatory.
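One conceivable way around the version split, which this thread did not adopt, is to probe the installed virt-install before building the command line. The sketch assumes that a virt-install supporting --osinfo mentions the flag in its --help output; that assumption is not verified here. The function name osinfo_args is hypothetical, and the argument value is copied from the virt-install error message quoted earlier in this thread.

```python
def osinfo_args(help_text):
    """Choose extra virt-install arguments from the text of `virt-install --help`."""
    if "--osinfo" in help_text:
        # Restores the old behavior, per the virt-install error message.
        return ["--osinfo", "detect=on,require=off"]
    # Older virt-install: add nothing, since the flag would be rejected.
    return []


# Assumed way to obtain the help text:
#   help_text = subprocess.run(["virt-install", "--help"],
#                              capture_output=True, text=True).stdout
```

The returned list would be appended to the existing virt-install arguments, so the older release keeps the exact command it runs today.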

Obviously we could create the virtual machines from XML templates (like we're doing for Arista vEOS or Cisco IOSv) but I'm not going to waste my time going down that path for IOS XR. I'll just keep hoping the current workaround does not result in too dismal performance.

Note to anyone who might read this in the future: please feel free to submit a pull request containing the XML VM definition template for any device that still uses virt-install to create the box-building VM.