ipspace / netlab

Making virtual networking labs suck less
https://netlab.tools

Addition of Cisco 1000v and 9300v with Containerlab Support #1168

Closed (fluffytrlolz closed this issue 1 week ago)

fluffytrlolz commented 2 months ago

I'm looking for support for the Cisco 9000v (9300v) and 1000v on the containerlab platform. I've tested it out, and it looks like it may be close, except for the interface names that netlab sets up and passes into the clab.yml file. As an example:

interfaces:
- ifindex: 1
  ifname: Ethernet1/1
  ipv4: 10.1.0.1/30
  linkindex: 1
  name: r1 -> r2
  neighbors:

If the ifname was changed to "eth1", this would align with the containerlab topology file, and I believe the remaining pieces would not require changes.
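
For reference, the containerlab topology file expects the container-side names in its endpoints, e.g. (an illustrative two-node example, not generated output):

links:
- endpoints: [ "r1:eth1", "r2:eth1" ]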

I'm happy to help with testing out any implementations to assist in this update.

Thanks!

ipspace commented 2 months ago

CSR1000v should be a simple case of interface name mapping. The interfaces will have one name within the container, and another name within the virtual machine inside the container (I'm assuming you're doing this stuff: https://containerlab.dev/manual/kinds/vr-csr/)

To do that, you have to define a bunch of stuff under devices.csr.clab. The easiest way to start would be to define them in the topology file:

defaults:
  devices.csr.clab:
    image: <docker-image>
    node.kind: cisco_csr1000v
    interface.name: eth{ifindex+1}

I'm guessing the interface.name bit based on the vMX definition.

Next, you'll have to write an 'are we ready' task list, because the container starts "immediately" while the VM within the container takes "forever". See https://github.com/ipspace/netlab/blob/dev/netsim/ansible/tasks/readiness-check/vptx.yml for an example.
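
Such a task list usually boils down to retrying a login until the VM behind the container's SSH proxy responds. A minimal sketch of that idea (the command, credentials, and retry counts are placeholders, not the contents of vptx.yml):

- name: Wait for the VM inside the container to answer on SSH
  local_action:
    module: shell
    cmd: >
      sshpass -p '{{ ansible_ssh_pass }}' ssh -o StrictHostKeyChecking=no
      -o UserKnownHostsFile=/dev/null {{ ansible_user }}@{{ ansible_host }} 'show version'
  register: vm_ready
  until: vm_ready.rc == 0
  retries: 40
  delay: 30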

Alternatively, if you could somehow get the two Docker images over to me, I will try to figure it all out ;)

fluffytrlolz commented 1 month ago

Apologies for taking so long to come back to this topic. I made the following updates to my topology.yml file:

defaults:
  device: nxos
  devices.nxos:
    clab.image: vrnetlab/vr-n9300v:9.3.6
    clab.node.kind: cisco_n9kv
    clab.interface.name: eth{ifindex}
  provider: clab
module: [ospf]

nodes:
  r1:
  r2:
links: [r1-r2]

I then went into netsim/ansible/tasks/readiness-check/ and added an nxos.yml that contained a simple 15-minute wait:

- name: Wait for at least 15 minutes for 9000v inside CLAB...
  pause:
    minutes: 15
  when: |
    netlab_provider == 'clab'

This seemed to create the proper naming structure for the clab.yml:

interfaces:
- clab:
    name: eth1
  ifindex: 1
  ifname: Ethernet1/1
  ipv4: 10.1.0.1/30
  linkindex: 1
  name: r1 -> r2
  neighbors:
  - ifname: Ethernet1/1
    ipv4: 10.1.0.2/30
    node: r2
  ospf:
    area: 0.0.0.0
    network_type: point-to-point
    passive: false
  type: p2p

I set everything up this way because I started toying around with the 9000v and figured the NXOS model would match the closest, as it runs NXOS. I'm running into issues when creating the lab: the generated group_vars/nxos/topology.yml file uses a different password than the 9000v defaults (admin/admin):

# Ansible inventory created from ['/home/clab-user/netlab/cisco-test/topology.yml', 'package:topology-defaults.yml']
#

ansible_connection: network_cli
ansible_network_os: nxos
ansible_ssh_pass: vagrant
ansible_user: vagrant

I can get past this by manually changing those values and then firing up the lab. However, should I create a different device type for the 9000v that sets the proper default password, or is the better approach to override the default value? (I just haven't quite figured out the appropriate syntax to override ansible_ssh_pass and ansible_user.)

ipspace commented 1 month ago

So glad to hear you got this far, although I'd prefer a more robust readiness check, maybe something along the lines of what @ssasso did for vMX: https://github.com/ipspace/netlab/blob/dev/netsim/ansible/tasks/vmx/initial.yml#L8 (it should be moved into the readiness check, but that's a different story).

Ansible variables are easy: just set devices.nxos.clab.group_vars to whatever values you need. See https://github.com/ipspace/netlab/blob/dev/netsim/devices/eos.yml#L85 for an example.
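
For the admin/admin defaults you mentioned, that would be something like:

defaults.devices.nxos.clab:
  group_vars:
    ansible_user: admin
    ansible_ssh_pass: admin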

fluffytrlolz commented 1 month ago

I was able to get the 9000v working with the following updates:

topology.yml:

defaults:
  device: nxos
  devices.nxos.clab:
    group_vars.ansible_ssh_pass: admin
    group_vars.ansible_user: admin
    image: vrnetlab/vr-n9300v:9.3.6
    node.kind: cisco_n9kv
    interface.name: eth{ifindex}
  provider: clab
module: [ospf]

nodes:
  r1:
  r2:
links: [r1-r2]

nxos.yml added to the readiness-check tasks:

- name: Execute local ssh command to check 9000v readiness
  local_action:
    module: shell
    cmd: >
      sshpass -p '{{ ansible_ssh_pass }}' ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null {{ ansible_user }}@{{ ansible_host }} 'show int eth1/1'
  register: command_out
  until: command_out.rc == 0
  retries: 40
  delay: 30
  when: clab.kind is defined

- name: Confirm readiness of each host
  debug:
    msg: "Host {{ inventory_hostname }} is ready."
  when: command_out.rc == 0

I'll look at the 1000v next and see what changes are required to make it come up. Let me know if you have any other ideas on how I should clean this up, or if this is how I should proceed. Would it be possible to integrate the nxos.yml into the main branch? I wondered if the other NXOS VMs had similar issues to what I encountered or if it's just how I'm implementing it in containerlab.

Thanks!

ipspace commented 1 month ago

I was able to get the 9000v working with the following updates:

Gee, if only you were a day faster, they would have been in 1.8.3 ;)

Let me know if you have any other ideas on how I should clean this up, or if this is how I should proceed.

This looks pretty decent to me. Not much I would change. I would probably repackage your code into a more generic "vm-in-container" task list and include it in nxos-clab.yml.

Would it be possible to integrate the nxos.yml into the main branch?

Of course. I'll add it, noting where it came from.

Would it be possible for you to test the solution once I do the packaging?

I wondered if the other NXOS VMs had similar issues to what I encountered or if it's just how I'm implementing it in containerlab.

NXOS has a generic problem: it claims it's ready before its interfaces are ready (Junos on vPTX seems to have a similar problem). We're dealing with that in the NXOS config deployment task list, and I've been planning to move that into the readiness check for a long time. Now I have a good reason to get it done ;)

What you're experiencing though is specific to the way the VM is packaged in a container with an SSH proxy sitting in front of it.

fluffytrlolz commented 1 month ago

Would it be possible for you to test the solution once I do the packaging?

Yep, I can test the updates once they're included. I will try to test the 1000v this evening and see whether it's just an interface naming change or whether a readiness check is also required. With any luck, I'll have the 1000v and 9000v tested in time for 1.8.4 :)

ipspace commented 1 month ago

So I tried the hellt/vrnetlab project and nxos keeps crashing. I will not waste any more time trying to troubleshoot that.

Anyway, I copied your settings (apart from the image name) into nxos.yml, moved the "Ethernet 1/1" readiness check into an NXOS-specific task list, added a generic "test if the VM in a container is ready" test, and created an nxos-clab.yml task list that just invokes the other two. The results are in the nxos-clab branch (changes in https://github.com/ipspace/netlab/compare/dev...nxos-clab).
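
nxos-clab.yml thus essentially boils down to two includes, along these lines (a rough sketch; the actual file names may differ, so check the branch):

- name: Wait for the VM inside the container to become ready
  include_tasks: vm-in-container.yml

- name: Wait for NXOS interfaces to become ready
  include_tasks: nxos.yml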

There is a pretty high probability that this will work, but I can't be 100% sure ;) Anyway, pull down the latest changes, switch to the nxos-clab branch, and give it a try. Keeping my fingers crossed ;))

As for the CSR 1Kv, you'll have to use the same readiness check (see the comments in https://github.com/ipspace/netlab/blob/dev/netsim/ansible/tasks/vmx/initial.yml for details). Copy nxos-clab.yml into csr-clab.yml and remove the "check for Ethernet 1/1" include_tasks.
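
In other words, csr-clab.yml would keep only the generic part of the sketch above (file name assumed):

- name: Wait for the VM inside the container to become ready
  include_tasks: vm-in-container.yml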

ipspace commented 1 month ago

So I tried the hellt/vrnetlab project and nxos keeps crashing. I will not waste any more time trying to troubleshoot that.

I'm an idiot. I tried to build an nxos container, not an n9kv one. It all works now.

I changed the image name to what hellt/vrnetlab generates and reduced the retries to 20 (my setup worked after three retries, as each retry takes 30 seconds to time out).

ipspace commented 1 month ago

FWIW, I added the CSR part. It should work once you get the container up and running (it didn't work for me out of the box, and I didn't have time to troubleshoot it).

fluffytrlolz commented 1 month ago

I'll test it out today, thanks! Interesting how quickly your 9000v spun up. Mine definitely takes the full 12 minutes; I'll have to double-check the specs I gave the VM I've been running everything on. I'll also take a look at which revision I have; maybe that's a contributing factor.

ipspace commented 1 month ago

Found a CSR quirk (fixed), but still can't get it to run. For whatever weird reason, my CSR container thinks its interfaces start at GigabitEthernet2 and then everything falls apart.

ipspace commented 1 month ago

Update: got CSR 1Kv to run. The MAC address of the management interface must not change (have to submit a patch to @hellt).

I also made the readiness-check parameters configurable. I have to document that and clean up the documentation, then we're good to merge.

fluffytrlolz commented 1 month ago

I was able to do the initial testing this evening, but I will need to complete it with a little more digging tomorrow. Below is a summary of my findings:

Nexus 9000v

topology.yml

After reviewing your changes on the branch, I commented out the lines in the original topology file that netlab appeared to handle.

defaults:
  device: nxos
  devices.nxos.clab:
#    group_vars.ansible_ssh_pass: admin
#    group_vars.ansible_user: admin
    image: vrnetlab/vr-n9300v:9.3.6
#    node.kind: cisco_n9kv
#    interface.name: eth{ifindex}
  provider: clab
module: [ospf]

nodes:
  r1:
  r2:
links: [r1-r2]

netlab create

I ran netlab create and inspected the output files.

clab.yml

It looks like the endpoints are still using "Ethernet1/1", even though I saw your updates to the nxos.yml device settings changing clab's "interface.name" to "eth{ifindex}":

name: netlab

mgmt:
  network: netlab_mgmt
  ipv4-subnet: 192.168.121.0/24
  # Note: 'start' not validated
topology:
  nodes:
    r1:
      mgmt-ipv4: 192.168.121.101
      kind: nxos
      image: vrnetlab/vr-n9300v:9.3.6
      runtime: docker
    r2:
      mgmt-ipv4: 192.168.121.102
      kind: nxos
      image: vrnetlab/vr-n9300v:9.3.6
      runtime: docker

  links:
  - endpoints:
    - "r1:Ethernet1/1"
    - "r2:Ethernet1/1"

group_vars/nxos/topology.yml

The ansible_user and ansible_ssh_pass still appear to be populated with the vagrant values, even though I saw the updates made to these keys within the nxos.yml file as well.

# Ansible inventory created from ['/home/clab-user/netlab-cisco-test/netlab/topology.yml', 'package:topology-defaults.yml']
#

ansible_connection: network_cli
ansible_network_os: nxos
ansible_ssh_pass: vagrant
ansible_user: vagrant

CSR1000v

I observed similar results to those noted on the 9000v. I've included the output of topology.yml, clab.yml, and topology.yml (from group_vars).

topology.yml

defaults:
  device: csr
  devices.csr.clab:
#    group_vars.ansible_ssh_pass: admin
#    group_vars.ansible_user: admin
    image: vrnetlab/vr-csr:17.03.02
#    node.kind: cisco_n9kv
#    interface.name: eth{ifindex}
  provider: clab
module: [ospf]

nodes:
  r1:
  r2:
links: [r1-r2]

clab.yml

name: netlab

mgmt:
  network: netlab_mgmt
  ipv4-subnet: 192.168.121.0/24
  # Note: 'start' not validated
topology:
  nodes:
    r1:
      mgmt-ipv4: 192.168.121.101
      kind: csr
      image: vrnetlab/vr-csr:17.03.02
      runtime: docker
    r2:
      mgmt-ipv4: 192.168.121.102
      kind: csr
      image: vrnetlab/vr-csr:17.03.02
      runtime: docker

  links:
  - endpoints:
    - "r1:GigabitEthernet2"
    - "r2:GigabitEthernet2"

group_vars/csr/topology.yml

# Ansible inventory created from ['/home/clab-user/netlab-cisco-test/netlab/topology.yml', 'package:topology-defaults.yml']
#

ansible_become_method: enable
ansible_become_password: vagrant
ansible_connection: network_cli
ansible_network_os: ios
ansible_ssh_pass: vagrant
ansible_user: vagrant
netlab_device_type: csr
netlab_initial: always

ssasso commented 1 month ago

Sorry to jump in ;)

Any plans to also test the C8000v (the successor of the CSR 1Kv) as part of this? It should be configurable with the same csr templates. Otherwise I'll try it on my own in a couple of weeks, if I'm able to fetch the image.

ref: https://containerlab.dev/manual/kinds/vr-c8000v/

ipspace commented 1 month ago

@fluffytrlolz: Thanks a million for an extensive test report. You probably forgot to switch to the nxos-clab branch. No worries, I'll merge that branch with the dev branch later today.

@ssasso: No plans. I decided I won't beg around for images ;) and if a vendor decides not to make an image available, then I won't waste my time on it. There are plenty of other platforms we can invest our efforts in.

fluffytrlolz commented 1 month ago

@ipspace, I cloned the nxos-clab branch and ran all the testing previously mentioned out of that git clone (I double-checked devices/nxos.yml before proceeding):

clab-user@clab:~/netlab-cisco-test/netlab$ git branch 
* nxos-clab
clab-user@clab:~/netlab-cisco-test/netlab$ ls
ansible.cfg  clean-netlab.sh  examples    hosts.yml  legacy      MANIFEST.in  netlab               netsim     requirements-dev.txt  setup.py  tests         topology.yml.9000v
clab.yml     docs             group_vars  host_vars  LICENSE.md  mypy.ini     netlab.snapshot.yml  README.md  requirements.txt      setup.sh  topology.yml
clab-user@clab:~/netlab-cisco-test/netlab/netsim/devices$ cat nxos.yml 
---
description: Cisco Nexus 9300v
interface_name: Ethernet1/{ifindex}
mgmt_if: mgmt0
loopback_interface_name: loopback{ifindex}
virtualbox:
  image: cisco/nexus9300v
clab:
  group_vars:
    ansible_ssh_pass: admin
    ansible_user: admin
  image: vrnetlab/vr-n9kv:9.3.8
  node:
    kind: cisco_n9kv
  interface.name: eth{ifindex}
group_vars:
  ansible_user: vagrant
  ansible_ssh_pass: vagrant
  ansible_network_os: nxos
  ansible_connection: network_cli
bfd:           # NXOS requires lower default timer values
  min_rx: 500
evpn._start_transit_vlan: 3800
features:
  initial:
    ipv4:
      unnumbered: true
    ipv6:
      lla: true
  bfd: true
  bgp: true
  eigrp: true
  evpn:
    irb: true
  gateway:
    protocol: [ vrrp ]
  isis:
    unnumbered:
      ipv4: true
      ipv6: true
  ospf:
    unnumbered: true
  vlan:
    model: l3-switch
    native_routed: true
    subif_name: '{ifname}.{subif_index}'
    svi_interface_name: vlan{vlan}
  vrf:
    ospfv2: True
    bgp: True
  vxlan: true

libvirt:
  create_template: nxos.xml.j2
  image: cisco/nexus9300v
  build: https://netlab.tools/labs/nxos/
external:
  image: none
graphite.icon: nexus5000

ipspace commented 4 weeks ago

@ipspace, I cloned the nxos-clab branch and ran all the testing previously mentioned out of that git clone (I double-checked devices/nxos.yml before proceeding):

Is it possible that you have netlab installed in the system or user path and that you're running that version of netlab (it takes the system defaults from wherever the script was executed from)? which netlab should tell you if that's the case. You can execute . setup.sh in the repository's top directory to modify the PATH.

Anyway, I merged the nxos-clab branch with the dev branch, so you can use that one.

ipspace commented 1 week ago

Integrated in 1.8.4