ipspace / netlab

Making virtual networking labs suck less
https://netlab.tools

Getting started topology doesn't come up #207

Closed kbreit closed 2 years ago

kbreit commented 2 years ago

I am creating a topology on libvirt in Ubuntu 20.04 (VM in vSphere) but one of the routers fails to come up.

Output:

root@e7ubnt0nstools01:/home/ubuntu/topologies/test# netlab up
Created provider configuration file: Vagrantfile
Created group_vars for all
Created group_vars for eos
Created host_vars for r1
Created host_vars for r2
Created minimized Ansible inventory hosts.yml
Created Ansible configuration file: ansible.cfg

Step 2: Checking virtualization provider installation
============================================================
.. all tests succeeded, moving on

Step 3: starting the lab
============================================================
Bringing machine 'r1' up with 'libvirt' provider...
Bringing machine 'r2' up with 'libvirt' provider...
==> r1: Uploading base box image as volume into Libvirt storage...
==> r1: Creating image (snapshot of base box volume).
==> r2: Creating image (snapshot of base box volume).
==> r1: Creating domain with the following settings...
==> r2: Creating domain with the following settings...
==> r1:  -- Name:              test_r1
==> r2:  -- Name:              test_r2
==> r1:  -- Domain type:       kvm
==> r2:  -- Domain type:       kvm
==> r2:  -- Cpus:              2
==> r2:  -- Feature:           acpi
==> r1:  -- Cpus:              2
==> r2:  -- Feature:           apic
==> r2:  -- Feature:           pae
==> r1:  -- Feature:           acpi
==> r2:  -- Clock offset:      utc
==> r2:  -- Memory:            2048M
==> r1:  -- Feature:           apic
==> r2:  -- Management MAC:    08-4F-A9-00-00-02
==> r1:  -- Feature:           pae
==> r2:  -- Loader:
==> r1:  -- Clock offset:      utc
==> r1:  -- Memory:            2048M
==> r2:  -- Nvram:
==> r1:  -- Management MAC:    08-4F-A9-00-00-01
==> r2:  -- Base box:          arista/veos
==> r1:  -- Loader:
==> r2:  -- Storage pool:      default
==> r2:  -- Image:             /var/lib/libvirt/images/test_r2.img (5G)
==> r1:  -- Nvram:
==> r2:  -- Disk driver opts:  cache='default'
==> r2:  -- Kernel:
==> r1:  -- Base box:          arista/veos
==> r2:  -- Initrd:
==> r1:  -- Storage pool:      default
==> r2:  -- Graphics Type:     vnc
==> r2:  -- Graphics Port:     -1
==> r1:  -- Image:             /var/lib/libvirt/images/test_r1.img (5G)
==> r1:  -- Disk driver opts:  cache='default'
==> r1:  -- Kernel:
==> r2:  -- Graphics IP:       127.0.0.1
==> r1:  -- Initrd:
==> r1:  -- Graphics Type:     vnc
==> r2:  -- Graphics Password: Not defined
==> r1:  -- Graphics Port:     -1
==> r2:  -- Video Type:        cirrus
==> r2:  -- Video VRAM:        9216
==> r2:  -- Sound Type:
==> r1:  -- Graphics IP:       127.0.0.1
==> r2:  -- Keymap:            en-us
==> r2:  -- TPM Backend:       passthrough
==> r1:  -- Graphics Password: Not defined
==> r2:  -- TPM Path:
==> r2:  -- INPUT:             type=mouse, bus=ps2
==> r1:  -- Video Type:        cirrus
==> r1:  -- Video VRAM:        9216
==> r1:  -- Sound Type:
==> r1:  -- Keymap:            en-us
==> r1:  -- TPM Backend:       passthrough
==> r1:  -- TPM Path:
==> r1:  -- INPUT:             type=mouse, bus=ps2
==> r2: Creating shared folders metadata...
==> r1: Creating shared folders metadata...
==> r2: Starting domain.
==> r2: Waiting for domain to get an IP address...
==> r1: Starting domain.
==> r1: Waiting for domain to get an IP address...
==> r2: Waiting for SSH to become available...
==> r1: Waiting for SSH to become available...

It sat at this step for at least 30 minutes.

Topology file:

---
provider: libvirt
defaults:
  device: eos
module: [ ospf ]

nodes: [ r1, r2 ]
links:
- r1
- r2
- r1-r2

ipspace commented 2 years ago

And this is what happens when I try to automate a Jenga tower ;)

On a more serious note:

Finally, you can always connect to the virtual machines with virsh console vm-name (use virsh list to see VM names) and check what's going on.

kbreit commented 2 years ago
ipspace commented 2 years ago
  • We are running this on ESXi. I'm allocating 4 CPU, 8GB RAM, and 128GB of disk.

That's enough.

  • I downloaded 4.24.8M from the Arista website and did a mutate on it. This was based on the documentation but if there's a better way to import the Arista image, I'm happy to go that way.

https://netsim-tools.readthedocs.io/en/latest/labs/eos.html

Yeah, I have to fix that tutorial and bring it in sync with the installation guide.

  • vagrant plugin list shows No plugins installed. However, I did the install using netlab install ubuntu ansible libvirt so I'd have expected it to be populated.

OK, in that case you most probably have the correct version of the plugin. BTW, did you use sudo netlab install? Vagrant plugins are installed into user's home directory (as I learned the hard way when fixing the installation scripts).
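
Because the plugin list depends on whose home directory Vagrant is looking at, it can help to compare the output of vagrant plugin list for each account. Here is a minimal sketch that turns that output into a name-to-version mapping. It assumes the "name (version, scope)" line format; parse_plugin_list is a hypothetical helper, not part of Vagrant or netlab.

```python
import re

# Assumed line format: "vagrant-libvirt (0.4.1, global)".
# Indented detail lines such as "- Version Constraint: 0.4.1" are skipped.
PLUGIN_LINE = re.compile(r"^(?P<name>\S+) \((?P<version>[^,)]+)(?:, (?P<scope>[^)]+))?\)$")


def parse_plugin_list(output):
    """Map plugin name to version from the text printed by `vagrant plugin list`."""
    plugins = {}
    for line in output.splitlines():
        match = PLUGIN_LINE.match(line.strip())
        if match:
            plugins[match.group("name")] = match.group("version")
    return plugins
```

An empty result (for example from "No plugins installed.") for one account and a populated one for another suggests the plugins were installed under a different user.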

  • I did log in to the VMs and both are getting an IP address (192.168.121.101 and 102).

Did you try to SSH into those IP addresses?

kbreit commented 2 years ago

I recreated the image using the instructions and the netlab libvirt package command. Same results overall. I think I ran it as root and not using sudo. But I did run a vagrant plugin list and now it's showing.

root@e7ubnt0nstools01:/home/ubuntu# vagrant plugin list
vagrant-libvirt (0.4.1, global)
  - Version Constraint: 0.4.1
vagrant-mutate (1.2.0, global)
==> r2: Creating shared folders metadata...
==> r1:  -- Sound Type:
==> r1:  -- Keymap:            en-us
==> r2: Starting domain.
==> r1:  -- TPM Backend:       passthrough
==> r2: Waiting for domain to get an IP address...
==> r1:  -- TPM Path:
==> r2: Waiting for SSH to become available...
==> r1:  -- INPUT:             type=mouse, bus=ps2
==> r1: Creating shared folders metadata...
==> r1: Starting domain.
==> r1: Waiting for domain to get an IP address...
==> r1: Waiting for SSH to become available...
localhost#show ip int br
                                                                        Address
Interface       IP Address             Status     Protocol        MTU   Owner
--------------- ---------------------- ---------- ------------ -------- -------
Management1     192.168.121.101/24     up         up             1500
ubuntu@e7ubnt0nstools01:~$ ssh admin@192.168.121.101
The authenticity of host '192.168.121.101 (192.168.121.101)' can't be established.
ECDSA key fingerprint is SHA256:iL0RR7lihzfFH+svE0tKeDr+FY5CsclrtaXtqb57elw.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.121.101' (ECDSA) to the list of known hosts.
Password:
Last login: Thu Feb 17 16:32:14 2022
localhost#

ipspace commented 2 years ago

I recreated the image using the instructions and the netlab libvirt package command.

Thank you for trying that out.

Same results overall.

Dang. I ran out of ideas... apart from the usual "reboot and try again". Now I sound like a $vendor TAC :((

I think I ran it as root and not using sudo. But I did run a vagrant plugin list and now it's showing.

OK, that's solved...

==> r1: Waiting for domain to get an IP address...
==> r1: Waiting for SSH to become available...

You're starting two devices, and whatever is the last line in the printout is not always the culprit as Vagrant starts devices in parallel. It could easily be r2 that's stuck. You can try starting the lab "manually" with vagrant up --no-parallel (after running netlab create to recreate the configuration files) so you'll know exactly which device is causing the problems.
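
Another way to see which device is stuck is to probe the SSH port of each management address directly. The sketch below opens a TCP connection and reads the identification string an SSH server sends first (it usually begins with "SSH-"). It only shows that something answers on the port; it says nothing about whether Vagrant's own login would succeed. The helper names are hypothetical.

```python
import socket


def ssh_banner(host, port=22, timeout=3.0):
    """Return the first bytes sent by the server on the SSH port, or None.

    None means the TCP connection failed, timed out, or the peer sent nothing.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            data = conn.recv(256)
    except OSError:  # refused, unreachable, or timed out
        return None
    return data.decode("ascii", errors="replace").strip() or None


def probe_all(addresses, port=22, timeout=3.0):
    """Probe every device in a {name: address} mapping; return {name: banner}."""
    return {name: ssh_banner(addr, port, timeout) for name, addr in addresses.items()}
```

For this lab, probe_all({"r1": "192.168.121.101", "r2": "192.168.121.102"}) would be the call to try, assuming r2 is the VM at .102. A device that returns None has not reached the point where SSH answers at all.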

Please note that I'm not a KVM/libvirt/Vagrant/Linux guru. Things mostly work, but when they don't I'm as lost as the next guy... I feel like a taxi driver trying to troubleshoot an engine problem on a friend's car over the phone :(

kbreit commented 2 years ago

I ran it with vagrant up --no-parallel and I'm seeing the same behavior. r1 comes up but SSH isn't detected as available. I can even SSH into the box using 192.168.121.101. If there aren't other troubleshooting ideas, I will try a Cisco CSR1000v image and see if it's specific to Arista. netlab test did work.

ipspace commented 2 years ago

I ran it with vagrant up --no-parallel and I'm seeing the same behavior. r1 comes up but SSH isn't detected as available. I can even SSH into the box using 192.168.121.101.

That's totally weird, I've never seen it before. Do try a reload as a sacrifice to the Gods of Schroedinger Bugs.

Did you delete the old (mutated) Arista box (vagrant box remove name --box-version version)? Also, I hope you used a different version number for the new box you built.

netlab test did work.

That's nice to hear, so there's hope...

kbreit commented 2 years ago

I didn't delete the old Arista box, so I'm trying it again. Doesn't seem to have fixed it but I'll let it run. CSR1000v image is downloading now.

ipspace commented 2 years ago

Fixed the tutorial, recommending cEOS on Linux (that's a breeze compared to making your own boxes). Is there anything else I can do to help?

kbreit commented 2 years ago

I redid the VM and it seems to be working properly. It came down to running things as root or sudo too much.
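
For anyone hitting the same problem, a quick check of which account and home directory a command actually runs under can make the root/sudo mix-up visible. This is a minimal sketch, not anything netlab does itself; invocation_context is a hypothetical helper, and how sudo sets HOME depends on the local sudo configuration.

```python
import os


def invocation_context(euid, environ):
    """Describe who is running the process and which home directory it sees."""
    return {
        "is_root": euid == 0,                 # effective UID 0 means root
        "via_sudo": "SUDO_USER" in environ,   # sudo sets SUDO_USER for the invoking user
        "home": environ.get("HOME", ""),      # per-user tool data is looked up under HOME
    }


if __name__ == "__main__":
    context = invocation_context(os.geteuid(), os.environ)
    if context["is_root"]:
        print(f"warning: running as root (via sudo: {context['via_sudo']}), HOME={context['home']}")
    else:
        print(f"running as a regular user, HOME={context['home']}")
```

Running it once directly, once under sudo, and once in a root shell shows whether the three invocations share a home directory.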