ipspace / netlab

Making virtual networking labs suck less
https://netlab.tools

[BUG] Containerlab does not work out-of-the-box #711

Closed: mkuurstra closed this issue 1 year ago

mkuurstra commented 1 year ago

Describe the bug

Containerlab does not work out-of-the-box

To Reproduce

---
name: initial_test

provider: clab
defaults:
  device: eos
  devices:
    eos.clab.image: ceos:4.28.4M

nodes: [s1, s2, l1, l2]

module: [ospf]

links:
  - l1-s1
  - l1-s2
  - l2-s1
  - l2-s2
On the host, the 192.168.121.0/24 management subnet ends up on two interfaces at once: the pre-existing libvirt management bridge and the docker bridge containerlab creates for the lab:

$ ip route get 192.168.121.2
192.168.121.2 dev libvirt-mgmt src 192.168.121.1 uid 1000
    cache
$ ip a | grep 192.168.121
    inet 192.168.121.1/24 brd 192.168.121.255 scope global libvirt-mgmt
    inet 192.168.121.1/24 brd 192.168.121.255 scope global br-94d6bec811f6
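Because both bridges carry 192.168.121.0/24, the kernel's choice of egress interface for the containers' management addresses is ambiguous, and replies can take the wrong path. The overlap can be confirmed with a standalone sketch (not part of netlab) using Python's ipaddress module:

```python
from ipaddress import ip_network

# Subnets taken from the `ip a` output above
libvirt_mgmt = ip_network("192.168.121.0/24")  # libvirt-mgmt bridge
docker_mgmt = ip_network("192.168.121.0/24")   # br-94d6bec811f6 (netlab_mgmt)

# True: the same prefix is bound to two bridges, so traffic to the
# lab nodes' management addresses can egress either interface
print(libvirt_mgmt.overlaps(docker_mgmt))

# The workaround subnet from this report does not collide with libvirt-mgmt
print(ip_network("192.168.123.0/24").overlaps(libvirt_mgmt))
```

Running this prints `True` for the colliding pair and `False` for the workaround subnet.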

Workaround

Add defaults.addressing.mgmt.ipv4: 192.168.123.0/24 to the topology YAML. Now a connection is possible and the IP interfaces look sane:
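Applied to the topology above, the workaround looks roughly like this (assuming the dotted key `defaults.addressing.mgmt.ipv4` expands to the nested YAML form; any subnet that does not collide with an existing bridge should work):

```yaml
---
name: initial_test

provider: clab
defaults:
  device: eos
  devices:
    eos.clab.image: ceos:4.28.4M
  addressing:
    mgmt:
      ipv4: 192.168.123.0/24   # avoid the 192.168.121.0/24 libvirt-mgmt range

nodes: [s1, s2, l1, l2]

module: [ospf]

links:
  - l1-s1
  - l1-s2
  - l2-s1
  - l2-s2
```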

$ ip a | grep 192.168.12
    inet 192.168.121.1/24 brd 192.168.121.255 scope global libvirt-mgmt
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
    inet 192.168.123.1/24 brd 192.168.123.255 scope global br-84f6ef0fc049

Expected behavior

I think netlab using clab should work with the supplied defaults. It seems this was envisioned, because a default is supplied here, but it seems to get overwritten by the value here
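The suspected mechanism, a provider-specific management-subnet default that a later settings merge overwrites, can be illustrated with a minimal sketch (hypothetical names and values, not netlab's actual code):

```python
# Hypothetical illustration of last-writer-wins in layered defaults.
clab_provider_defaults = {"mgmt": {"ipv4": "192.168.122.0/24"}}  # made-up clab default
global_defaults = {"mgmt": {"ipv4": "192.168.121.0/24"}}         # libvirt-oriented value

def merge(base: dict, override: dict) -> dict:
    """Recursive merge in which values from `override` win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# The provider default is applied first, then the global value clobbers it,
# so both providers end up on the same (libvirt-reserved) subnet.
settings = merge(clab_provider_defaults, global_defaults)
print(settings["mgmt"]["ipv4"])  # → 192.168.121.0/24
```

If the merge order were reversed (or the global value only filled gaps instead of overriding), the provider-specific default would survive.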

Output

$ netlab up topology.yml
Created provider configuration file: clab.yml
Created transformed topology dump in YAML format in netlab.snapshot.yml
Created group_vars for all
Created group_vars for modules
Created group_vars for eos
Created host_vars for s1
Created host_vars for s2
Created host_vars for l1
Created host_vars for l2
Created minimized Ansible inventory hosts.yml
Created Ansible configuration file: ansible.cfg

Step 2: Checking virtualization provider installation
============================================================
.. all tests succeeded, moving on

Step 3: starting the lab
============================================================
INFO[0000] Containerlab v0.34.0 started
INFO[0000] Parsing & checking topology file: clab.yml
INFO[0000] Creating lab directory: /home/marvink/netlab/initial_test/clab-initial_test
INFO[0000] Creating docker network: Name="netlab_mgmt", IPv4Subnet="192.168.121.0/24", IPv6Subnet="", MTU="1500"
INFO[0000] Creating container: "l1"
INFO[0000] Creating container: "s2"
INFO[0000] Creating container: "l2"
INFO[0000] Creating container: "s1"
INFO[0001] Creating virtual wire: s2:et2 <--> l2:et2
INFO[0001] Creating virtual wire: s2:et1 <--> l1:et2
INFO[0001] Creating virtual wire: s1:et1 <--> l1:et1
INFO[0001] Creating virtual wire: s1:et2 <--> l2:et1
INFO[0002] Running postdeploy actions for Arista cEOS 's2' node
INFO[0002] Running postdeploy actions for Arista cEOS 'l2' node
INFO[0002] Running postdeploy actions for Arista cEOS 'l1' node
INFO[0002] Running postdeploy actions for Arista cEOS 's1' node
INFO[0059] Adding containerlab host entries to /etc/hosts file
+---+----------------------+--------------+--------------+------+---------+--------------------+--------------+
| # |         Name         | Container ID |    Image     | Kind |  State  |    IPv4 Address    | IPv6 Address |
+---+----------------------+--------------+--------------+------+---------+--------------------+--------------+
| 1 | clab-initial_test-l1 | 4ca1a94550a2 | ceos:4.28.4M | ceos | running | 192.168.121.103/24 | N/A          |
| 2 | clab-initial_test-l2 | 81725c86dbd0 | ceos:4.28.4M | ceos | running | 192.168.121.104/24 | N/A          |
| 3 | clab-initial_test-s1 | a3ce89002763 | ceos:4.28.4M | ceos | running | 192.168.121.101/24 | N/A          |
| 4 | clab-initial_test-s2 | efdbd7edb98e | ceos:4.28.4M | ceos | running | 192.168.121.102/24 | N/A          |
+---+----------------------+--------------+--------------+------+---------+--------------------+--------------+

Step 4: deploying initial device configurations
============================================================
[WARNING]: Could not match supplied host pattern, ignoring: unprovisioned

PLAY [Deploy initial device configuration] *****************************************************************************

TASK [set_fact] ********************************************************************************************************
ok: [l2]
ok: [l1]
ok: [s1]
ok: [s2]

TASK [Find initial configuration template] *****************************************************************************
skipping: [s1] => (item=/usr/local/lib/python3.10/dist-packages/netsim/ansible/templates/initial/eos.j2)
skipping: [s2] => (item=/usr/local/lib/python3.10/dist-packages/netsim/ansible/templates/initial/eos.j2)
skipping: [l1] => (item=/usr/local/lib/python3.10/dist-packages/netsim/ansible/templates/initial/eos.j2)
skipping: [s1]
skipping: [s2]
skipping: [l1]
skipping: [l2] => (item=/usr/local/lib/python3.10/dist-packages/netsim/ansible/templates/initial/eos.j2)
skipping: [l2]

TASK [set_fact] ********************************************************************************************************
ok: [l2]
ok: [l1]
ok: [s2]
ok: [s1]

TASK [Deploy initial device configuration] *****************************************************************************
included: /usr/local/lib/python3.10/dist-packages/netsim/ansible/tasks/deploy-config/eos.yml for l2, l1, s2, s1 => (item=/usr/local/lib/python3.10/dist-packages/netsim/ansible/tasks/deploy-config/eos.yml)

TASK [wait_for_connection] *********************************************************************************************
fatal: [l2]: FAILED! => changed=false
  elapsed: 62
  msg: 'timed out waiting for connection port up: ssh connection failed: ssh connect failed: No route to host'
fatal: [s1]: FAILED! => changed=false
  elapsed: 62
  msg: 'timed out waiting for connection port up: ssh connection failed: ssh connect failed: No route to host'
fatal: [s2]: FAILED! => changed=false
  elapsed: 62
  msg: 'timed out waiting for connection port up: ssh connection failed: ssh connect failed: No route to host'
fatal: [l1]: FAILED! => changed=false
  elapsed: 62
  msg: 'timed out waiting for connection port up: ssh connection failed: ssh connect failed: No route to host'

PLAY RECAP *************************************************************************************************************
l1                         : ok=3    changed=0    unreachable=0    failed=1    skipped=1    rescued=0    ignored=0
l2                         : ok=3    changed=0    unreachable=0    failed=1    skipped=1    rescued=0    ignored=0
s1                         : ok=3    changed=0    unreachable=0    failed=1    skipped=1    rescued=0    ignored=0
s2                         : ok=3    changed=0    unreachable=0    failed=1    skipped=1    rescued=0    ignored=0

Fatal error in netlab: Executing Ansible playbook /usr/local/lib/python3.10/dist-packages/netsim/ansible/initial-config.ansible failed:
  Command '['ansible-playbook', '/usr/local/lib/python3.10/dist-packages/netsim/ansible/initial-config.ansible']' returned non-zero exit status 2.
Error executing netlab initial:
  Command '['netlab', 'initial']' returned non-zero exit status 1.
Fatal error in netlab up: netlab initial failed, aborting...


mkuurstra commented 1 year ago

I think https://github.com/ipspace/netlab/issues/701 could fix this

ipspace commented 1 year ago

Thanks a million for reporting this. It's an interesting side effect of how vagrant-libvirt behavior changed over time.

When the netlab project started, vagrant-libvirt required a predefined virtual switch for the management network, so the installation script created one. However, later versions of the same plugin deleted that virtual switch every time vagrant destroy was executed, which made it safe to reuse the same address range as the containerlab management network.

In your case, executing netlab test libvirt should solve the problem (netlab down calls vagrant destroy, which removes the management virtual switch and its IP subnet).

The solution is pretty simple: do not create the vagrant-libvirt network during the netlab install libvirt process, as it's automatically created by netlab up and destroyed by vagrant destroy. Will fix...

You also opened a very interesting can of worms I hadn't thought about when implementing #706 :( That will be a doozy...