ipspace / netlab

Making virtual networking labs suck less
https://netlab.tools

[BUG] Netlab up using srlinux results in an error #1202

Open SuSRitardanni opened 1 month ago

SuSRitardanni commented 1 month ago

Describe the bug

Running netlab up with the topology attached below produces this error in the Ansible phase:

TASK [Check that required plugin exists, use 'netlab install grpc' to install it] ***********************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AttributeError: 'Connection' object has no attribute 'nonetype'
fatal: [s2]: FAILED! =>
  msg: Unexpected failure during module execution.
  stdout: ''
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AttributeError: 'Connection' object has no attribute 'nonetype'
fatal: [s3]: FAILED! =>
  msg: Unexpected failure during module execution.
  stdout: ''
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AttributeError: 'Connection' object has no attribute 'nonetype'
fatal: [s1]: FAILED! =>
  msg: Unexpected failure during module execution.
  stdout: ''
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AttributeError: 'Connection' object has no attribute 'nonetype'
fatal: [s4]: FAILED! =>
  msg: Unexpected failure during module execution.
  stdout: ''

The run then ends with [FATAL] netlab up: netlab initial failed, aborting...

After this I can connect to the routers, but they don't communicate with each other.

Bear in mind that just a few days ago it was working fine

To Reproduce

Most of the bugs are caused by an error in the data transformation code or templates -- the data structures or device configurations generated by netlab are not what you would expect (which should have been described in the above section).

If you're experiencing any other unexpected behavior, please add the steps needed to reproduce it.

Expected behavior

netlab up should complete without errors, and after logging in to a router I should be able to ping the other routers.

Lab topology

Not a minimal topology, just the one I use:

---
defaults:
  device: srlinux
  module: [ospf]

nodes:
  s1:
    provider: clab
    device: srlinux
  s2:
    provider: clab
    device: srlinux
  s3:
    provider: clab
    device: srlinux
  s4:
    provider: clab
    device: srlinux
  h1:
    device: linux
  h2:
    device: linux

links:
- s1-s2
- s2-s3
- s3-s4
- s4-s1
- s1-h1
- s2-h2

Output

[CREATED] provider configuration file: Vagrantfile
[INFO]    Creating configuration file for secondary provider clab
[CREATED] provider configuration file: clab-augment.yml
[CREATED] transformed topology dump in YAML format in netlab.snapshot.yml
[GROUPS]  group_vars for all
[GROUPS]  group_vars for modules
[GROUPS]  group_vars for srlinux
[HOSTS]   host_vars for s1
[HOSTS]   host_vars for s2
[HOSTS]   host_vars for s3
[HOSTS]   host_vars for s4
[GROUPS]  group_vars for linux
[HOSTS]   host_vars for h1
[HOSTS]   host_vars for h2
[CREATED] minimized Ansible inventory hosts.yml
[CREATED] Ansible configuration file: ansible.cfg

Version

netlab version 1.8.2

Additional context

netlab is running in the provided Vagrant box, using VirtualBox as the hypervisor on a Windows 10 host machine.

The same error occurs when running the grpc test:

vagrant@vagrant:~/lab1$ netlab test grpc

┌──────────────────────────────────────────────────────────────────────────────────┐
│ CHECKING grpc installation                                                       │
└──────────────────────────────────────────────────────────────────────────────────┘
[SUCCESS] grpc installed and working correctly

┌──────────────────────────────────────────────────────────────────────────────────┐
│ EXECUTING netlab up                                                              │
└──────────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────────┐
│ CREATING configuration files                                                     │
└──────────────────────────────────────────────────────────────────────────────────┘
[CREATED] provider configuration file: clab.yml
[CREATED] transformed topology dump in YAML format in netlab.snapshot.yml
[GROUPS]  group_vars for all
[GROUPS]  group_vars for modules
[GROUPS]  group_vars for srlinux
[HOSTS]   host_vars for s1
[HOSTS]   host_vars for s2
[HOSTS]   host_vars for s3
[CREATED] minimized Ansible inventory hosts.yml
[CREATED] Ansible configuration file: ansible.cfg

┌──────────────────────────────────────────────────────────────────────────────────┐
│ CHECKING virtualization provider installation                                    │
└──────────────────────────────────────────────────────────────────────────────────┘
[SUCCESS] clab installed and working correctly

┌──────────────────────────────────────────────────────────────────────────────────┐
│ STARTING clab nodes                                                              │
└──────────────────────────────────────────────────────────────────────────────────┘
provider clab: executing sudo -E containerlab deploy -t clab.yml
INFO[0000] Containerlab v0.49.0 started
INFO[0000] Parsing & checking topology file: clab.yml
INFO[0000] Creating docker network: Name="netlab_mgmt", IPv4Subnet="192.168.121.0/24", IPv6Subnet="", MTU='ל'
INFO[0000] Creating lab directory: /home/vagrant/lab1/test/clab-test
INFO[0000] Creating container: "s2"
INFO[0000] Creating container: "s3"
INFO[0000] Creating container: "s1"
INFO[0001] Creating link: s2:e1-2 <--> s3:e1-1
INFO[0001] Creating link: s1:e1-1 <--> s2:e1-1
INFO[0001] Creating link: s1:e1-2 <--> test_3:s1_e1-2
INFO[0001] Creating link: s2:e1-3 <--> test_3:s2_e1-3
INFO[0001] Creating link: s3:e1-2 <--> test_3:s3_e1-2
INFO[0001] Running postdeploy actions for Nokia SR Linux 's1' node
INFO[0001] Running postdeploy actions for Nokia SR Linux 's2' node
INFO[0001] Running postdeploy actions for Nokia SR Linux 's3' node
INFO[0067] Adding containerlab host entries to /etc/hosts file
INFO[0067] Adding ssh config for containerlab nodes
INFO[0067] 🎉 New containerlab version 0.54.2 is available! Release notes: https://containerlab.dev/rn/0.54/#0542
Run 'containerlab version upgrade' to upgrade or go check other installation options at https://containerlab.dev/install/
+---+--------------+--------------+------------------------------+------+---------+--------------------+--------------+
| # |     Name     | Container ID |            Image             | Kind |  State  |    IPv4 Address    | IPv6 Address |
+---+--------------+--------------+------------------------------+------+---------+--------------------+--------------+
| 1 | clab-test-s1 | e8d9e454d393 | ghcr.io/nokia/srlinux:24.3.2 | srl  | running | 192.168.121.101/24 | N/A          |
| 2 | clab-test-s2 | a46943f5c97f | ghcr.io/nokia/srlinux:24.3.2 | srl  | running | 192.168.121.102/24 | N/A          |
| 3 | clab-test-s3 | 4a53479ccdf0 | ghcr.io/nokia/srlinux:24.3.2 | srl  | running | 192.168.121.103/24 | N/A          |
+---+--------------+--------------+------------------------------+------+---------+--------------------+--------------+

┌──────────────────────────────────────────────────────────────────────────────────┐
│ DEPLOYING initial device configurations                                          │
└──────────────────────────────────────────────────────────────────────────────────┘
[WARNING]: Could not match supplied host pattern, ignoring: unprovisioned

PLAY [Deploy initial device configuration] **************************************************************************

TASK [Set variables that cannot be set with VARS] *******************************************************************
ok: [s1]
ok: [s3]
ok: [s2]

TASK [Find device readiness script] *********************************************************************************
ok: [s1]
ok: [s3]
ok: [s2]

TASK [Wait for device to become ready] ******************************************************************************
skipping: [s1]
skipping: [s2]
skipping: [s3]

TASK [Deploy initial configuration] *********************************************************************************
included: /usr/local/lib/python3.8/dist-packages/netsim/ansible/tasks/deploy-module.yml for s1, s2, s3

TASK [Figure out whether to deploy the module initial on current device] ********************************************
ok: [s1]
ok: [s2]
ok: [s3]

TASK [Find configuration template for initial] **********************************************************************
ok: [s1]
ok: [s3]
ok: [s2]

TASK [Print deployed configuration when running in verbose mode] ****************************************************
skipping: [s1]
skipping: [s2]
skipping: [s3]

TASK [Find configuration deployment deploy_script for initial] ******************************************************
ok: [s1]
ok: [s3]
ok: [s2]

TASK [Deploy initial configuration] *********************************************************************************
included: /usr/local/lib/python3.8/dist-packages/netsim/ansible/tasks/deploy-config/srlinux.yml for s1, s2, s3

TASK [Check that required plugin exists, use 'netlab install grpc' to install it] ***********************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AttributeError: 'Connection' object has no attribute 'nonetype'
fatal: [s1]: FAILED! =>
  msg: Unexpected failure during module execution.
  stdout: ''
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AttributeError: 'Connection' object has no attribute 'nonetype'
fatal: [s2]: FAILED! =>
  msg: Unexpected failure during module execution.
  stdout: ''
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AttributeError: 'Connection' object has no attribute 'nonetype'
fatal: [s3]: FAILED! =>
  msg: Unexpected failure during module execution.
  stdout: ''

PLAY RECAP **********************************************************************************************************
s1                         : ok=7    changed=0    unreachable=0    failed=1    skipped=2    rescued=0    ignored=0
s2                         : ok=7    changed=0    unreachable=0    failed=1    skipped=2    rescued=0    ignored=0
s3                         : ok=7    changed=0    unreachable=0    failed=1    skipped=2    rescued=0    ignored=0

Error executing ansible-playbook /usr/local/lib/python3.8/dist-packages/netsim/ansible/initial-config.ansible:
  Command '['ansible-playbook', '/usr/local/lib/python3.8/dist-packages/netsim/ansible/initial-config.ansible']' returned non-zero exit status 2.
[FATAL]   Executing Ansible playbook /usr/local/lib/python3.8/dist-packages/netsim/ansible/initial-config.ansible failed
Error executing netlab initial --no-message:
  Command '['netlab', 'initial', '--no-message']' returned non-zero exit status 1.
[FATAL]   netlab up: netlab initial failed, aborting...
Error executing netlab up:
  Command '['netlab', 'up']' returned non-zero exit status 1.
[FATAL]   test: netlab up failed, aborting

==============================================================================
The test has failed. We will try to clean up the test directory and remove it,
but there's no guarantee that the cleanup process will succeed, in which case
please remove the test directory manually.

You might want to copy the error messages generated during the test before
proceeding. The cleanup process will start once you press RETURN.
==============================================================================

Press RETURN to continue ->

┌──────────────────────────────────────────────────────────────────────────────────┐
│ EXECUTING netlab down --cleanup --force                                          │
└──────────────────────────────────────────────────────────────────────────────────┘
Read transformed lab topology from snapshot file netlab.snapshot.yml

┌──────────────────────────────────────────────────────────────────────────────────┐
│ CHECKING virtualization provider installation                                    │
└──────────────────────────────────────────────────────────────────────────────────┘
[SUCCESS] clab installed and working correctly

┌──────────────────────────────────────────────────────────────────────────────────┐
│ STOPPING clab nodes                                                              │
└──────────────────────────────────────────────────────────────────────────────────┘
INFO[0000] Parsing & checking topology file: clab.yml
INFO[0000] Destroying lab: test
INFO[0000] Removed container: clab-test-s2
INFO[0000] Removed container: clab-test-s3
INFO[0000] Removed container: clab-test-s1
INFO[0000] Removing containerlab host entries from /etc/hosts file
INFO[0000] Removing ssh config for containerlab nodes

┌──────────────────────────────────────────────────────────────────────────────────┐
│ CLEANUP configuration files                                                      │
└──────────────────────────────────────────────────────────────────────────────────┘
... removing clab.yml
... removing ansible.cfg
... removing hosts.yml
... removing directory tree group_vars
... removing directory tree host_vars
... removing netlab.snapshot.yml

ipspace commented 1 month ago

Thank you for an excellent bug report. Unfortunately, there's nothing we can do -- it looks like the famous Nokia GRPC Collection error. We found that it works (somewhat) with either old or very recent versions of Ansible (see https://netlab.tools/caveats/#nokia-sr-linux).

> Bear in mind that just a few days ago it was working fine

Did you perchance upgrade software or something along those lines? pip3 list|grep ansible would display the Ansible version you have installed.
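
For anyone who prefers to check from Python instead of pip3, here is a minimal sketch that reads the installed versions of the ansible bundle and of ansible-core through the standard library. The helper names (installed_version, ansible_versions) are made up for this example and are not part of netlab.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def ansible_versions() -> Dict[str, Optional[str]]:
    # "ansible" is the community bundle; "ansible-core" is the engine it depends on.
    return {name: installed_version(name) for name in ("ansible", "ansible-core")}


if __name__ == "__main__":
    for name, version in ansible_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same Python interpreter that netlab uses, otherwise it may report packages from a different environment.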

ipspace commented 1 month ago

@jbemmel: How do you want to handle this? We could require an Ansible core version of at least 2.16.6 (where the bug was fixed) or lower than 2.11 (where the bug was introduced), but I have no idea how that would play along with RH Enterprise Ansible.

Alternatively, we could trigger a warning, but it would probably get lost in the clutter, and it would not be repeated once netlab initial crashes.
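
To make the version requirement discussed above concrete, here is a rough sketch of such a check. It is not netlab code: the function names are hypothetical, and the bounds are taken from this comment on the assumption that the fix landed in ansible-core 2.16.6 itself, so that release counts as unaffected. Pre-release suffixes such as rc1 are ignored.

```python
import re
from typing import Tuple

# Bounds quoted in the comment above; assumptions, not independently verified.
BUG_INTRODUCED = (2, 11, 0)
BUG_FIXED = (2, 16, 6)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Turn '2.16.6' or '2.16' into a three-field tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    parts += [0] * (3 - len(parts))  # pad '2.16' to (2, 16, 0)
    return (parts[0], parts[1], parts[2])


def is_affected(core_version: str) -> bool:
    """True if the ansible-core version falls inside the assumed buggy range."""
    return BUG_INTRODUCED <= parse_version(core_version) < BUG_FIXED


if __name__ == "__main__":
    for candidate in ("2.10.17", "2.15.3", "2.16.6"):
        status = "affected" if is_affected(candidate) else "ok"
        print(f"ansible-core {candidate}: {status}")
```

Whether a hit should abort the run or only print a warning is exactly the open question raised above; the sketch only classifies the version.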

jbemmel commented 1 month ago

> @jbemmel: How do you want to handle this? We could require an Ansible core version of at least 2.16.6 (where the bug was fixed) or lower than 2.11 (where the bug was introduced), but I have no idea how that would play along with RH Enterprise Ansible.

Roman and I discussed this, and we are planning to switch to the SR Linux Ansible module (which uses JSON-RPC), replacing the GRPC module.
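
For readers unfamiliar with the JSON-RPC interface mentioned here, the sketch below builds the kind of request body that SR Linux accepts on its JSON-RPC endpoint, as far as I understand the Nokia documentation. It is not the Ansible module's code: build_get_request is a hypothetical helper, the path is only an example, and nothing is sent over the network.

```python
import json
from typing import Any, Dict, List


def build_get_request(
    paths: List[str], datastore: str = "state", request_id: int = 1
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 'get' request body for a list of SR Linux paths."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "get",
        # One command per path; the field names are assumed from the SR Linux docs.
        "params": {
            "commands": [{"path": p, "datastore": datastore} for p in paths]
        },
    }


if __name__ == "__main__":
    body = build_get_request(["/system/information/version"])
    print(json.dumps(body, indent=2))
```

The appeal of this transport is that it needs only HTTP and JSON, so it does not depend on the gRPC connection plugin that fails in the log above.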

ipspace commented 1 month ago

@SuSRitardanni: Did you check the Ansible version? Did upgrading to Ansible 9.5.1 help?

Other than that, I'll put this one on the backburner as we have a working solution (use Ansible 9.5.1) and we're waiting for @jbemmel to complete #840.