ipspace / netlab

Making virtual networking labs suck less
https://netlab.tools

[BUG] (maybe bug?) trying to parse templates for modules not in use #1042

Closed (ssasso closed this issue 8 months ago)

ssasso commented 8 months ago

I'm struggling to wrap my head around this and can't tell whether it's my fault or not.

I have two host groups, spine and leaf.

On spine I defined:

module: [ ospf, bgp, evpn ]

and on leaf:

module: [ vlan, vrf, vxlan, ospf, bgp, evpn, gateway ]

But when running netlab up, netlab seems to try to configure modules on nodes that don't use them, e.g. VRF on a spine. (Note: vrf is only an example; it also happens with other modules.)

TASK [Deploy vrf configuration] **********************************************************************************************************************************
included: /root/GIT_H/netlab-clone/netlab/netsim/ansible/tasks/deploy-config/arubacx.yml for spine, sw1, sw2, sw3, sw4

TASK [tempfile] **************************************************************************************************************************************************
changed: [spine -> localhost]
changed: [sw1 -> localhost]
changed: [sw2 -> localhost]
changed: [sw3 -> localhost]
changed: [sw4 -> localhost]

TASK [template] **************************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.errors.AnsibleUndefinedVariable: 'vrfs' is undefined
fatal: [spine -> localhost]: FAILED! => changed=false
  msg: 'AnsibleUndefinedVariable: ''vrfs'' is undefined'
changed: [sw1 -> localhost]
changed: [sw2 -> localhost]
changed: [sw3 -> localhost]
changed: [sw4 -> localhost]

ipspace commented 8 months ago

This definitely looks like a bug and might have been caused by my recent rearrangement of the configuration deployment playbook. Please post the full lab topology so I can run some tests.

Thank you!

ipspace commented 8 months ago

I used the following topology with the dev branch and it works as expected:

provider: clab
defaults.device: eos

bgp.as: 65000

groups:
  _auto_create: True
  spine:
    members: [ s1 ]
    module: [ ospf, bgp, evpn ]
  leaf:
    members: [ l1, l2 ]
    module: [ vlan, vrf, vxlan, ospf, bgp, evpn, gateway ]

vlans:
  red:
    vrf: red
    links: [ l1, l2 ]

vrfs:
  red:

links: [ l1-s1, l2-s1 ]

Can you try it with Aruba so we can see whether it's something in the Aruba-specific task list that triggers the unwanted "template" call?

Otherwise, do netlab inspect --node X module to see what modules netlab thinks should be active on a node. If that turns out to be weird we have another gremlin to chase.
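
As a rough illustration of what that check should show for the test topology above, here is a minimal Python sketch of per-node module resolution. It is not netlab's code: the function names are hypothetical, and it assumes that a node without its own module list simply inherits the list of its group (netlab's real merge logic may differ).

```python
# Simplified model of per-node module resolution; NOT netlab's actual code.
# Assumption: a node without its own `module` list inherits its group's list.

groups = {
    "spine": {"members": ["s1"], "module": ["ospf", "bgp", "evpn"]},
    "leaf": {
        "members": ["l1", "l2"],
        "module": ["vlan", "vrf", "vxlan", "ospf", "bgp", "evpn", "gateway"],
    },
}


def node_modules(groups, node_overrides=None):
    """Return {node: [modules]}, using the group list unless a node sets its own."""
    node_overrides = node_overrides or {}
    result = {}
    for group in groups.values():
        for node in group["members"]:
            result[node] = list(node_overrides.get(node, group["module"]))
    return result


def hosts_for_module(modules_by_node, module):
    """Nodes on which a module's configuration template should be rendered."""
    return sorted(n for n, mods in modules_by_node.items() if module in mods)


modules_by_node = node_modules(groups)
print(modules_by_node["s1"])                     # ['ospf', 'bgp', 'evpn']
print(hosts_for_module(modules_by_node, "vrf"))  # ['l1', 'l2']
```

Under these assumptions s1 never appears in the host list for vrf, so any "Deploy vrf configuration" step that touches s1 points at the deployment playbook rather than at the topology data.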

ssasso commented 8 months ago

This is the topology:


provider: clab

addressing:
  loopback:
    ipv4: 10.100.0.0/24
    prefix: 32
  router_id:
    ipv4: 10.100.0.0/24
    prefix: 32

vrfs:
  tenant:
    ospf: false

vlans:
  red:
    vrf: tenant

groups:
  leaf:
    device: arubacx
    members: [ sw1, sw2, sw3, sw4 ]
    module: [ vlan, vrf, vxlan, ospf, bgp, evpn, gateway ]
    node_data:
      bgp:
        as: 65000
        advertise_loopback: false
        activate:
          ipv4: [ ebgp ]

nodes:
  spine:
    device: arubacx
    id: 1
    module: [ ospf, bgp, evpn ]
    bgp:
      rr: True
      as: 65000
      advertise_loopback: false
      activate:
        ipv4: []
  sw1:
    id: 11
  sw2:
    id: 12
  sw3:
    id: 21
  sw4:
    id: 22
  cl1:
    device: arubacx
    id: 31
    module: [vlan]
  cl2:
    device: arubacx
    id: 32
    module: [vlan]

links:
- spine:
  sw1:
  ospf.cost: 10
- spine:
  sw2:
  ospf.cost: 10
- spine:
  sw3:
  ospf.cost: 10
- spine:
  sw4:
  ospf.cost: 10

# Links to Client Switches
# SW1 to CL1
- sw1:
  cl1:
  vlan.access: red
  gateway: true

# SW3 to CL2
- sw3:
  cl2:
  gateway: true
  vlan.access: red

netlab inspect (e.g. for spine) reports the correct, expected result:

# netlab inspect --node spine module
- ospf
- bgp
- evpn

Will try your topology now.

ssasso commented 8 months ago

Your topology is failing as well:

TASK [Figure out whether to deploy the module vrf on current device] *********************************************************************************************
ok: [l1]
ok: [l2]
ok: [s1]

TASK [Find configuration template for vrf] ***********************************************************************************************************************
ok: [l1]
ok: [l2]
ok: [s1]

TASK [Print deployed configuration when running in verbose mode] *************************************************************************************************
skipping: [l1]
skipping: [l2]
skipping: [s1]

TASK [Find configuration deployment deploy_script for vrf] *******************************************************************************************************
ok: [l1]
ok: [s1]
ok: [l2]

TASK [Deploy vrf configuration] **********************************************************************************************************************************
included: /root/GIT_H/netlab-clone/netlab/netsim/ansible/tasks/deploy-config/arubacx.yml for l1, l2, s1

TASK [tempfile] **************************************************************************************************************************************************
changed: [l1 -> localhost]
changed: [l2 -> localhost]
changed: [s1 -> localhost]

TASK [template] **************************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.errors.AnsibleUndefinedVariable: 'vrfs' is undefined
fatal: [s1 -> localhost]: FAILED! => changed=false
  msg: 'AnsibleUndefinedVariable: ''vrfs'' is undefined'
changed: [l1 -> localhost]
changed: [l2 -> localhost]

"fun" fact: netlab initial -o works fine.

ipspace commented 8 months ago

Works for me with Arista EOS. This is the VRF-related printout from the Ansible playbook:

TASK [Figure out whether to deploy the module vrf on current device] ***************************************************************
ok: [cl1]
ok: [cl2]
ok: [sw1]
ok: [sw2]
ok: [spine]
ok: [sw3]
ok: [sw4]

TASK [Find configuration template for vrf] *****************************************************************************************
skipping: [cl1]
skipping: [cl2]
skipping: [spine]
skipping: [sw2]
skipping: [sw4]
ok: [sw1]
ok: [sw3]

TASK [Print deployed configuration when running in verbose mode] *******************************************************************
skipping: [cl1]
skipping: [cl2]
skipping: [spine]
skipping: [sw1]
skipping: [sw2]
skipping: [sw3]
skipping: [sw4]

TASK [Find configuration deployment deploy_script for vrf] *************************************************************************
skipping: [cl1]
skipping: [cl2]
skipping: [spine]
skipping: [sw2]
skipping: [sw4]
ok: [sw1]
ok: [sw3]

TASK [Deploy vrf configuration] ****************************************************************************************************
skipping: [cl1]
skipping: [cl2]
skipping: [spine]
skipping: [sw2]
skipping: [sw4]
included: /home/pipi/net101/tools/netsim/ansible/tasks/deploy-config/eos.yml for sw1, sw3

TASK [eos_config: deploying vrf from /home/pipi/net101/tools/netsim/ansible/templates/vrf/eos.j2] **********************************
changed: [sw3]
changed: [sw1]

ipspace commented 8 months ago

Looks like there's some propagation of when conditions (or something similar) that I broke. Can you post your printout starting with "Figure out whether to deploy the module vrf on current device", preferably using "netlab up -v" to get verbose output from Ansible?
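
To make the suspected failure mode concrete, here is a toy Python model of a guarded per-host deployment. It is neither Ansible nor netlab code, and the names (host_vars, render_vrf, deploy) are hypothetical. It only shows why a lost guard would produce the "'vrfs' is undefined" failure on a node that does not use the vrf module; the actual cause and fix are not shown in this thread.

```python
# Toy model of guarded per-host deployment; NOT Ansible and NOT netlab code.
# Hypothetical names: host_vars, render_vrf, deploy.

host_vars = {
    "s1": {"module": ["ospf", "bgp", "evpn"]},
    "l1": {"module": ["vrf", "ospf"], "vrfs": {"red": {}}},
    "l2": {"module": ["vrf", "ospf"], "vrfs": {"red": {}}},
}


def render_vrf(hv):
    """Stand-in for the vrf template: it requires the 'vrfs' variable."""
    if "vrfs" not in hv:
        raise KeyError("'vrfs' is undefined")
    return [f"vrf {name}" for name in hv["vrfs"]]


def deploy(module, guard_render):
    """Render the module per host; optionally skip hosts that don't use it."""
    outcome = {}
    for host, hv in host_vars.items():
        wanted = module in hv["module"]
        if guard_render and not wanted:
            outcome[host] = "skipped"
            continue
        # Without the guard, the render step runs on every host.
        try:
            render_vrf(hv)
            outcome[host] = "changed"
        except KeyError:
            outcome[host] = "failed"
    return outcome


print(deploy("vrf", guard_render=True))
# {'s1': 'skipped', 'l1': 'changed', 'l2': 'changed'}
print(deploy("vrf", guard_render=False))
# {'s1': 'failed', 'l1': 'changed', 'l2': 'changed'}
```

The second result has the same shape as the failing run: the leaf nodes report "changed" while the spine fails on the undefined variable.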

ipspace commented 8 months ago

I think I found it (and fixed it). You must be using an older version of Ansible.

Thanks a million for spotting this stupidity!

ssasso commented 8 months ago

Working fine now!

TASK [Find configuration template for vrf] ***********************************************************************************************************************
skipping: [s1]
ok: [l1]
ok: [l2]

TASK [Print deployed configuration when running in verbose mode] *************************************************************************************************
skipping: [l1]
skipping: [l2]
skipping: [s1]

TASK [Find configuration deployment deploy_script for vrf] *******************************************************************************************************
skipping: [s1]
ok: [l1]
ok: [l2]

TASK [Deploy vrf configuration] **********************************************************************************************************************************
skipping: [s1]
included: /root/GIT_H/netlab-clone/netlab/netsim/ansible/tasks/deploy-config/arubacx.yml for l1, l2

thanks!!!!