sonic-net / sonic-mgmt

Configuration management examples for SONiC

test_vlan configuration failure on L2 switch #4057

Open slutati1536 opened 3 years ago

slutati1536 commented 3 years ago

Description

On a switch with an L2 configuration (https://github.com/Azure/SONiC/wiki/L2-Switch-mode), test_vlan fails during setup: the function create_vlan_interfaces(vlan_ports_list, ptfhost), called from setup_vlan, fails.

PR #3892 changed vlan_ports_list, and the port indexes are now incorrect on an L2 setup. This causes the failure in create_vlan_interfaces:

run module command failed, Ansible Results =>
{
    "changed": true, 
    "cmd": [
        "ip", 
        "link", 
        "add", 
        "link", 
        "eth104", 
        "name", 
        "eth104.200", 
        "type", 
        "vlan", 
        "id", 
        "200"
    ], 
    "delta": "0:00:00.818234", 
    "end": "2021-08-17 07:57:35.723959", 
    "failed": true, 
    "invocation": {
        "module_args": {
            "_raw_params": "ip link add link eth104 name eth104.200 type vlan id 200", 
            "_uses_shell": false, 
            "argv": null, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "stdin_add_newline": true, 
            "strip_empty_ends": true, 
            "warn": true
        }
    }, 
    "msg": "non-zero return code", 
    "rc": 1, 
    "start": "2021-08-17 07:57:34.905725", 
    "stderr": "Cannot find device \"eth104\"", 
    "stderr_lines": [
        "Cannot find device \"eth104\""
    ], 
    "stdout": "", 
    "stdout_lines": []
}
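
The failing command in the output above can be reproduced in isolation. What follows is a minimal sketch, not the actual sonic-mgmt code: `vlan_subif_cmd` and `missing_ptf_ports` are hypothetical helpers, and the `eth<index>` naming is an assumption taken from the `eth104` in the error. It shows how a port index with no matching PTF interface produces this command, and how such indexes could be detected before the command runs.

```python
def vlan_subif_cmd(port_index, vlan_id):
    """Build the `ip link add` command for a VLAN sub-interface on a PTF port.

    Assumption: PTF ports are named eth<index>, as "eth104" in the error suggests.
    """
    parent = "eth{}".format(port_index)
    return "ip link add link {0} name {0}.{1} type vlan id {1}".format(parent, vlan_id)


def missing_ptf_ports(port_indexes, ptf_interfaces):
    """Return the port indexes whose eth<index> interface is absent on the PTF."""
    present = set(ptf_interfaces)
    return [i for i in port_indexes if "eth{}".format(i) not in present]


if __name__ == "__main__":
    # Hypothetical PTF exposing 64 interfaces, eth0..eth63.
    ptf_interfaces = ["eth{}".format(i) for i in range(64)]
    print(vlan_subif_cmd(104, 200))
    print(missing_ptf_ports([0, 4, 104], ptf_interfaces))
```

Running a check like `missing_ptf_ports` against the PTF's real interface list before creating sub-interfaces would turn the "Cannot find device" error into an explicit report of which indexes are out of range.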

Steps to reproduce the issue:

  1. Configure L2 mode on the DUT (https://github.com/Azure/SONiC/wiki/L2-Switch-mode).
  2. Optionally, add pdb breakpoints and an except clause to setup_vlan to inspect the failure, like so:

    
    def setup_vlan(duthosts, rand_one_dut_hostname, ptfhost, vlan_ports_list, vlan_intfs_list, cfg_facts):
        duthost = duthosts[rand_one_dut_hostname]
        # --------------------- Setup -----------------------
        try:
            import pdb; pdb.set_trace()  # break before setup to inspect vlan_ports_list
            portchannel_interfaces = cfg_facts.get('PORTCHANNEL_INTERFACE', {})
    
            shutdown_portchannels(duthost, portchannel_interfaces)
    
            create_test_vlans(duthost, cfg_facts, vlan_ports_list, vlan_intfs_list)
    
            startup_portchannels(duthost, portchannel_interfaces)
    
            create_vlan_interfaces(vlan_ports_list, ptfhost)
    
            add_test_routes(duthost, vlan_ports_list)
    
            setUpArpResponder(vlan_ports_list, ptfhost)
    
            # --------------------- Testing -----------------------
            yield
        # --------------------- Teardown -----------------------
        except Exception as e:
            import pdb; pdb.set_trace()  # break on the setup failure
            print(e)
        finally:
            tearDown(vlan_ports_list, duthost, ptfhost)

  3. The failure occurs in create_vlan_interfaces because the PTF has no interface eth104, so the port indexes are incorrect. They probably should not be derived from the minigraph, because the L2 configuration is based on config_db.
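
The suggestion in step 3 can be illustrated with a small sketch. This is only an assumption about what a config_db-based mapping might look like, not sonic-mgmt's actual logic: `ptf_indices_from_port_table` is a hypothetical helper, and the rule that the Nth port (ordered by the number in its name) maps to PTF index N is assumed for illustration. Real testbeds may be cabled differently.

```python
import re


def ptf_indices_from_port_table(port_table):
    """Map each DUT port name to a PTF index by its position in port-number order.

    Assumption: ports are ordered by the trailing number in their name and the
    Nth port is wired to PTF interface ethN.
    """
    def port_number(name):
        return int(re.search(r"(\d+)$", name).group(1))

    ordered = sorted(port_table, key=port_number)
    return {name: idx for idx, name in enumerate(ordered)}


if __name__ == "__main__":
    # Hypothetical excerpt of a config_db PORT table on a breakout HwSKU,
    # where port numbers are not contiguous.
    port_table = {"Ethernet0": {}, "Ethernet8": {}, "Ethernet4": {}, "Ethernet2": {}}
    print(ptf_indices_from_port_table(port_table))
```

With non-contiguous port names, as on a breakout HwSKU, a position-based mapping stays within the number of ports that actually exist, whereas an index taken from the port number itself could exceed the PTF interface count.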

**Describe the results you received:**
The test log shows that during setup the "Create PTF VLAN intfs" step fails because the PTF interface information is incorrect.

14:25:14 __init__._fixture_generator_decorator    L0070 INFO   | -------------------- fixture setup_vlan setup starts -------------------- 

14:25:14 test_vlan.shutdown_portchannels          L0102 INFO   | Shutdown lags, flush IP addresses 

14:25:14 test_vlan.create_test_vlans              L0112 INFO   | Add vlans, assign IPs 

14:25:14 test_vlan.create_test_vlans              L0120 INFO   | Delete untagged vlans from interfaces 

14:25:14 test_vlan.create_test_vlans              L0130 INFO   | Add members to Vlans 

14:25:20 test_vlan.startup_portchannels           L0143 INFO   | Bringup lags 

**14:25:21 test_vlan.create_vlan_interfaces         L0085 INFO   | Create PTF VLAN intfs # the function is failing** 

14:25:22 test_vlan.tearDown                       L0189 INFO   | VLAN test ending ... 

14:25:22 test_vlan.tearDown                       L0190 INFO   | Stop arp_responder 
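
The failing step can also be located mechanically in a log like the one above. The sketch below is a hypothetical helper, not part of sonic-mgmt; it assumes the line layout shown here (time, logger name, `L<line>`, level, `|`, message) and reports the last entry logged before tearDown begins.

```python
import re

# Assumed layout: "HH:MM:SS <logger> L<line> <LEVEL> | <message>", optionally
# wrapped in "**" as in the highlighted line of the report.
LOG_LINE = re.compile(r"^\**(\d{2}:\d{2}:\d{2})\s+(\S+)\s+L\d+\s+(\w+)\s+\|\s*(.*)$")


def last_step_before_teardown(lines):
    """Return (logger, message) of the last entry logged before tearDown starts.

    Returns None if no tearDown entry is found or nothing precedes it.
    """
    previous = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        _, source, _, message = match.groups()
        if source.endswith(".tearDown"):
            return previous
        previous = (source, message.strip())
    return None


if __name__ == "__main__":
    sample = [
        "14:25:20 test_vlan.startup_portchannels           L0143 INFO   | Bringup lags ",
        "14:25:21 test_vlan.create_vlan_interfaces         L0085 INFO   | Create PTF VLAN intfs",
        "14:25:22 test_vlan.tearDown                       L0189 INFO   | VLAN test ending ... ",
    ]
    print(last_step_before_teardown(sample))
```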

**Describe the results you expected:**

setup_vlan should pass without issues on an L2 setup.

**Additional information you deem important:**

The setup is an L2 configuration with the Microsoft HwSKU Mellanox-SN3800-D112C8 on the 3800 platform.

**Output of `show version`:**

    admin@r-tigris-04:~$ show boot
    Current: SONiC-OS-202012.142-8915e488b_Internal
    Next: SONiC-OS-202012.142-8915e488b_Internal
    Available:
    SONiC-OS-202012.142-8915e488b_Internal

    admin@r-tigris-04:~$ show platform summary
    Platform: x86_64-mlnx_msn3800-r0
    HwSKU: Mellanox-SN3800-D112C8
    ASIC: mellanox
    ASIC Count: 1
    admin@r-tigris-04:~$



**Attach debug file `sudo generate_dump`:**

[sonic_dump_r-tigris-04_20210817_080335.tar.gz](https://github.com/Azure/sonic-mgmt/files/6999089/sonic_dump_r-tigris-04_20210817_080335.tar.gz)

prsunny commented 3 years ago

@bingwang-ms, could you please check this on SN4600C?

bingwang-ms commented 3 years ago

@prsunny I can't repro the issue on SN4600C in L2 mode. @slutati1536 Could you please re-run the case with the latest code? We updated the code recently. Thanks.