sonic-net / sonic-mgmt

Configuration management examples for SONiC

[VXLAN_ECMP] Leftover after cleanup of test_vxlan_ecmp.py #5076

Closed ihorchekh closed 2 years ago

ihorchekh commented 2 years ago

Description: test_vxlan_ecmp.py does not clean up properly. During the test it creates a Vnet_v4_in_v4-0 interface, which remains after the test finishes. Moreover, running the test in a loop causes failures because of it.

Steps to reproduce the issue:

  1. Install an image, deploy minigraph
  2. Run test_vxlan_ecmp.py
  3. Check the vnet interfaces after the test has finished (see the sketch after this list)
  4. Repeat steps 2-3. The test should fail after 2-4 runs.
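
A quick way to perform that check between iterations (a minimal sketch assuming sonic-mgmt's standard duthost fixture; assert_no_vnet_leftovers is a hypothetical helper, not part of the test today):

def assert_no_vnet_leftovers(duthost):
    # "show vnet interfaces" prints only its two header lines when no
    # vnet is bound to any interface; any further non-empty line is a leftover.
    lines = duthost.shell("show vnet interfaces")["stdout_lines"]
    leftovers = [line for line in lines[2:] if line.strip()]
    assert not leftovers, "leftover vnet interfaces: {}".format(leftovers)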

Describe the results you received: A leftover vnet interface remains after the test finishes, and the test fails after 2-4 runs:

show vnet interfaces
vnet name        interfaces
---------------  ------------
Vnet_v4_in_v4-0  Ethernet60

Describe the results you expected: No leftovers; the test passes consistently:

show vnet interfaces
vnet name        interfaces
---------------  ------------

Additional information you deem important:

Output of show version:

SONiC Software Version: SONiC.202012.216-5f3269a61_Internal
Distribution: Debian 10.11
Kernel: 4.19.0-12-2-amd64
Build commit: 5f3269a61
Build date: Mon Jan 31 11:06:52 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic-ci02-244

Platform: x86_64-mlnx_msn3800-r0
HwSKU: Mellanox-SN3800-D112C8
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1937X00527
Uptime: 09:04:12 up  1:30,  1 user,  load average: 1.52, 1.17, 1.20

Docker images:
REPOSITORY                    TAG                             IMAGE ID            SIZE
docker-teamd                  202012.216-5f3269a61_Internal   58e6c9d933eb        392MB
docker-teamd                  latest                          58e6c9d933eb        392MB
docker-nat                    202012.216-5f3269a61_Internal   ed43d29cb984        395MB
docker-nat                    latest                          ed43d29cb984        395MB
docker-orchagent              202012.216-5f3269a61_Internal   c61551391a95        411MB
docker-orchagent              latest                          c61551391a95        411MB
docker-fpm-frr                202012.216-5f3269a61_Internal   cf718323ab4c        411MB
docker-fpm-frr                latest                          cf718323ab4c        411MB
docker-sflow                  202012.216-5f3269a61_Internal   8a5bdc7dac46        393MB
docker-sflow                  latest                          8a5bdc7dac46        393MB
docker-syncd-mlnx             202012.216-5f3269a61_Internal   931228ad4c2b        965MB
docker-syncd-mlnx             latest                          931228ad4c2b        965MB
docker-snmp                   202012.216-5f3269a61_Internal   15914a194a0a        422MB
docker-snmp                   latest                          15914a194a0a        422MB
docker-dhcp-relay             202012.216-5f3269a61_Internal   dea978252a7a        393MB
docker-dhcp-relay             latest                          dea978252a7a        393MB
docker-mux                    202012.216-5f3269a61_Internal   d712f6d583fb        437MB
docker-mux                    latest                          d712f6d583fb        437MB
docker-router-advertiser      202012.216-5f3269a61_Internal   7c14699b69ca        380MB
docker-router-advertiser      latest                          7c14699b69ca        380MB
docker-platform-monitor       202012.216-5f3269a61_Internal   9feb29ea0aaf        673MB
docker-platform-monitor       latest                          9feb29ea0aaf        673MB
docker-lldp                   202012.216-5f3269a61_Internal   4086facc8fc3        420MB
docker-lldp                   latest                          4086facc8fc3        420MB
docker-database               202012.216-5f3269a61_Internal   eb326db92704        380MB
docker-database               latest                          eb326db92704        380MB
docker-sonic-mgmt-framework   202012.216-5f3269a61_Internal   ad183bf610a8        793MB
docker-sonic-mgmt-framework   latest                          ad183bf610a8        793MB
docker-sonic-telemetry        202012.216-5f3269a61_Internal   e03068e245e7        469MB
docker-sonic-telemetry        latest                          e03068e245e7        469MB

Attached debug file (sudo generate_dump):

sonic_dump_r-tigris-13_20220207_081510.tar.gz vxlan_ecmp_run1.txt vxlan_ecmp_run2.txt vxlan_ecmp_run3.txt

dgsudharsan commented 2 years ago

@prsunny Can you please assign it to the relevant team to handle?

prsunny commented 2 years ago

@rraghav-cisco , would you please check on this?

rraghav-cisco commented 2 years ago

Please assign it to me.

rraghav-cisco commented 2 years ago

@dgsudharsan: Please let me know where I can access the script run logs.

dgsudharsan commented 2 years ago

@ihorchekh Can you please attach the script run logs to this bug?

ihorchekh commented 2 years ago

@dgsudharsan @rraghav-cisco I've attached three log files (vxlan_ecmp_run1.txt, vxlan_ecmp_run2.txt, vxlan_ecmp_run3.txt).

rraghav-cisco commented 2 years ago

I ran the script internally to try to reproduce this problem. What I observed is that the vnets are all removed from the DUT; however, they remain attached to the portchannel interfaces. I am not sure whether this is the cause of the problem described in this bug.

show run all | less output:

 "PORTCHANNEL_INTERFACE": {
        "PortChannel0002": {
            "vnet_name": "Vnet_v4_in_v6-0"
        },
}
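
To list every interface that still carries such a dangling binding, something along these lines can be run on the DUT (a sketch using the redis Python client; in SONiC, CONFIG_DB is redis database 4):

import redis

# Scan CONFIG_DB (db 4) for interface entries still referencing a vnet.
config_db = redis.Redis(host="127.0.0.1", port=6379, db=4, decode_responses=True)
for table in ("INTERFACE", "PORTCHANNEL_INTERFACE"):
    for key in config_db.keys(table + "|*"):
        vnet = config_db.hget(key, "vnet_name")
        if vnet:
            print("{} -> {}".format(key, vnet))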

So I tried to remove the entries using the following CLI:

redis-cli -n 4 hdel "PORTCHANNEL_INTERFACE|PortChannel0019" vnet_name    (T1-64-lag topology)
redis-cli -n 4 hdel "INTERFACE|Ethernet4" vnet_name                      (T1 topology)

This removes the vnets from the interfaces. However, even after this, the script fails on the second run. The reason I found is that the vnet routes are not present in ASIC_DB, even though they show up in the "show vnet route all" output.

root@mth64-m5-2:/home/cisco# show vnet route all
vnet name    prefix    nexthop    interface
-----------  --------  ---------  -----------

vnet name        prefix                   endpoint              mac address    vni
---------------  -----------------------  --------------------  -------------  -----
Vnet_v4_in_v4-0  150.0.3.1/32             100.0.1.10
Vnet_v4_in_v4-0  150.0.4.1/32             100.0.2.10
Vnet_v4_in_v4-0  fddd:a150:a0::a13:1/128  100.0.11.10
Vnet_v4_in_v4-0  fddd:a150:a0::a14:1/128  100.0.12.10
Vnet_v4_in_v6-0  150.0.8.1/32             fddd:a100:a0::a6:10
Vnet_v4_in_v6-0  150.0.9.1/32             fddd:a100:a0::a7:10
Vnet_v4_in_v6-0  fddd:a150:a0::a18:1/128  fddd:a100:a0::a16:10
Vnet_v4_in_v6-0  fddd:a150:a0::a19:1/128  fddd:a100:a0::a17:10
root@mth64-m5-2:/home/cisco# redis-dump -y -d0 | grep 150.0.3.1
  "VNET_ROUTE_TUNNEL_TABLE:Vnet_v4_in_v4-0:150.0.3.1/32": {
root@mth64-m5-2:/home/cisco# redis-dump -y -d1 | grep 150.0.3.1
root@mth64-m5-2:/home/cisco# redis-dump -y -d2 | grep 150.0.3.1
root@mth64-m5-2:/home/cisco#
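
For context, redis-dump's -d flag selects the redis database: db 0 is APPL_DB (which holds VNET_ROUTE_TUNNEL_TABLE), db 1 is ASIC_DB, and db 2 is COUNTERS_DB, so the output above shows the route intent surviving in APPL_DB while nothing corresponding was ever programmed into ASIC_DB. The same comparison scripted (a sketch assuming the redis Python client on the DUT):

import redis

prefix = "150.0.3.1"
appl_db = redis.Redis(db=0, decode_responses=True)  # route intent written by the test
asic_db = redis.Redis(db=1, decode_responses=True)  # state programmed through SAI

in_appl = [k for k in appl_db.keys("VNET_ROUTE_TUNNEL_TABLE:*") if prefix in k]
in_asic = [k for k in asic_db.keys("ASIC_STATE:*") if prefix in k]
print("APPL_DB:", in_appl)  # non-empty here: the route intent survived
print("ASIC_DB:", in_asic)  # empty here: the route never reached the ASIC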

One way to ensure a complete cleanup is to do a config reload, but I decided instead to reuse the vxlan cleanup code from the earlier test_vnet_vxlan.py. @prsunny: please advise.
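
For illustration, the shape of that fix is a teardown that runs whether or not the test body fails (a minimal pytest sketch assuming sonic-mgmt's duthost fixture; the redis command and interface name below are illustrative stand-ins, the actual change reuses the test_vnet_vxlan.py cleanup):

import pytest

@pytest.fixture(scope="module", autouse=True)
def vxlan_ecmp_cleanup(duthost):
    yield  # run the module's tests first
    # Teardown executes even when a test fails, so leftovers are always removed.
    # Illustrative stand-in: unbind the vnet from the portchannel in CONFIG_DB.
    duthost.shell('redis-cli -n 4 hdel "PORTCHANNEL_INTERFACE|PortChannel0002" vnet_name',
                  module_ignore_errors=True)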

rraghav-cisco commented 2 years ago

@ihorchekh, @prsunny, @dgsudharsan, @mattetti: Please review and, if possible, verify the fix in PR https://github.com/Azure/sonic-mgmt/pull/5190 and confirm it works for you.