sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC
Other
722 stars 1.38k forks source link

[VxLAN] Removing the VxLAN tunnel through CLI doesn't delete the tunnel from hardware #12063

Closed dgsudharsan closed 1 year ago

dgsudharsan commented 1 year ago

Description

After configuring a static VxLAN and VNET, trying to remove vxlan using CLI will result in successfully removing it in the application but not in the hardware. The reason is the delete task doesn't call any SAI API. (https://github.com/sonic-net/sonic-swss/blob/fd0c585d34de63dc9cca57bb7d508cadfe1125d4/orchagent/vxlanorch.cpp#L1536)

Previously the delete task had the flow of removing the tunnels https://github.com/sonic-net/sonic-swss/pull/931. However this flow was modified in the commit https://github.com/sonic-net/sonic-swss/commit/095f481e63997b64392bc46c2f176a04b8889667 due to which delete vxlan will not make any SAI calls

Steps to reproduce the issue:

  1. Create VxLAN using CLI
  2. Remove it using config vxlan del command

Describe the results you received:

Delete vxlan returns successfully without any errors but leaving the hardware tunnel intact

Describe the results you expected:

Delete should return success or show throw error if there are any references

Output of show version:

show version

SONiC Software Version: SONiC.detached.1836-83704d995
Distribution: Debian 11.4
Kernel: 5.10.0-12-2-amd64
Build commit: 83704d995
Build date: Sat Aug 27 05:10:38 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic02-005

Platform: x86_64-mlnx_msn3800-r0
HwSKU: ACS-MSN3800
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1925X00004
Model Number: MSN3800-CS2F
Hardware Revision: A2
Uptime: 21:14:04 up 14:38,  1 user,  load average: 0.51, 0.64, 0.76
Date: Tue 13 Sep 2022 21:14:04

Docker images:
REPOSITORY                    TAG                       IMAGE ID       SIZE
docker-orchagent              detached.1836-83704d995   fd11a401e814   478MB
docker-orchagent              latest                    fd11a401e814   478MB
docker-fpm-frr                detached.1836-83704d995   670c27cd3327   489MB
docker-fpm-frr                latest                    670c27cd3327   489MB
docker-teamd                  detached.1836-83704d995   a14ce103012f   459MB
docker-teamd                  latest                    a14ce103012f   459MB
docker-macsec                 latest                    b0fb0be14a81   461MB
docker-syncd-mlnx             detached.1836-83704d995   134dec23270f   983MB
docker-syncd-mlnx             latest                    134dec23270f   983MB
docker-platform-monitor       detached.1836-83704d995   35d03efa7752   990MB
docker-platform-monitor       latest                    35d03efa7752   990MB
docker-dhcp-relay             latest                    ad34a2fd418d   453MB
docker-sonic-telemetry        detached.1836-83704d995   77d9fc9c8899   524MB
docker-sonic-telemetry        latest                    77d9fc9c8899   524MB
docker-snmp                   detached.1836-83704d995   8425d6595df4   490MB
docker-snmp                   latest                    8425d6595df4   490MB
docker-lldp                   detached.1836-83704d995   574a29a5f79e   486MB
docker-lldp                   latest                    574a29a5f79e   486MB
docker-database               detached.1836-83704d995   9b1e1894c88c   443MB
docker-database               latest                    9b1e1894c88c   443MB
docker-mux                    detached.1836-83704d995   e62dacc96223   492MB
docker-mux                    latest                    e62dacc96223   492MB
docker-router-advertiser      detached.1836-83704d995   420abd7429e7   443MB
docker-router-advertiser      latest                    420abd7429e7   443MB
docker-nat                    detached.1836-83704d995   bb72b5146a26   434MB
docker-nat                    latest                    bb72b5146a26   434MB
docker-sflow                  detached.1836-83704d995   6959519de196   432MB
docker-sflow                  latest                    6959519de196   432MB
docker-sonic-mgmt-framework   detached.1836-83704d995   6e57b0d815ce   561MB
docker-sonic-mgmt-framework   latest                    6e57b0d815ce   561MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

zhangyanzhao commented 1 year ago

Besides the bug fix, we should also add one test case for vxlan tunnel deletion

srj102 commented 1 year ago

@dgsudharsan Please describe the exact sequence which leads to the Tunnel object creation in the SAI ? As far as I see it happens only as part of the vlan-vni or vrf-vni map creation. Similarly deletion should happen after the last mapping entry is removed.

dgsudharsan commented 1 year ago

@srj102 I see. In case of VNET deleting the VNET before VxLAN tunnel will clear the tunnel from the hardware. However the problem here I see is the CLI command to remove the VxLAN tunnel succeeds without throwing any dependency errors whether a tunnel map or VNET is associated with it. This needs to be fixed.

srj102 commented 1 year ago

@dgsudharsan the foll is the snippet for vxlan deletion. Here the vlan vni mappings are handled.

vxlan_keys = db.cfgdb.get_keys("VXLAN_TUNNEL_MAP|*")
if not vxlan_keys:
  vxlan_count = 0
else:
  vxlan_count = len(vxlan_keys)

if(vxlan_count > 0):
    ctx.fail("Please delete all VLAN VNI mappings.")

w.r.t VNET, If the user is going to add/remove vnet and vnet to vni mappings without using the CLI, wouldn't the user prefer the same for the VXLAN_TUNNEL table as well ? In which case how useful would it be to add this check in the "config vxlan del" CLI handler ?

Today vxlanmgr has checks for vlan vni mappings to take care of usecases without CLI, So I think the check can be put here.

doVxlanTunnelDeleteTask() ..

// If there are mappings still against this tunnel then hold on.
if (m_vxlanTunnelCache[vxlanTunnelName].vlan_vni_refcnt)
{
    SWSS_LOG_WARN("Tunnel %s deletion failed. Need to delete mapping entries", vxlanTunnelName.c_str());
    return false;
}

An additional check for m_vnetCache.size() can be added in vxlanmgr as part of doVxlanTunnelDeleteTask() and return false if it is a non-zero size ?

dgsudharsan commented 1 year ago

@srj102 I believe handling in CLI should be good enough as CLI is the only flow for delete. I have raised a fix https://github.com/sonic-net/sonic-utilities/pull/2404

adyeung commented 1 year ago

Closing this issue, #2404 already merged