sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

VxLAN Decap test case failure on TH based platform running with SAI 4.3.0.10/4.3.0.13 #6847

Closed: gechiang closed this issue 3 years ago

gechiang commented 3 years ago

Attachments: 4215_worked_syslog.txt, 43013_failed_syslog.txt, interface counteres_success_case.txt, interface_counters_failed_case.txt, sairedis.rec.4215_worked.gz, sairedis.rec.43013_failed.gz

Description

While validating BRCM SAI 4.3.0.13-1, I noticed that the VxLAN Decap test case fails on the TH platform while it passes on the TD3 platform. Investigating further on a TH platform, I found that the test passes on the same latest master image once SAI is switched to 4.2.1.5-12, which includes the BRCM patch for the syncd crash detailed in BRCM case CS00011639922, "[4.2] Configuring Vxlan decap tunnel is failing on TH/TH2 (4.2.1.5)".

Although the same patch is also present in SAI 4.3.0.10-3 and later, including the SAI 4.3.0.13 base code, the test case fails consistently on the TH platform. On the TD3 platform, SAI 4.3.0.10-3 and later, including 4.3.0.13, all pass.

So this appears to be specific to TH, and perhaps TH2 as well (I did not have access to a TH2 DUT to validate this).
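The observations above can be laid out as a small pass/fail matrix. The following is only an illustrative sketch: the `OBSERVED` table restates the platform and SAI combinations reported in this description, and the helper names `failing_platforms` and `failing_versions` are hypothetical, not part of sonic-mgmt.

```python
# Pass/fail observations as reported in this issue:
# (platform, SAI version) -> True if vxlan/test_vxlan_decap.py passed.
OBSERVED = {
    ("TH", "4.2.1.5-12"): True,
    ("TH", "4.3.0.10-3"): False,
    ("TH", "4.3.0.13"): False,
    ("TD3", "4.3.0.10-3"): True,
    ("TD3", "4.3.0.13"): True,
}


def failing_platforms(observed):
    """Return the sorted platforms with at least one failing SAI version."""
    return sorted({platform for (platform, _sai), passed in observed.items() if not passed})


def failing_versions(observed, platform):
    """Return the sorted SAI versions that failed on the given platform."""
    return sorted(
        sai for (name, sai), passed in observed.items() if name == platform and not passed
    )


if __name__ == "__main__":
    print(failing_platforms(OBSERVED))
    print(failing_versions(OBSERVED, "TH"))
```

Only TH shows failures in this table, and only on the 4.3 SAI versions, which is what points to a platform-specific regression rather than a test or image problem.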

Steps to reproduce the issue:

  1. Load latest master image such as SONiC.HEAD.419-d5238ae8
  2. Run testcase vxlan/test_vxlan_decap.py

Describe the results you received:

Observed vxlan/test_vxlan_decap.py::test_vxlan_decap[Enabled] FAILED:

Here is the output from the failed testcase:

=================================== FAILURES ===================================
__________________________ test_vxlan_decap[Enabled] ___________________________

setup = {'mg_facts': {'deployment_id': u'1', 'dhcp_servers': [u'192.0.0.1', u'192.0.0.2', u'192.0.0.3', u'192.0.0.4', u'192.0....20.16/28', u'10.3.149.170/31', u'40.122.216.24', u'13.91.48.226', ...], 'inventory_hostname': u'str-s6100-acs-1', ...}}
vxlan_status = (True, 'Enabled')
duthosts = <tests.common.devices.DutHosts object at 0x7f4e5d8e9650>
rand_one_dut_hostname = 'str-s6100-acs-1'
ptfhost = <tests.common.devices.PTFHost object at 0x7f4e5c3d8950>
creds = {'ad_domain': 'GME', 'ad_integration_enabled': True, 'ansible_become_pass': "{{ secret_group_vars['str']['ansible_become_pass'] }}", 'ansible_ssh_pass': "{{ secret_group_vars['str']['ansible_ssh_pass'] }}", ...}

    def test_vxlan_decap(setup, vxlan_status, duthosts, rand_one_dut_hostname, ptfhost, creds):
        duthost = duthosts[rand_one_dut_hostname]

        sonicadmin_alt_password = duthost.host.options['variable_manager']._hostvars[duthost.hostname].get("ansible_altpassword")

        vxlan_enabled, scenario = vxlan_status
        logger.info("vxlan_enabled=%s, scenario=%s" % (vxlan_enabled, scenario))
        log_file = "/tmp/vxlan-decap.Vxlan.{}.{}.log".format(scenario, datetime.now().strftime('%Y-%m-%d-%H:%M:%S'))
        ptf_runner(ptfhost,
                   "ptftests",
                   "vxlan-decap.Vxlan",
                    platform_dir="ptftests",
                    params={"vxlan_enabled": vxlan_enabled,
                            "config_file": '/tmp/vxlan_decap.json',
                            "count": COUNT,
                            "sonic_admin_user": creds.get('sonicadmin_user'),
                            "sonic_admin_password": creds.get('sonicadmin_password'),
                            "sonic_admin_alt_password": sonicadmin_alt_password,
                            "dut_hostname": duthost.host.options['inventory_manager'].get_host(duthost.hostname).vars['ansible_host']},
                    qlen=10000,
>                   log_file=log_file)

creds      = {'ad_domain': 'GME', 'ad_integration_enabled': True, 'ansible_become_pass': "{{ secret_group_vars['str']['ansible_become_pass'] }}", 'ansible_ssh_pass': "{{ secret_group_vars['str']['ansible_ssh_pass'] }}", ...}
duthost    = <tests.common.devices.MultiAsicSonicHost object at 0x7f4e5d8e9710>
duthosts   = <tests.common.devices.DutHosts object at 0x7f4e5d8e9650>
log_file   = '/tmp/vxlan-decap.Vxlan.Enabled.2021-02-22-21:14:06.log'
ptfhost    = <tests.common.devices.PTFHost object at 0x7f4e5c3d8950>
rand_one_dut_hostname = 'str-s6100-acs-1'
scenario   = 'Enabled'
setup      = {'mg_facts': {'deployment_id': u'1', 'dhcp_servers': [u'192.0.0.1', u'192.0.0.2', u'192.0.0.3', u'192.0.0.4', u'192.0....20.16/28', u'10.3.149.170/31', u'40.122.216.24', u'13.91.48.226', ...], 'inventory_hostname': u'str-s6100-acs-1', ...}}
sonicadmin_alt_password = u'YourPaSsWoRd'
vxlan_enabled = True
vxlan_status = (True, 'Enabled')

vxlan/test_vxlan_decap.py:171:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ptf_runner.py:41: in ptf_runner
    result = host.shell(cmd, chdir="/root", module_ignore_errors=module_ignore_errors)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <tests.common.devices.PTFHost object at 0x7f4e5c3d8950>
module_args = ('ptf --test-dir ptftests vxlan-decap.Vxlan --platform-dir ptftests --qlen=10000 --platform remote -t \'vxlan_enabled=...'"\'"\'YourPaSsWoRd\'"\'"\'\' --relax --debug info --log-file /tmp/vxlan-decap.Vxlan.Enabled.2021-02-22-21:14:06.log',)
complex_args = {'chdir': '/root'}
previous_frame = <frame object at 0x5627c6178bd0>
filename = '/var/src/Networking-acs-sonic-mgmt/tests/ptf_runner.py'
line_number = 41, function_name = 'ptf_runner'
lines = ['    result = host.shell(cmd, chdir="/root", module_ignore_errors=module_ignore_errors)\n']
index = 0, module_ignore_errors = False, module_async = False

    def _run(self, *module_args, **complex_args):

        previous_frame = inspect.currentframe().f_back
        filename, line_number, function_name, lines, index = inspect.getframeinfo(previous_frame)

        logging.debug("{}::{}#{}: [{}] AnsibleModule::{}, args={}, kwargs={}"\
            .format(filename, function_name, line_number, self.hostname,
                    self.module_name, json.dumps(module_args), json.dumps(complex_args)))

        module_ignore_errors = complex_args.pop('module_ignore_errors', False)
        module_async = complex_args.pop('module_async', False)

        if module_async:
            def run_module(module_args, complex_args):
                return self.module(*module_args, **complex_args)[self.hostname]
            pool = ThreadPool()
            result = pool.apply_async(run_module, (module_args, complex_args))
            return pool, result

        res = self.module(*module_args, **complex_args)[self.hostname]
        logging.debug("{}::{}#{}: [{}] AnsibleModule::{} Result => {}"\
            .format(filename, function_name, line_number, self.hostname, self.module_name, json.dumps(res)))

        if (res.is_failed or 'exception' in res) and not module_ignore_errors:
>           raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res)
E           RunAnsibleModuleFail: run module shell failed, Ansible Results =>
E           {
E               "changed": true,
E               "cmd": "ptf --test-dir ptftests vxlan-decap.Vxlan --platform-dir ptftests --qlen=10000 --platform remote -t 'vxlan_enabled=True;count=10;config_file='\"'\"'/tmp/vxlan_decap.json'\"'\"';sonic_admin_user=u'\"'\"'admin'\"'\"';sonic_admin_password=u'\"'\"'password'\"'\"';dut_hostname=u'\"'\"'10.3.147.243'\"'\"';sonic_admin_alt_password=u'\"'\"'YourPaSsWoRd'\"'\"'' --relax --debug info --log-file /tmp/vxlan-decap.Vxlan.Enabled.2021-02-22-21:14:06.log",
E               "delta": "0:01:50.995738",
E               "end": "2021-02-22 21:43:57.071928",
E               "failed": true,
E               "invocation": {
E                   "module_args": {
E                       "_raw_params": "ptf --test-dir ptftests vxlan-decap.Vxlan --platform-dir ptftests --qlen=10000 --platform remote -t 'vxlan_enabled=True;count=10;config_file='\"'\"'/tmp/vxlan_decap.json'\"'\"';sonic_admin_user=u'\"'\"'admin'\"'\"';sonic_admin_password=u'\"'\"'password'\"'\"';dut_hostname=u'\"'\"'10.3.147.243'\"'\"';sonic_admin_alt_password=u'\"'\"'YourPaSsWoRd'\"'\"'' --relax --debug info --log-file /tmp/vxlan-decap.Vxlan.Enabled.2021-02-22-21:14:06.log",
E                       "_uses_shell": true,
E                       "argv": null,
E                       "chdir": "/root",
E                       "creates": null,
E                       "executable": null,
E                       "removes": null,
E                       "stdin": null,
E                       "stdin_add_newline": true,
E                       "strip_empty_ends": true,
E                       "warn": true
E                   }
E               },
E               "msg": "non-zero return code",
E               "rc": 1,
E               "start": "2021-02-22 21:42:06.076190",
E               "stderr": "WARNING: No route found for IPv6 destination :: (no default route?)\n/usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.\n  from cryptography.hazmat.backends import default_backend\nvxlan-decap.Vxlan ... FAIL\n\n======================================================================\nFAIL: vxlan-decap.Vxlan\n----------------------------------------------------------------------\nTraceback (most recent call last):\n  File \"ptftests/vxlan-decap.py\", line 399, in runTest\n    self.work_test()\n  File \"ptftests/vxlan-decap.py\", line 392, in work_test\n    raise AssertionError(err)\nAssertionError: VxlanTest failed:\n  sent = 10 rcvd = 0 | src_port=6 dst_port=6 | src_mac=8c:01:02:03:04:05 dst_mac=4c:76:25:f5:48:80 src_ip=8.8.8.8 dst_ip=10.1.0.32 | Inner: src_mac=4c:76:25:f5:48:80 dst_mac=7c:fe:90:80:9f:06 src_ip=192.168.0.1 dst_ip=192.168.0.2 vni=1336 | net_port_rel(acc)=0 acc_port_rel=0\n\n\n----------------------------------------------------------------------\nRan 1 test in 109.371s\n\nFAILED (failures=1)",
E               "stderr_lines": [
E                   "WARNING: No route found for IPv6 destination :: (no default route?)",
E                   "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.",
E                   "  from cryptography.hazmat.backends import default_backend",
E                   "vxlan-decap.Vxlan ... FAIL",
E                   "",
E                   "======================================================================",
E                   "FAIL: vxlan-decap.Vxlan",
E                   "----------------------------------------------------------------------",
E                   "Traceback (most recent call last):",
E                   "  File \"ptftests/vxlan-decap.py\", line 399, in runTest",
E                   "    self.work_test()",
E                   "  File \"ptftests/vxlan-decap.py\", line 392, in work_test",
E                   "    raise AssertionError(err)",
E                   "AssertionError: VxlanTest failed:",
E                   "  sent = 10 rcvd = 0 | src_port=6 dst_port=6 | src_mac=8c:01:02:03:04:05 dst_mac=4c:76:25:f5:48:80 src_ip=8.8.8.8 dst_ip=10.1.0.32 | Inner: src_mac=4c:76:25:f5:48:80 dst_mac=7c:fe:90:80:9f:06 src_ip=192.168.0.1 dst_ip=192.168.0.2 vni=1336 | net_port_rel(acc)=0 acc_port_rel=0",
E                   "",
E                   "",
E                   "----------------------------------------------------------------------",
E                   "Ran 1 test in 109.371s",
E                   "",
E                   "FAILED (failures=1)"
E               ],
E               "stdout": "",
E               "stdout_lines": []
E           }

complex_args = {'chdir': '/root'}
filename   = '/var/src/Networking-acs-sonic-mgmt/tests/ptf_runner.py'
function_name = 'ptf_runner'
index      = 0
line_number = 41
lines      = ['    result = host.shell(cmd, chdir="/root", module_ignore_errors=module_ignore_errors)\n']
module_args = ('ptf --test-dir ptftests vxlan-decap.Vxlan --platform-dir ptftests --qlen=10000 --platform remote -t \'vxlan_enabled=...'"\'"\'YourPaSsWoRd\'"\'"\'\' --relax --debug info --log-file /tmp/vxlan-decap.Vxlan.Enabled.2021-02-22-21:14:06.log',)
module_async = False
module_ignore_errors = False
previous_frame = <frame object at 0x5627c6178bd0>
res        = {'stderr_lines': [u'WARNING: No route found for IPv6 destination :: (no defaul...: [], u'start': u'2021-02-22 21:42:06.076190', u'msg': u'non-zero return code'}
self       = <tests.common.devices.PTFHost object at 0x7f4e5c3d8950>

common/devices.py:98: RunAnsibleModuleFail
--- generated xml file: /var/src/Networking-acs-sonic-mgmt/tests/logs/tr.xml ---
=========================== short test summary info ============================
FAILED vxlan/test_vxlan_decap.py::test_vxlan_decap[Enabled] - RunAnsibleModul...
========================== 1 failed in 156.77 seconds ==========================
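The key evidence in the output above is the PTF assertion line reporting `sent = 10 rcvd = 0`: every encapsulated packet was dropped. As a rough triage aid, the sketch below pulls the counters and VNI out of that line. The helper name `summarize_failure` and the regular expressions are assumptions written for this report; they are not part of `ptftests/vxlan-decap.py`.

```python
import re

# Failure line copied from the PTF stderr in this report.
FAILURE_LINE = (
    "  sent = 10 rcvd = 0 | src_port=6 dst_port=6 | "
    "src_mac=8c:01:02:03:04:05 dst_mac=4c:76:25:f5:48:80 "
    "src_ip=8.8.8.8 dst_ip=10.1.0.32 | "
    "Inner: src_mac=4c:76:25:f5:48:80 dst_mac=7c:fe:90:80:9f:06 "
    "src_ip=192.168.0.1 dst_ip=192.168.0.2 vni=1336 | "
    "net_port_rel(acc)=0 acc_port_rel=0"
)

COUNTS_RE = re.compile(r"sent\s*=\s*(\d+)\s+rcvd\s*=\s*(\d+)")
VNI_RE = re.compile(r"\bvni=(\d+)")


def summarize_failure(line):
    """Extract sent/received packet counts and the VNI from a failure line."""
    counts = COUNTS_RE.search(line)
    if counts is None:
        raise ValueError("no 'sent = N rcvd = M' counters found")
    sent, rcvd = int(counts.group(1)), int(counts.group(2))
    vni = VNI_RE.search(line)
    return {
        "sent": sent,
        "rcvd": rcvd,
        "lost": sent - rcvd,
        "vni": int(vni.group(1)) if vni else None,
    }


if __name__ == "__main__":
    print(summarize_failure(FAILURE_LINE))
```

A result with `lost` equal to `sent` indicates that no decapsulated packets reached the access port at all, as opposed to partial loss.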

Describe the results you expected:

The test should pass without failure on the TH platform, as it did with SAI 4.2.1.5-12.

Output of show version:

admin@str-s6100-acs-1:~$ show vers

SONiC Software Version: SONiC.HEAD.419-d5238ae8
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: d5238ae8
Build date: Sun Feb 21 20:09:50 UTC 2021
Built by: johnar@jenkins-worker-22

Platform: x86_64-dell_s6100_c2538-r0
HwSKU: Force10-S6100
ASIC: broadcom
ASIC Count: 1
Serial Number: 29LQG02
Uptime: 21:34:24 up 8 min,  1 user,  load average: 3.59, 3.35, 1.78

Docker images:
REPOSITORY                    TAG                 IMAGE ID            SIZE
docker-sonic-mgmt-framework   HEAD.419-d5238ae8   263f8ae566a3        615MB
docker-sonic-mgmt-framework   latest              263f8ae566a3        615MB
docker-sonic-telemetry        HEAD.419-d5238ae8   0b480a60ec23        472MB
docker-sonic-telemetry        latest              0b480a60ec23        472MB
docker-teamd                  HEAD.419-d5238ae8   47fe3a9edd35        408MB
docker-teamd                  latest              47fe3a9edd35        408MB
docker-nat                    HEAD.419-d5238ae8   34958b2ec3b2        411MB
docker-nat                    latest              34958b2ec3b2        411MB
docker-platform-monitor       HEAD.419-d5238ae8   f0d4f91d94b3        605MB
docker-platform-monitor       latest              f0d4f91d94b3        605MB
docker-orchagent              HEAD.419-d5238ae8   c1215d085954        426MB
docker-orchagent              latest              c1215d085954        426MB
docker-macsec                 HEAD.419-d5238ae8   2b1c9e3301c0        411MB
docker-macsec                 latest              2b1c9e3301c0        411MB
docker-fpm-frr                HEAD.419-d5238ae8   bcbd7573e524        426MB
docker-fpm-frr                latest              bcbd7573e524        426MB
docker-sflow                  HEAD.419-d5238ae8   c5a3679bfc2e        409MB
docker-sflow                  latest              c5a3679bfc2e        409MB
docker-snmp                   HEAD.419-d5238ae8   f9cb0090ce13        438MB
docker-snmp                   latest              f9cb0090ce13        438MB
docker-syncd-brcm             HEAD.419-d5238ae8   48b20b2edda5        679MB
docker-syncd-brcm             latest              48b20b2edda5        679MB
docker-lldp                   HEAD.419-d5238ae8   55888b66d7cf        437MB
docker-lldp                   latest              55888b66d7cf        437MB
docker-router-advertiser      HEAD.419-d5238ae8   38c5442ce591        397MB
docker-router-advertiser      latest              38c5442ce591        397MB
docker-database               HEAD.419-d5238ae8   4ff2e4935832        397MB
docker-database               latest              4ff2e4935832        397MB
docker-dhcp-relay             HEAD.419-d5238ae8   84da62f420dc        404MB
docker-dhcp-relay             latest              84da62f420dc        404MB

admin@str-s6100-acs-1:~$
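When comparing a passing and a failing DUT, it can help to reduce the `show version` header above to key/value pairs. The sketch below is a hypothetical helper, not a SONiC utility; it assumes the header consists of `Key: value` lines and stops at the `Docker images:` section.

```python
# Header of the `show version` output from this report.
SHOW_VERSION = """\
SONiC Software Version: SONiC.HEAD.419-d5238ae8
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: d5238ae8
Build date: Sun Feb 21 20:09:50 UTC 2021
Built by: johnar@jenkins-worker-22

Platform: x86_64-dell_s6100_c2538-r0
HwSKU: Force10-S6100
ASIC: broadcom
ASIC Count: 1
Serial Number: 29LQG02

Docker images:
"""


def parse_show_version(text):
    """Parse 'Key: value' header lines into a dict, ignoring the image table."""
    info = {}
    for line in text.splitlines():
        if line.startswith("Docker images"):
            break  # the image table that follows is not key/value data
        # Split on the first colon only, so values containing colons stay whole.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    info = parse_show_version(SHOW_VERSION)
    print(info["HwSKU"], info["ASIC"], info["SONiC Software Version"])
```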

BRCM Case filed for this issue: CS00011976879 [4.3][TH][VxLAN Decap] VxLAN Decap Test failed on 4.3

gechiang commented 3 years ago

This issue has been fixed by the latest BRCM SAI, 4.3.3.4, as part of the following PR: https://github.com/Azure/sonic-buildimage/pull/7218. Closing this issue.