sonic-net / sonic-mgmt

Configuration management examples for SONiC

[platform_tests/test_cont_warm_reboot.py] - Case failed on T0 Physical Topology. #8567

Open Vickyni2 opened 1 year ago

Vickyni2 commented 1 year ago

Description: The test case platform_tests/test_cont_warm_reboot.py fails on a T0 topology with a physical DUT, reporting "Continuous reboot test failed 1/1 times".

Steps to reproduce the issue:

  1. Run the test case on a physical DUT.

Describe the results you received:

    if self.sub_test_result is True:
        test_dir = os.path.join(self.log_dir, "pass", str(self.reboot_count))
    else:
        test_dir = os.path.join(self.log_dir, "fail", str(self.reboot_count))
    os.makedirs(test_dir)
    for file in log_files:
        try:
            file_exists = os.path.isfile(file)
            if file_exists:
                shutil.move(file, test_dir)
        except Exception:
            logging.error("Error copying file {}".format(str(file)))
    report_file =  os.path.join(test_dir, "continuous_reboot_report.json")
    test_report["checks"] = self.test_report
    with open(report_file, "w") as report_file:
        json.dump(test_report, report_file, indent=4)

    pytest_assert(self.test_failures == 0, "Continuous reboot test failed {}/{} times".\
      format(self.test_failures, self.reboot_count))

E Failed: Continuous reboot test failed 1/1 times

    file = '/tmp/swss.rec'
    file_exists = False
    header = ['test_id', 'image', 'is_new_image', 'up_time', 'test_duration', 'result']
    log_files = ['/tmp/warm-reboot.log', '/tmp/capture.pcap', '/tmp/capture_filtered.pcap', '/tmp/syslog', '/tmp/sairedis.rec', '/tmp/swss.rec']
    report_file = <closed file '/data/sonic-mgmt/tests/continous_reboot_2023-06-12_13-12-07/fail/1/continuous_reboot_report.json', mode 'w' at 0x7f7b9d28b660>
    self = <tests.platform_tests.test_cont_warm_reboot.ContinuousReboot instance at 0x7f7b9f7e5320>
    test_dir = '/data/sonic-mgmt/tests/continous_reboot_2023-06-12_13-12-07/fail/1'
    test_report = {'checks': {'check_interfaces_and_transceivers': 'TRANSCEIVER INFO of Ethernet28 is not found in DB', 'check_neighbors...check_services': True, ...}, 'image': u'SONiC-OS-202211.269499-59c7d39ef', 'is_new_image': False, 'result': False, ...}
    writer = <csv.DictWriter instance at 0x7f7b9f97a640>

    platform_tests/test_cont_warm_reboot.py:273: Failed
    ---------------------------- generated xml file: /data/sonic-mgmt/tests/logs/12-06-2023/test_cont_warm_reboot/tr.xml ----------------------------
    ------------------------------------------------------------ live log sessionfinish -------------------------------------------------------------
    13:14:56 init.pytest_terminal_summary L0064 INFO | Can not get Allure report URL. Please check logs
    =========================== short test summary info ============================
    FAILED platform_tests/test_cont_warm_reboot.py::test_continuous_reboot[dev-msn2700-01]
    ========================== 1 failed in 683.88 seconds ==========================

Describe the results you expected:

The test case is expected to pass.

Additional information you deem important:

**Output of `show version`:**

    SONiC Software Version: SONiC.202211.269499-59c7d39ef
    SONiC OS Version: 11
    Distribution: Debian 11.6
    Kernel: 5.10.0-18-2-amd64
    Build commit: 59c7d39ef
    Build date: Tue May 9 17:58:15 UTC 2023
    Built by: AzDevOps@vmss-soni00118K

    Platform: x86_64-accton_as7716_32x-r0
    HwSKU: Accton-AS7716-32X
    ASIC: broadcom
    ASIC Count: 1
    Serial Number: N/A
    Model Number: N/A
    Hardware Revision: N/A
    Uptime: 20:09:00 up 7 min, 1 user, load average: 0.85, 0.48, 0.22
    Date: Sun 07 Aug 2022 20:09:00

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
bingwang-ms commented 1 year ago

From the test error message, the transceiver info for Ethernet28 is missing from the DB. Did you check the interface status after the reboot?

'check_interfaces_and_transceivers': 'TRANSCEIVER INFO of Ethernet28 is not found in DB'
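
To see which post-reboot checks failed without reading the whole traceback, the `checks` section of `continuous_reboot_report.json` can be filtered. The following is a minimal sketch, assuming (as the locals dump in this issue suggests) that each check maps to `True` on success and to an error string on failure. The helper names `failed_checks` and `load_failed_checks` are hypothetical and not part of sonic-mgmt.

```python
import json


def failed_checks(report):
    """Return {check_name: error} for every check whose result is not True.

    Assumes the report layout seen in this issue: report["checks"] maps
    each check name to True (passed) or to an error string (failed).
    """
    checks = report.get("checks", {})
    return {name: result for name, result in checks.items() if result is not True}


def load_failed_checks(report_path):
    """Hypothetical convenience wrapper: read a report file and filter it."""
    with open(report_path) as handle:
        return failed_checks(json.load(handle))


if __name__ == "__main__":
    # Sample data shaped like the report in this issue (abbreviated).
    sample = {
        "checks": {
            "check_interfaces_and_transceivers":
                "TRANSCEIVER INFO of Ethernet28 is not found in DB",
            "check_services": True,
        },
        "result": False,
    }
    for name, error in failed_checks(sample).items():
        print("{}: {}".format(name, error))
```

For a real run, `load_failed_checks` would be pointed at the report under the `fail/<reboot_count>` directory created by the test.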
yxieca commented 1 year ago

@Vickyni2 the error says that after the warm reboot, the information for one transceiver is missing. It is suspicious that only one SFP's information is missing. Can you manually run a warm reboot on the Accton platform and check whether all SFP information is available afterwards? If so, the test might need to wait longer for SFP information. Can you adjust the wait time and raise a PR?
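
The "wait longer" idea can be sketched as a polling loop that retries until every expected port has transceiver info or a timeout expires. This is an illustration only, not the test's actual code: the function names, the `fetch_keys` callable, and the default timeout and interval are hypothetical, and the `TRANSCEIVER_INFO|<port>` key format is an assumption about how the entries appear in the DB.

```python
import time


def missing_transceiver_info(expected_ports, db_keys):
    """Return the sorted list of ports with no TRANSCEIVER_INFO|<port> key.

    The key format is an assumption made for this sketch.
    """
    prefix = "TRANSCEIVER_INFO|"
    present = {key[len(prefix):] for key in db_keys if key.startswith(prefix)}
    return sorted(set(expected_ports) - present)


def wait_for_transceiver_info(expected_ports, fetch_keys, timeout=300,
                              interval=10, sleep=time.sleep,
                              clock=time.monotonic):
    """Poll fetch_keys() until no port is missing or the timeout expires.

    fetch_keys is a hypothetical callable returning the current DB keys.
    Returns the ports still missing (an empty list means success).
    """
    deadline = clock() + timeout
    while True:
        missing = missing_transceiver_info(expected_ports, fetch_keys())
        if not missing or clock() >= deadline:
            return missing
        sleep(interval)


if __name__ == "__main__":
    # Simulated DB: Ethernet28 only shows up on the third poll.
    polls = {"count": 0}

    def fake_fetch_keys():
        polls["count"] += 1
        keys = ["TRANSCEIVER_INFO|Ethernet0"]
        if polls["count"] >= 3:
            keys.append("TRANSCEIVER_INFO|Ethernet28")
        return keys

    still_missing = wait_for_transceiver_info(
        ["Ethernet0", "Ethernet28"], fake_fetch_keys, timeout=60, interval=0)
    print("missing after wait:", still_missing)
```

Returning the list of still-missing ports, rather than a boolean, keeps the failure message specific (for example, only Ethernet28), which helps tell a slow SFP apart from a platform-wide problem.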