sonic-net / sonic-mgmt

Configuration management examples for SONiC

Errors in PTF Autorestart module test – cannot find marker end-LogAnalyzer error #6280

Closed: kbabujp closed this issue 1 year ago

kbabujp commented 1 year ago

Description: A few tests in the autorestart module intermittently error out with “cannot find marker end-LogAnalyzer” when the default timeout of 60 sec is used to check for the marker in syslog.

Steps to reproduce – Run all autorestart module tests in PTF:

./run_tests.sh -n dut-t1 -d sonic-dut -f ../ansible/testbed.csv -i ../ansible/lab,../ansible/veos -t t1,any -a False -e "--skip_sanity" -u -c autorestart

=========================== short test summary info ============================
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-lldp]
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-pmon]
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-snmp]
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-telemetry]
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-bgp]
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-mgmt-framework]
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-teamd]
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-swss]
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-syncd]
SKIPPED [1] /var/admin/sonic-mgmt/tests/common/helpers/assertions.py:13: Skipping test for container macsec
SKIPPED [1] /var/admin/sonic-mgmt/tests/common/helpers/assertions.py:13: Skipping test for container radv
SKIPPED [1] /var/admin/sonic-mgmt/tests/common/helpers/assertions.py:13: Skipping test for container sflow
SKIPPED [1] /var/admin/sonic-mgmt/tests/common/helpers/assertions.py:13: Skipping test for container dhcp_relay
SKIPPED [1] /var/admin/sonic-mgmt/tests/common/helpers/assertions.py:13: Skipping test for container mux
SKIPPED [1] /var/admin/sonic-mgmt/tests/common/helpers/assertions.py:13: Skipping test for container nat
ERROR autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-teamd]
ERROR autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-swss]
ERROR autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-dut-None-syncd]
=============== 9 passed, 6 skipped, 3 error in 1750.27 seconds ================

Snippet of the PTF test run log. All of the intermittent errors are caused by the failure below:

E "stderr_lines": [
E "Traceback (most recent call last):",
E " File \"/tmp/loganalyzer.py\", line 809, in <module>",
E " main(sys.argv[1:])",
E " File \"/tmp/loganalyzer.py\", line 793, in main",
E " analyzer.place_marker(log_file_list, analyzer.create_end_marker(), wait_for_marker=True)",
E " File \"/tmp/loganalyzer.py\", line 251, in place_marker",
E " raise RuntimeError(\"cannot find marker {} in /var/log/syslog\".format(marker))",
E "RuntimeError: cannot find marker end-LogAnalyzer-test_containers_autorestart[sonic-dut-None-syncd].2022-08-22-06:03:54 in /var/log/syslog"

Show version info is given below.

SONiC Software Version: SONiC.202111.Innovium.3.0.0.20220816.015358
Distribution: Debian 11.4
Kernel: 5.10.0-8-2-amd64
Marvell SAI version: T2.0.1
Build commit: 003c8dfde

Expected – The tests should pass without any errors.

Workaround/Fix – Increase the timeout for checking the end marker in syslog from the default 60 sec to 120 sec.

diff --git a/ansible/roles/test/files/tools/loganalyzer/loganalyzer.py b/ansible/roles/test/files/tools/loganalyzer/loganalyzer.py
index cf9214b6..ceab7353 100644
--- a/ansible/roles/test/files/tools/loganalyzer/loganalyzer.py
+++ b/ansible/roles/test/files/tools/loganalyzer/loganalyzer.py
@@ -194,7 +194,7 @@ class AnsibleLogAnalyzer:
 syslogger.info('\n')
 self.flush_rsyslogd()

yxieca commented 1 year ago

@kbabujp can you check the syslog to see if these markers were really missing? Or did they show up too late?

Please attach test log / show tech support.
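One way to answer the "missing or late" question is to search the saved syslog for the exact marker string and look at the timestamp on the matching line. The following is a minimal sketch, not part of sonic-mgmt: the helper name find_marker_lines and the sample log lines are hypothetical, and the marker is shortened for readability.

```python
def find_marker_lines(syslog_lines, marker):
    """Return (line_number, text) for every syslog line that contains marker.

    An empty result means the marker never reached the log. Otherwise the
    timestamp at the start of each returned line shows when it landed.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(syslog_lines, start=1)
        if marker in line
    ]


if __name__ == "__main__":
    # Hypothetical sample lines; a real check would read /var/log/syslog.
    sample = [
        "Aug 22 06:04:10 sonic-dut INFO start-LogAnalyzer-demo\n",
        "Aug 22 06:05:20 sonic-dut INFO end-LogAnalyzer-demo\n",
    ]
    for number, text in find_marker_lines(sample, "end-LogAnalyzer-demo"):
        print(number, text)
```

Comparing the timestamp on the matching line with the time the test reported the RuntimeError indicates whether the marker arrived after the wait had already given up.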

kbabujp commented 1 year ago

@yxieca, the markers showed up in syslog after the 60 sec timeout had expired.

If I change the line below to increase the timeout to 120, the tests pass (no errors seen).

In ansible/roles/test/files/tools/loganalyzer/loganalyzer.py:

- def wait_for_marker(self, marker, timeout=60, polling_interval=10):
+ def wait_for_marker(self, marker, timeout=120, polling_interval=10):
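The effect of the timeout change can be illustrated with a simplified polling loop. This is a sketch under stated assumptions, not the code in loganalyzer.py: only the parameter names timeout and polling_interval come from the signature quoted above. The find_marker callback, the injectable sleep and clock arguments, the FakeClock helper, and the 90-second arrival delay are all hypothetical.

```python
import time


def wait_for_marker(find_marker, marker, timeout=120, polling_interval=10,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll find_marker(marker) until it returns True or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if find_marker(marker):
            return True
        if clock() >= deadline:
            return False
        sleep(polling_interval)


class FakeClock:
    """Hypothetical simulated clock so the demo does not actually wait."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def marker_found_with(timeout, marker_arrives_at=90):
    """Simulate a marker that reaches syslog marker_arrives_at seconds late."""
    fake = FakeClock()
    return wait_for_marker(
        lambda _marker: fake.now >= marker_arrives_at,
        "end-LogAnalyzer-demo",
        timeout=timeout,
        sleep=fake.sleep,
        clock=fake.time,
    )


if __name__ == "__main__":
    print("timeout=60:", marker_found_with(60))
    print("timeout=120:", marker_found_with(120))
```

In this simulation, a marker that takes 90 seconds to appear is missed with timeout=60 and found with timeout=120, which matches the behavior described in this comment.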

yxieca commented 1 year ago

@kbabujp can you raise a PR to address the issue then? If the issue has already been addressed, please close it.

kbabujp commented 1 year ago

@yxieca, I opened a new PR to fix this issue. I will close this issue once the PR is merged.

kbabujp commented 1 year ago

@yxieca, could you please review and merge the PR https://github.com/sonic-net/sonic-mgmt/pull/6322? All the checks have passed.