sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[HLX] soft-reboot failed to boot up #8692

Open Xichen96 opened 3 years ago

Xichen96 commented 3 years ago

Description

After soft-reboot, the switch does not boot up: the console shows it rebooting repeatedly.

Steps to reproduce the issue:

  1. Install the public image https://sonic-build.azurewebsites.net/api/sonic/artifacts?branchName=202012&platform=broadcom&target=target/sonic-broadcom.bin
  2. Run the soft-reboot test (platform_tests/test_reboot.py::test_soft_reboot)
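Step 2 exercises test_soft_reboot, whose reboot() helper (its source appears in the log below) runs soft-reboot and then calls Ansible's wait_for on the DUT's SSH port twice, first for state='absent' and then for state='started', with delay=10 and timeout=300 in this run. To watch the same transition by hand from the test server while the console shows the reboot loop, a minimal Python 3 sketch could look like the following. It only approximates wait_for (a plain TCP connect plus a regex on the first bytes the server sends), and the banner pattern used in the usage note is a hypothetical stand-in, since the harness's SONIC_SSH_REGEX value is elided in the log.

```python
import re
import socket
import time


def ssh_banner_matches(host, port, banner_regex, connect_timeout=5.0):
    """Return True if a TCP connect to host:port succeeds and the first bytes
    the server sends match banner_regex (a rough stand-in for search_regex)."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout) as sock:
            banner = sock.recv(256).decode("ascii", errors="replace")
    except OSError:  # connection refused, host unreachable, or timed out
        return False
    return re.search(banner_regex, banner) is not None


def wait_for_ssh(host, state, banner_regex, port=22, delay=10, timeout=300, poll=1.0):
    """Poll until SSH is 'started' (banner matches) or 'absent' (it does not).

    Sleeps `delay` seconds, then polls every `poll` seconds until the desired
    state is seen or `timeout` seconds have passed since the call started.
    Returns (reached, elapsed_seconds).
    """
    if state not in ("started", "absent"):
        raise ValueError("state must be 'started' or 'absent'")
    start = time.monotonic()
    time.sleep(delay)
    while True:
        up = ssh_banner_matches(host, port, banner_regex)
        # 'started' wants the banner to be present, 'absent' wants it gone.
        if up == (state == "started"):
            return True, time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return False, time.monotonic() - start
        time.sleep(poll)
```

For example, wait_for_ssh("192.0.2.10", "started", r"^SSH-2\.0-OpenSSH") (a hypothetical management address and banner pattern) returns (False, elapsed) when SSH does not reappear within 300 seconds, which roughly corresponds to the "did not startup" failure below.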

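Describe the results you received:

The switch does not come back after soft-reboot; the console shows it rebooting repeatedly, and the test fails with "DUT xxxxxx did not startup".

Describe the results you expected:

The switch boots back up after soft-reboot and the test passes.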

Output of show version:

SONiC Software Version: SONiC.20201231.23
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: 7c791dbbc8
Build date: Fri Sep 3 12:17:54 UTC 2021
Built by: AzDevOps@sonic-int-build-workers-0003EK

Platform: x86_64-cel_e1031-r0
HwSKU: Celestica-E1031-T48S4
ASIC: broadcom
ASIC Count: 1
Serial Number: R0882F2B039723BY000014
Uptime: 05:28:56 up 54 min, 1 user, load average: 1.83, 2.08, 2.18

Docker images:
REPOSITORY                TAG          IMAGE ID      SIZE
docker-syncd-brcm         20201231.23  6dea96746b36  694MB
docker-syncd-brcm         latest       6dea96746b36  694MB
docker-snmp               20201231.23  49c04cec0029  443MB
docker-snmp               latest       49c04cec0029  443MB
docker-teamd              20201231.23  36f555b4fef8  412MB
docker-teamd              latest       36f555b4fef8  412MB
docker-router-advertiser  20201231.23  a90afecca506  402MB
docker-router-advertiser  latest       a90afecca506  402MB
docker-platform-monitor   20201231.23  c920d7e90943  612MB
docker-platform-monitor   latest       c920d7e90943  612MB
docker-lldp               20201231.23  10756e3f3d1a  442MB
docker-lldp               latest       10756e3f3d1a  442MB
docker-dhcp-relay         20201231.23  c4a2e143c2de  409MB
docker-dhcp-relay         latest       c4a2e143c2de  409MB
docker-database           20201231.23  17492b436856  402MB
docker-database           latest       17492b436856  402MB
docker-orchagent          20201231.23  78137aebb544  431MB
docker-orchagent          latest       78137aebb544  431MB
docker-sonic-telemetry    20201231.23  df8929ddd2ec  491MB
docker-sonic-telemetry    latest       df8929ddd2ec  491MB
docker-fpm-frr            20201231.23  cacf6e6cd65c  431MB
docker-fpm-frr            latest       cacf6e6cd65c  431MB

Output of show techsupport:

Additional information you deem important (e.g. issue happens only occasionally):

Xichen96 commented 3 years ago

/data/sonic-mgmt-int/tests$ ./run_tests.sh -c platform_tests/test_reboot.py::test_soft_reboot -n xxxxxx -i ../ansible/str,../ansible/veos -f ../ansible/testbed.csv -e "--disable_loganalyzer" -u
=== Running tests in groups ===
/usr/local/lib/python2.7/dist-packages/ansible/parsing/vault/__init__.py:44: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography.exceptions import InvalidSignature
======================================================== test session starts ========================================================
platform linux2 -- Python 2.7.17, pytest-4.6.5, py-1.9.0, pluggy-0.13.1
ansible: 2.8.12
rootdir: /data/sonic-mgmt-int/tests, inifile: pytest.ini
plugins: forked-1.3.0, xdist-1.28.0, html-1.22.1, metadata-1.10.0, repeat-0.9.1, ansible-2.2.2
collected 1 item

platform_tests/test_reboot.py::test_soft_reboot[xxxxxx]
----------------------------------------------------------- live log call -----------------------------------------------------------
06:30:13 __init__.pytest_runtest_call L0039 ERROR | Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/_pytest/python.py", line 1464, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/hooks.py", line 286, in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/manager.py", line 93, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/manager.py", line 87, in <lambda>
    firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 208, in _multicall
    return outcome.get_result()
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 81, in get_result
    _reraise(*ex)  # noqa
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 187, in _multicall
    res = hook_impl.function(*args)
  File "/usr/local/lib/python2.7/dist-packages/_pytest/python.py", line 174, in pytest_pyfunc_call
    testfunction(**testargs)
  File "/data/sonic-mgmt-int/tests/platform_tests/test_reboot.py", line 138, in test_soft_reboot
    reboot_and_check(localhost, duthost, conn_graph_facts["device_conn"][duthost.hostname], xcvr_skip_list, reboot_type=REBOOT_TYPE_SOFT)
  File "/data/sonic-mgmt-int/tests/platform_tests/test_reboot.py", line 60, in reboot_and_check
    reboot(dut, localhost, reboot_type=reboot_type, reboot_helper=reboot_helper, reboot_kwargs=reboot_kwargs)
  File "/data/sonic-mgmt-int/tests/common/reboot.py", line 162, in reboot
    raise Exception('DUT {} did not startup'.format(hostname))
Exception: DUT xxxxxx did not startup

platform_tests/test_reboot.py::test_soft_reboot[xxxxxx] ERROR [100%]

============================================================== ERRORS ===============================================================
__ ERROR at teardown of test_soft_reboot[xxxxxx] ___

duthosts = <tests.common.devices.duthosts.DutHosts object at 0x7f99ebe10450>, enum_rand_one_per_hwsku_hostname = 'xxxxxx'
conn_graph_facts = {'device_conn': {'xxxxxx': {'Ethernet0': {'peerdevice': u'xxxxx', 'peerport': u'Ethernet0', 'speed'...ard', 'HwSku': u'xxxxxxx', 'ManagementGw': u'xxxxxxx', 'ManagementIp': u'xxxxxxxx', ...}}, ...}
xcvr_skip_list = {'xxxxxx': []}

@pytest.fixture(scope="module", autouse=True)
def teardown_module(duthosts, enum_rand_one_per_hwsku_hostname, conn_graph_facts, xcvr_skip_list):
    duthost = duthosts[enum_rand_one_per_hwsku_hostname]
    yield

    logging.info("Tearing down: to make sure all the critical services, interfaces and transceivers are good")
    interfaces = conn_graph_facts["device_conn"][duthost.hostname]
>   check_critical_processes(duthost, watch_secs=10)

conn_graph_facts = {'device_conn': {'xxxxxx': {'Ethernet0': {'peerdevice': u'sx', 'peerport': u'Ethernet0', 'speed'...ard', 'HwSku': u'x', 'ManagementGw': u'x', 'ManagementIp': u'x', ...}}, ...}
duthost = xxxxxx
duthosts = <tests.common.devices.duthosts.DutHosts object at 0x7f99ebe10450>
enum_rand_one_per_hwsku_hostname = 'xxxxxx'
interfaces = {'Ethernet0': {'peerdevice': u'x', 'peerport': u'Ethernet0', 'speed': u'1000'}, 'Ethernet1': {'peerdevic... 'speed': u'1000'}, 'Ethernet11': {'peerdevice': u'x', 'peerport': u'Ethernet11', 'speed': u'1000'}, ...}
xcvr_skip_list = {'xxxxxx': []}

platform_tests/test_reboot.py:43:


dut = xxxxxx, watch_secs = 10

def check_critical_processes(dut, watch_secs=0):
    """
    @summary: check all critical processes. They should be all running.
              keep on checking every 5 seconds until watch_secs drops below 0.
    @param dut: The AnsibleHost object of DUT. For interacting with DUT.
    @param watch_secs: all processes should remain healthy for watch_secs seconds.
    """
    logging.info("Check all critical processes are healthy for {} seconds".format(watch_secs))
    while watch_secs >= 0:
        status, details = get_critical_processes_status(dut)
>       pytest_assert(status, "Not all critical processes are healthy: {}".format(details))

E Failed: Not all critical processes are healthy: {'lldp': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'pmon': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'database': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'snmp': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'bgp': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'teamd': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'syncd': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'swss': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}}

details = {'bgp': {'exited_critical_process': [], 'running_critical_process': [], 'status': False}, 'database': {'exited_critica...': [], 'status': False}, 'pmon': {'exited_critical_process': [], 'running_critical_process': [], 'status': False}, ...}
dut = xxxxxx
status = False
watch_secs = 10

common/platform/processesutils.py:36: Failed
============================================================= FAILURES ==============================================================
____ test_soft_reboot[xxxxxx] ____

duthosts = <tests.common.devices.duthosts.DutHosts object at 0x7f99ebe10450>, enum_rand_one_per_hwsku_hostname = 'xxxxxx'
localhost = <tests.common.devices.local.Localhost object at 0x7f99ea197710>
conn_graph_facts = {'device_conn': {'xxxxxx': {'Ethernet0': {'peerdevice': u'xxxxxx', 'peerport': u'Ethernet0', 'speed'...ard', 'HwSku': u'Celestica-E1031-T48S4', 'ManagementGw': u'xxxxxx', 'ManagementIp': u'xxxxxx80/23', ...}}, ...}
xcvr_skip_list = {'xxxxxx': []}

def test_soft_reboot(duthosts, enum_rand_one_per_hwsku_hostname, localhost, conn_graph_facts, xcvr_skip_list):
    """
    @summary: This test case is to perform soft reboot and check platform status
    """

    duthost = duthosts[enum_rand_one_per_hwsku_hostname]

    soft_reboot_supported = duthost.command('which soft-reboot', module_ignore_errors=True)["stdout"]
    if "" == soft_reboot_supported:
        pytest.skip("Soft-reboot is not supported on this DUT, skip this test case")

    if duthost.is_multi_asic:
        pytest.skip("Multi-ASIC devices not supporting soft reboot")
>   reboot_and_check(localhost, duthost, conn_graph_facts["device_conn"][duthost.hostname], xcvr_skip_list, reboot_type=REBOOT_TYPE_SOFT)

conn_graph_facts = {'device_conn': {'xxxxxx': {'Ethernet0': {'peerdevice': u'xxxxxx', 'peerport': u'Ethernet0', 'speed'...ard', 'HwSku': u'Celestica-E1031-T48S4', 'ManagementGw': u'xxxxxx', 'ManagementIp': u'xxxxxx80/23', ...}}, ...}
duthost = xxxxxx
duthosts = <tests.common.devices.duthosts.DutHosts object at 0x7f99ebe10450>
enum_rand_one_per_hwsku_hostname = 'xxxxxx'
localhost = <tests.common.devices.local.Localhost object at 0x7f99ea197710>
soft_reboot_supported = u'/usr/local/bin/soft-reboot'
xcvr_skip_list = {'xxxxxx': []}

platform_tests/test_reboot.py:138:


platform_tests/test_reboot.py:60: in reboot_and_check
    reboot(dut, localhost, reboot_type=reboot_type, reboot_helper=reboot_helper, reboot_kwargs=reboot_kwargs)


duthost = xxxxxx, localhost = <tests.common.devices.local.Localhost object at 0x7f99ea197710> reboot_type = 'soft', delay = 10, timeout = 300, wait = 120, wait_for_ssh = True, reboot_helper = None, reboot_kwargs = None

def reboot(duthost, localhost, reboot_type='cold', delay=10, \
    timeout=0, wait=0, wait_for_ssh=True, reboot_helper=None, reboot_kwargs=None):
    """
    reboots DUT
    :param duthost: DUT host object
    :param localhost:  local host object
    :param reboot_type: reboot type (cold, fast, warm)
    :param delay: delay between ssh availability checks
    :param timeout: timeout for waiting ssh port state change
    :param wait: time to wait for DUT to initialize
    :param reboot_helper: helper function to execute the power toggling
    :param reboot_kwargs: arguments to pass to the reboot_helper
    :return:
    """

    # pool for executing tasks asynchronously
    pool = ThreadPool()
    dut_ip = duthost.mgmt_ip
    hostname = duthost.hostname
    try:
        reboot_ctrl    = reboot_ctrl_dict[reboot_type]
        reboot_command = reboot_ctrl['command'] if reboot_type != REBOOT_TYPE_POWEROFF else None
        if timeout == 0:
            timeout = reboot_ctrl['timeout']
        if wait == 0:
            wait = reboot_ctrl['wait']
    except KeyError:
        raise ValueError('invalid reboot type: "{} for {}"'.format(reboot_type, hostname))

    def execute_reboot_command():
        logger.info('rebooting {} with command "{}"'.format(hostname, reboot_command))
        return duthost.command(reboot_command)

    def execute_reboot_helper():
        logger.info('rebooting {} with helper "{}"'.format(hostname, reboot_helper))
        return reboot_helper(reboot_kwargs)

    dut_datetime = duthost.get_now_time()
    DUT_ACTIVE.clear()

    if reboot_type != REBOOT_TYPE_POWEROFF:
        reboot_res = pool.apply_async(execute_reboot_command)
    else:
        assert reboot_helper is not None, "A reboot function must be provided for power off reboot"
        reboot_res = pool.apply_async(execute_reboot_helper)

    logger.info('waiting for ssh to drop on {}'.format(hostname))
    res = localhost.wait_for(host=dut_ip,
                             port=SONIC_SSH_PORT,
                             state='absent',
                             search_regex=SONIC_SSH_REGEX,
                             delay=delay,
                             timeout=timeout,
                             module_ignore_errors=True)

    if res.is_failed or ('msg' in res and 'Timeout' in res['msg']):
        if reboot_res.ready():
            logger.error('reboot result: {} on {}'.format(reboot_res.get(), hostname))
        raise Exception('DUT {} did not shutdown'.format(hostname))

    if not wait_for_ssh:
        return

    # TODO: add serial output during reboot for better debuggability
    #       This feature requires serial information to be present in
    #       testbed information

    logger.info('waiting for ssh to startup on {}'.format(hostname))
    res = localhost.wait_for(host=dut_ip,
                             port=SONIC_SSH_PORT,
                             state='started',
                             search_regex=SONIC_SSH_REGEX,
                             delay=delay,
                             timeout=timeout,
                             module_ignore_errors=True)
    if res.is_failed or ('msg' in res and 'Timeout' in res['msg']):
>       raise Exception('DUT {} did not startup'.format(hostname))

E Exception: DUT xxxxxx did not startup

delay = 10
dut_datetime = datetime.datetime(2021, 9, 8, 6, 24, 49)
dut_ip = u'xxxxxx80'
duthost = xxxxxx
execute_reboot_command = <function execute_reboot_command at 0x7f99e191d3d0>
execute_reboot_helper = <function execute_reboot_helper at 0x7f99e191d450>
hostname = 'xxxxxx'
localhost = <tests.common.devices.local.Localhost object at 0x7f99ea197710>
pool = <multiprocessing.pool.ThreadPool object at 0x7f99e18d24d0>
reboot_command = 'soft-reboot'
reboot_ctrl = {'cause': 'soft-reboot', 'command': 'soft-reboot', 'test_reboot_cause_only': False, 'timeout': 300, ...}
reboot_helper = None
reboot_kwargs = None
reboot_res = <multiprocessing.pool.ApplyResult object at 0x7f99e19074d0>
reboot_type = 'soft'
res = {u'failed': True, u'exception': u'WARNING: The below traceback may not be re...\.]+ Debian', u'path': None, u'port': 22}}, 'changed': False, u'elapsed': 300}
timeout = 300
wait = 120
wait_for_ssh = True

common/reboot.py:162: Exception
------------------------------------ generated xml file: /data/sonic-mgmt-int/tests/logs/tr.xml -------------------------------------
====================================================== short test summary info ======================================================
ERROR platform_tests/test_reboot.py::test_soft_reboot[xxxxxx] - Failed: Not all critical processes are healthy: {'lldp': ...
FAILED platform_tests/test_reboot.py::test_soft_reboot[xxxxxx] - Exception: DUT xxxxxx did not startup
=============================================== 1 failed, 1 error in 1811.18 seconds ================================================
INFO:root:Can not get Allure report URL. Please check logs

xichenlin@211b3745deba:/data/sonic-mgmt-int/tests$ ./run_tests.sh -c platform_tests/test_reboot.py::test_soft_reboot -n xxxxxx -i ../ansible/str,../ansible/veos -f ../ansible/testbed.csv -e "--disable_loganalyzer" -u
=== Running tests in groups ===
/usr/local/lib/python2.7/dist-packages/ansible/parsing/vault/__init__.py:44: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography.exceptions import InvalidSignature
======================================================== test session starts ========================================================
platform linux2 -- Python 2.7.17, pytest-4.6.5, py-1.9.0, pluggy-0.13.1
ansible: 2.8.12
rootdir: /data/sonic-mgmt-int/tests, inifile: pytest.ini
plugins: forked-1.3.0, xdist-1.28.0, html-1.22.1, metadata-1.10.0, repeat-0.9.1, ansible-2.2.2
collected 1 item

platform_tests/test_reboot.py::test_soft_reboot[xxxxxx]
----------------------------------------------------------- live log call -----------------------------------------------------------
07:20:24 __init__.pytest_runtest_call L0039 ERROR | Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/_pytest/python.py", line 1464, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/hooks.py", line 286, in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/manager.py", line 93, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/manager.py", line 87, in <lambda>
    firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 208, in _multicall
    return outcome.get_result()
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 81, in get_result
    _reraise(*ex)  # noqa
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 187, in _multicall
    res = hook_impl.function(*args)
  File "/usr/local/lib/python2.7/dist-packages/_pytest/python.py", line 174, in pytest_pyfunc_call
    testfunction(**testargs)
  File "/data/sonic-mgmt-int/tests/platform_tests/test_reboot.py", line 138, in test_soft_reboot
    reboot_and_check(localhost, duthost, conn_graph_facts["device_conn"][duthost.hostname], xcvr_skip_list, reboot_type=REBOOT_TYPE_SOFT)
  File "/data/sonic-mgmt-int/tests/platform_tests/test_reboot.py", line 60, in reboot_and_check
    reboot(dut, localhost, reboot_type=reboot_type, reboot_helper=reboot_helper, reboot_kwargs=reboot_kwargs)
  File "/data/sonic-mgmt-int/tests/common/reboot.py", line 162, in reboot
    raise Exception('DUT {} did not startup'.format(hostname))
Exception: DUT xxxxxx did not startup

platform_tests/test_reboot.py::test_soft_reboot[xxxxxx] ERROR [100%]

============================================================== ERRORS ===============================================================
__ ERROR at teardown of test_soft_reboot[xxxxxx] ___

duthosts = <tests.common.devices.duthosts.DutHosts object at 0x7fed973623d0>, enum_rand_one_per_hwsku_hostname = 'xxxxxx'
conn_graph_facts = {'device_conn': {'xxxxxx': {'Ethernet0': {'peerdevice': u'xxxxxx', 'peerport': u'Ethernet0', 'speed'...ard', 'HwSku': u'Celestica-E1031-T48S4', 'ManagementGw': u'xxxxxx', 'ManagementIp': u'xxxxxx80/23', ...}}, ...}
xcvr_skip_list = {'xxxxxx': []}

@pytest.fixture(scope="module", autouse=True)
def teardown_module(duthosts, enum_rand_one_per_hwsku_hostname, conn_graph_facts, xcvr_skip_list):
    duthost = duthosts[enum_rand_one_per_hwsku_hostname]
    yield

    logging.info("Tearing down: to make sure all the critical services, interfaces and transceivers are good")
    interfaces = conn_graph_facts["device_conn"][duthost.hostname]
>   check_critical_processes(duthost, watch_secs=10)

conn_graph_facts = {'device_conn': {'xxxxxx': {'Ethernet0': {'peerdevice': u'xxxxxx', 'peerport': u'Ethernet0', 'speed'...ard', 'HwSku': u'Celestica-E1031-T48S4', 'ManagementGw': u'xxxxxx', 'ManagementIp': u'xxxxxx80/23', ...}}, ...}
duthost = xxxxxx
duthosts = <tests.common.devices.duthosts.DutHosts object at 0x7fed973623d0>
enum_rand_one_per_hwsku_hostname = 'xxxxxx'
interfaces = {'Ethernet0': {'peerdevice': u'xxxxxx', 'peerport': u'Ethernet0', 'speed': u'1000'}, 'Ethernet1': {'peerdevic... 'speed': u'1000'}, 'Ethernet11': {'peerdevice': u'xxxxxx', 'peerport': u'Ethernet11', 'speed': u'1000'}, ...}
xcvr_skip_list = {'xxxxxx': []}

platform_tests/test_reboot.py:43:


dut = xxxxxx, watch_secs = 10

def check_critical_processes(dut, watch_secs=0):
    """
    @summary: check all critical processes. They should be all running.
              keep on checking every 5 seconds until watch_secs drops below 0.
    @param dut: The AnsibleHost object of DUT. For interacting with DUT.
    @param watch_secs: all processes should remain healthy for watch_secs seconds.
    """
    logging.info("Check all critical processes are healthy for {} seconds".format(watch_secs))
    while watch_secs >= 0:
        status, details = get_critical_processes_status(dut)
>       pytest_assert(status, "Not all critical processes are healthy: {}".format(details))

E Failed: Not all critical processes are healthy: {'lldp': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'pmon': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'database': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'snmp': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'bgp': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'teamd': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'syncd': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}, 'swss': {'status': False, 'exited_critical_process': [], 'running_critical_process': []}}

details = {'bgp': {'exited_critical_process': [], 'running_critical_process': [], 'status': False}, 'database': {'exited_critica...': [], 'status': False}, 'pmon': {'exited_critical_process': [], 'running_critical_process': [], 'status': False}, ...}
dut = xxxxxx
status = False
watch_secs = 10

common/platform/processesutils.py:36: Failed
============================================================= FAILURES ==============================================================
____ test_soft_reboot[xxxxxx] ____

duthosts = <tests.common.devices.duthosts.DutHosts object at 0x7fed973623d0>, enum_rand_one_per_hwsku_hostname = 'xxxxxx'
localhost = <tests.common.devices.local.Localhost object at 0x7fed95462050>
conn_graph_facts = {'device_conn': {'xxxxxx': {'Ethernet0': {'peerdevice': u'xxxxxx', 'peerport': u'Ethernet0', 'speed'...ard', 'HwSku': u'Celestica-E1031-T48S4', 'ManagementGw': u'xxxxxx', 'ManagementIp': u'xxxxxx80/23', ...}}, ...}
xcvr_skip_list = {'xxxxxx': []}

def test_soft_reboot(duthosts, enum_rand_one_per_hwsku_hostname, localhost, conn_graph_facts, xcvr_skip_list):
    """
    @summary: This test case is to perform soft reboot and check platform status
    """

    duthost = duthosts[enum_rand_one_per_hwsku_hostname]

    soft_reboot_supported = duthost.command('which soft-reboot', module_ignore_errors=True)["stdout"]
    if "" == soft_reboot_supported:
        pytest.skip("Soft-reboot is not supported on this DUT, skip this test case")

    if duthost.is_multi_asic:
        pytest.skip("Multi-ASIC devices not supporting soft reboot")
>   reboot_and_check(localhost, duthost, conn_graph_facts["device_conn"][duthost.hostname], xcvr_skip_list, reboot_type=REBOOT_TYPE_SOFT)

conn_graph_facts = {'device_conn': {'xxxxxx': {'Ethernet0': {'peerdevice': u'xxxxxx', 'peerport': u'Ethernet0', 'speed'...ard', 'HwSku': u'Celestica-E1031-T48S4', 'ManagementGw': u'xxxxxx', 'ManagementIp': u'xxxxxx80/23', ...}}, ...}
duthost = xxxxxx
duthosts = <tests.common.devices.duthosts.DutHosts object at 0x7fed973623d0>
enum_rand_one_per_hwsku_hostname = 'xxxxxx'
localhost = <tests.common.devices.local.Localhost object at 0x7fed95462050>
soft_reboot_supported = u'/usr/local/bin/soft-reboot'
xcvr_skip_list = {'xxxxxx': []}

platform_tests/test_reboot.py:138:


platform_tests/test_reboot.py:60: in reboot_and_check
    reboot(dut, localhost, reboot_type=reboot_type, reboot_helper=reboot_helper, reboot_kwargs=reboot_kwargs)


duthost = xxxxxx, localhost = <tests.common.devices.local.Localhost object at 0x7fed95462050> reboot_type = 'soft', delay = 10, timeout = 300, wait = 120, wait_for_ssh = True, reboot_helper = None, reboot_kwargs = None

def reboot(duthost, localhost, reboot_type='cold', delay=10, \
    timeout=0, wait=0, wait_for_ssh=True, reboot_helper=None, reboot_kwargs=None):
    """
    reboots DUT
    :param duthost: DUT host object
    :param localhost:  local host object
    :param reboot_type: reboot type (cold, fast, warm)
    :param delay: delay between ssh availability checks
    :param timeout: timeout for waiting ssh port state change
    :param wait: time to wait for DUT to initialize
    :param reboot_helper: helper function to execute the power toggling
    :param reboot_kwargs: arguments to pass to the reboot_helper
    :return:
    """

    # pool for executing tasks asynchronously
    pool = ThreadPool()
    dut_ip = duthost.mgmt_ip
    hostname = duthost.hostname
    try:
        reboot_ctrl    = reboot_ctrl_dict[reboot_type]
        reboot_command = reboot_ctrl['command'] if reboot_type != REBOOT_TYPE_POWEROFF else None
        if timeout == 0:
            timeout = reboot_ctrl['timeout']
        if wait == 0:
            wait = reboot_ctrl['wait']
    except KeyError:
        raise ValueError('invalid reboot type: "{} for {}"'.format(reboot_type, hostname))

    def execute_reboot_command():
        logger.info('rebooting {} with command "{}"'.format(hostname, reboot_command))
        return duthost.command(reboot_command)

    def execute_reboot_helper():
        logger.info('rebooting {} with helper "{}"'.format(hostname, reboot_helper))
        return reboot_helper(reboot_kwargs)

    dut_datetime = duthost.get_now_time()
    DUT_ACTIVE.clear()

    if reboot_type != REBOOT_TYPE_POWEROFF:
        reboot_res = pool.apply_async(execute_reboot_command)
    else:
        assert reboot_helper is not None, "A reboot function must be provided for power off reboot"
        reboot_res = pool.apply_async(execute_reboot_helper)

    logger.info('waiting for ssh to drop on {}'.format(hostname))
    res = localhost.wait_for(host=dut_ip,
                             port=SONIC_SSH_PORT,
                             state='absent',
                             search_regex=SONIC_SSH_REGEX,
                             delay=delay,
                             timeout=timeout,
                             module_ignore_errors=True)

    if res.is_failed or ('msg' in res and 'Timeout' in res['msg']):
        if reboot_res.ready():
            logger.error('reboot result: {} on {}'.format(reboot_res.get(), hostname))
        raise Exception('DUT {} did not shutdown'.format(hostname))

    if not wait_for_ssh:
        return

    # TODO: add serial output during reboot for better debuggability
    #       This feature requires serial information to be present in
    #       testbed information

    logger.info('waiting for ssh to startup on {}'.format(hostname))
    res = localhost.wait_for(host=dut_ip,
                             port=SONIC_SSH_PORT,
                             state='started',
                             search_regex=SONIC_SSH_REGEX,
                             delay=delay,
                             timeout=timeout,
                             module_ignore_errors=True)
    if res.is_failed or ('msg' in res and 'Timeout' in res['msg']):
>       raise Exception('DUT {} did not startup'.format(hostname))

E Exception: DUT xxxxxx did not startup

delay = 10
dut_datetime = datetime.datetime(2021, 9, 8, 7, 14, 57)
dut_ip = u'xxxxxx80'
duthost = xxxxxx
execute_reboot_command = <function execute_reboot_command at 0x7fed8ccf43d0>
execute_reboot_helper = <function execute_reboot_helper at 0x7fed8ccf4450>
hostname = 'xxxxxx'
localhost = <tests.common.devices.local.Localhost object at 0x7fed95462050>
pool = <multiprocessing.pool.ThreadPool object at 0x7fed8cb60190>
reboot_command = 'soft-reboot'
reboot_ctrl = {'cause': 'soft-reboot', 'command': 'soft-reboot', 'test_reboot_cause_only': False, 'timeout': 300, ...}
reboot_helper = None
reboot_kwargs = None
reboot_res = <multiprocessing.pool.ApplyResult object at 0x7fed8cac3910>
reboot_type = 'soft'
res = {u'failed': True, u'exception': u'WARNING: The below traceback may not be re...\.]+ Debian', u'path': None, u'port': 22}}, 'changed': False, u'elapsed': 301}
timeout = 300
wait = 120
wait_for_ssh = True

common/reboot.py:162: Exception
------------------------------------ generated xml file: /data/sonic-mgmt-int/tests/logs/tr.xml -------------------------------------
====================================================== short test summary info ======================================================
ERROR platform_tests/test_reboot.py::test_soft_reboot[xxxxxx] - Failed: Not all critical processes are healthy: {'lldp': ...
FAILED platform_tests/test_reboot.py::test_soft_reboot[xxxxxx] - Exception: DUT xxxxxx did not startup
=============================================== 1 failed, 1 error in 1890.96 seconds ================================================
INFO:root:Can not get Allure report URL. Please check logs
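In both runs, the teardown failure lists every container with status False and both exited_critical_process and running_critical_process empty: the check saw no critical process running and none recorded as exited. That pattern is consistent with the switch never having come back after soft-reboot (the test itself failed with "DUT xxxxxx did not startup"), rather than with particular daemons crashing. A small hypothetical triage helper, not part of sonic-mgmt, that buckets such a details dict is sketched below; the bucket names are illustrative, and the meaning of each key is inferred from its name.

```python
def triage_critical_processes(details):
    """Bucket a critical-process status dict by what it reports.

    `details` is assumed to map container name -> dict with the keys
    'status', 'exited_critical_process' and 'running_critical_process',
    as printed in the teardown failures above. Returns sorted name lists.
    """
    buckets = {"healthy": [], "has_exited": [], "nothing_reported": [], "partial": []}
    for container in sorted(details):
        info = details[container]
        if info.get("status"):
            buckets["healthy"].append(container)           # check passed
        elif info.get("exited_critical_process"):
            buckets["has_exited"].append(container)        # named processes exited
        elif not info.get("running_critical_process"):
            buckets["nothing_reported"].append(container)  # no process seen at all
        else:
            buckets["partial"].append(container)           # unhealthy, some still running
    return buckets


if __name__ == "__main__":
    # Rebuilt from the "Not all critical processes are healthy" line in the log.
    empty = {"status": False, "exited_critical_process": [], "running_critical_process": []}
    details = {name: dict(empty) for name in
               ("lldp", "pmon", "database", "snmp", "bgp", "teamd", "syncd", "swss")}
    print(triage_critical_processes(details)["nothing_reported"])
    # ['bgp', 'database', 'lldp', 'pmon', 'snmp', 'swss', 'syncd', 'teamd']
```

When every container lands in nothing_reported, the next step is usually the console or reboot-cause history rather than per-daemon logs.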