sonic-net / sonic-mgmt

Configuration management examples for SONiC

[Bug]: [dualtor | mux] The test dualtor/test_orchagent_mac_move.py::test_mac_move fails due to PR #10657 #12795

Status: Open. weiguo-nvidia opened this issue 4 months ago.

weiguo-nvidia commented 4 months ago

Issue Description

Issue

The test case dualtor/test_orchagent_mac_move.py::test_mac_move fails.

Analysis

The PR https://github.com/sonic-net/sonic-mgmt/pull/10657 added mux to the critical service list for DualToR devices. As a result, after a config reload the test checks whether the mux service is running, and fails if it is not.

tests/common/devices/multi_asic.py

        # NOTE: Add mux to critical services for dualtor
        if (
            "DEVICE_METADATA" in config_facts and
            "localhost" in config_facts["DEVICE_METADATA"] and
            "subtype" in config_facts["DEVICE_METADATA"]["localhost"] and
            config_facts["DEVICE_METADATA"]["localhost"]["subtype"] == "DualToR"
        ):
            service_list.append("mux")
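On the DUT, DEVICE_METADATA|localhost has subtype DualToR, so this condition is true and mux is added to the critical service list:

"localhost": {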
    "bgp_asn": "64601",
    "buffer_model": "traditional",
    "cloudtype": "Public",
    "default_bgp_status": "up",
    "default_pfcwd_status": "disable",
    "deployment_id": "1",
    "docker_routing_config_mode": "separated",
    "hostname": "r-tigon-20",
    "hwsku": "Mellanox-SN4600C-C64",
    "mac": "1c:34:da:c9:60:00",
    "peer_switch": "switch_hostname",
    "platform": "x86_64-mlnx_msn4600c-r0",
    "region": "None",
    "subtype": "DualToR",            <<<<<<
    "synchronous_mode": "enable",
    "timezone": "UTC",
    "type": "ToRRouter",
    "yang_config_validation": "disable"
}
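The DualToR check from multi_asic.py can be exercised on its own. The following is a minimal sketch, assuming config_facts is a plain dict shaped like the DEVICE_METADATA excerpt above. The function name build_service_list and the BASE_SERVICES list are hypothetical: the base list simply reuses the non-mux services reported in this issue's log output and is not the real sonic-mgmt default list.

```python
from typing import Any, Dict, List

# Hypothetical base list: the non-mux services reported in this issue's log,
# not necessarily the actual default list used by sonic-mgmt.
BASE_SERVICES = ["swss", "syncd", "database", "teamd", "bgp", "pmon", "lldp", "snmp"]


def build_service_list(config_facts: Dict[str, Any]) -> List[str]:
    """Mirror the DualToR check: append "mux" when DEVICE_METADATA subtype is DualToR."""
    service_list = list(BASE_SERVICES)
    # Chained .get() calls replace the explicit "key in dict" checks of the original.
    localhost = config_facts.get("DEVICE_METADATA", {}).get("localhost", {})
    if localhost.get("subtype") == "DualToR":
        service_list.append("mux")
    return service_list


facts = {"DEVICE_METADATA": {"localhost": {"hostname": "r-tigon-20", "subtype": "DualToR"}}}
print("mux" in build_service_list(facts))  # True
```

Note that the decision depends only on DEVICE_METADATA; nothing in this check looks at the FEATURE table.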

However, in config_db.json the state of the mux feature is set to always_disabled, so the mux container does not run on the DUT. Because mux is in the critical service list but is not running, the critical service check fails, and so does the test case.

"FEATURE": {
    "mux": {
        "auto_restart": "disabled",
        "delayed": "False",
        "has_global_scope": "True",
        "has_per_asic_scope": "False",
        "high_mem_alert": "disabled",
        "state": "always_disabled",          <<<<<<
        "support_syslog_rate_limit": "true"
    }
}
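To make the mismatch concrete, here is a hedged sketch of how a critical service list could be reconciled with the FEATURE table, assuming config_facts carries FEATURE entries like the one above. The helper drop_inactive_features is hypothetical and is not part of sonic-mgmt; it only treats "always_disabled" as inactive, matching the state seen in this issue. Whether the test should skip such features or the testbed should enable mux is exactly the open question in this thread, and the sketch does not settle it.

```python
from typing import Any, Dict, List

# Assumption for this sketch: only "always_disabled" marks a feature as not
# expected to run. Other states could be added if that turns out to be correct.
INACTIVE_STATES = {"always_disabled"}


def drop_inactive_features(service_list: List[str], config_facts: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: drop services whose FEATURE state marks them inactive."""
    features = config_facts.get("FEATURE", {})
    return [
        svc for svc in service_list
        # Services without a FEATURE entry are kept unchanged.
        if features.get(svc, {}).get("state") not in INACTIVE_STATES
    ]


facts = {"FEATURE": {"mux": {"state": "always_disabled"}}}
print(drop_inactive_features(["swss", "syncd", "mux"], facts))  # ['swss', 'syncd']
```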

@lolyu, may I ask whether the same test case passes at Microsoft? If so, how is this situation handled?

Results you see

log

DEBUG    tests.common.devices.base:base.py:67 /root/mars/workspace/sonic-mgmt/tests/common/devices/sonic.py::critical_services_status#484: [r-tigon-20] AnsibleModule::command, args=["docker ps --filter status=running --format \\{\\{.Names\\}\\}"], kwargs={}
DEBUG    tests.common.devices.base:base.py:104 /root/mars/workspace/sonic-mgmt/tests/common/devices/sonic.py::critical_services_status#484: [r-tigon-20] AnsibleModule::command Result => {"cmd": ["docker", "ps", "--filter", "status=running", "--format", "{{.Names}}"], "stdout": "what-just-happened\nsnmp\nmgmt-framework\nlldp\ngnmi\nradv\npmon\ndhcp_relay\nsyncd\nbgp\nteamd\nswss\neventd\ndatabase", "stderr": "", "rc": 0, "start": "2024-05-04 08:37:56.276825", "end": "2024-05-04 08:37:56.309112", "delta": "0:00:00.032287", "changed": true, "invocation": {"module_args": {"_raw_params": "docker ps --filter status=running --format \\{\\{.Names\\}\\}", "warn": true, "_uses_shell": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}, "stdout_lines": ["what-just-happened", "snmp", "mgmt-framework", "lldp", "gnmi", "radv", "pmon", "dhcp_relay", "syncd", "bgp", "teamd", "swss", "eventd", "database"], "stderr_lines": [], "_ansible_no_log": false, "failed": false}
DEBUG    root:sonic.py:499 Status of critical services: {'pmon': True, 'snmp': True, 'lldp': True, 'database': True, 'mux': False, 'bgp': True, 'swss': True, 'syncd': True, 'teamd': True}
DEBUG    tests.common.utilities:utilities.py:146 critical_services_fully_started is False, wait 20 seconds and check again
DEBUG    tests.common.utilities:utilities.py:151 critical_services_fully_started is still False after 420 seconds, exit with False
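The log reduces to two steps: map each critical service to whether a container of that name appears in the `docker ps` output, then poll until every service is up or the timeout expires. The sketch below reproduces both under stated assumptions. It uses the container names from the log's stdout_lines and an exact-name match, which may differ from how sonic-mgmt actually matches containers (for example on multi-ASIC devices). The wait_until function here is a simplified stand-in written for illustration, not the sonic-mgmt utility of the same name.

```python
import time
from typing import Callable, Dict, List

# Container names taken from "stdout_lines" of the docker ps call in the log.
RUNNING = ["what-just-happened", "snmp", "mgmt-framework", "lldp", "gnmi", "radv",
           "pmon", "dhcp_relay", "syncd", "bgp", "teamd", "swss", "eventd", "database"]

# Critical services as listed in the "Status of critical services" log line.
CRITICAL = ["pmon", "snmp", "lldp", "database", "mux", "bgp", "swss", "syncd", "teamd"]


def critical_services_status(critical: List[str], running: List[str]) -> Dict[str, bool]:
    """Map each critical service to whether a container with exactly that name is running."""
    running_set = set(running)
    return {svc: svc in running_set for svc in critical}


def wait_until(timeout: float, interval: float, condition: Callable[[], bool]) -> bool:
    """Simplified poller: check condition every `interval` s until True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


status = critical_services_status(CRITICAL, RUNNING)
print(status["mux"])  # False

# The log used a 20 s interval and gave up after 420 s; short values keep this demo fast.
fully_started = wait_until(0.05, 0.01, lambda: all(status.values()))
print(fully_started)  # False
```

Because the mux container never starts while its FEATURE state is always_disabled, no amount of waiting makes the condition true, which matches the log giving up after 420 seconds.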

Results you expected to see

Case pass

Is it platform specific

generic


bingwang-ms commented 4 months ago

@lolyu, can you please share an update on this issue?

lolyu commented 2 months ago

I have one question: if this is a dualtor testbed, why is the mux service disabled?