sonic-net / sonic-mgmt

Configuration management examples for SONiC
Other
201 stars 727 forks source link

qos/qos_sai_base.py selects incorrect ports for saitests in ptf. #5578

Closed rraghav-cisco closed 1 year ago

rraghav-cisco commented 2 years ago

The issue is from the following code in tests/qos/qos_sai_base.py 443-449:

        #TODO: Randomize port selection
        dstPort = dstPorts[0] if dst_port_ids else testPortIds[dstPorts[0]]
        dstVlan = testPortIps[dstPort]['vlan_id'] if 'vlan_id' in testPortIps[dstPort] else None
        dstPort2 = dstPorts[1] if dst_port_ids else testPortIds[dstPorts[1]]
        dstVlan2 = testPortIps[dstPort2]['vlan_id'] if 'vlan_id' in testPortIps[dstPort2] else None
        dstPort3 = dstPorts[2] if dst_port_ids else testPortIds[dstPorts[2]]
        dstVlan3 = testPortIps[dstPort3]['vlan_id'] if 'vlan_id' in testPortIps[dstPort3] else None

The dstPort is choosen as the first portchannel in the case of T1-64-lag. This is passed to the saitests scripts which will run in the PTF container. The saitests will use the dstPort to check for traffic returned from the DUT. This works as long as the PortChannel has 1 member.

However, If the PortChannel has 2 members, it can send back traffic in either of the member ports. This causes the saitests in the PTF container to fail, if the script is expecting the packets in one port, and the traffic actually came in another port due to PortChannel's load balancing. This makes the saitests scripts fail intermittently. The affected tests are: tests/qos/test_qos_sai.py::testQosSaiDscpToPgMapping tests/qos/test_qos_sai.py::testQosSaiDwrr tests/qos/test_qos_sai.py::testQosSaiDwrrWeightChange

For example, testQosSaiDwrrWeightChange does the following:

 testParams.update({
            "ecn": qosConfig["wrr_chg"]["ecn"],
            "dst_port_id": dutConfig["testPorts"]["dst_port_id"],
            "dst_port_ip": dutConfig["testPorts"]["dst_port_ip"],
            "src_port_id": dutConfig["testPorts"]["src_port_id"],
            "src_port_ip": dutConfig["testPorts"]["src_port_ip"],
            "src_port_vlan": dutConfig["testPorts"]["src_port_vlan"],
            "q0_num_of_pkts": qosConfig["wrr_chg"]["q0_num_of_pkts"],
            "q1_num_of_pkts": qosConfig["wrr_chg"]["q1_num_of_pkts"],
            "q2_num_of_pkts": qosConfig["wrr_chg"]["q2_num_of_pkts"],
            "q3_num_of_pkts": qosConfig["wrr_chg"]["q3_num_of_pkts"],
            "q4_num_of_pkts": qosConfig["wrr_chg"]["q4_num_of_pkts"],
            "q5_num_of_pkts": qosConfig["wrr_chg"]["q5_num_of_pkts"],
            "q6_num_of_pkts": qosConfig["wrr_chg"]["q6_num_of_pkts"],
            "limit": qosConfig["wrr_chg"]["limit"],
            "pkts_num_leak_out": qosConfig[portSpeedCableLength]["pkts_num_leak_out"],
            "hwsku":dutTestParams['hwsku'],
            "topo": dutTestParams["topo"]
        })
        self.runPtfTest(
            ptfhost, testCase="sai_qos_tests.WRRtest", testParams=testParams
        )

In the above code, if dutConfig['testPorts']["dst_port_id"] belongs to a multi-link PortChannel, the test might fail, since the PortChannel may decide to send back the returning packets in another member port instead. This causes the test to be intermittently failing.

oleksandrKovtunenko commented 2 years ago

in code https://github.com/Azure/sonic-mgmt/blob/master/tests/qos/qos_sai_base.py#L508 seems we try to exclude LAG/PortChannel ports from using in our tests Does it still select LAG ports in T1-64-lag topology ? it seems it removes LAG ports in case t0 topology or Mellanox device

 # LAG ports in T1 TOPO need to be removed in Mellanox devices
   if topo in self.SUPPORTED_T0_TOPOS or isMellanoxDevice(duthost):
oleksandrKovtunenko commented 2 years ago

ok need change if conditions in case other platform not Mellanox

rraghav-cisco commented 2 years ago

[~[oleksandrKovtunenko] I don't understand your comment. T1-64-lag has only LAG ports. No other free Ethernet ports.

neethajohn commented 1 year ago

There is a logic in sai_qos_tests.py to pick the correct dst port among the lag members before the actual test run. https://github.com/sonic-net/sonic-mgmt/blob/55e3bdd9dab07e56f5a2680cf252fc4a1e56542b/tests/saitests/py3/sai_qos_tests.py#L131 Sample usage in one of the tests: https://github.com/sonic-net/sonic-mgmt/blob/55e3bdd9dab07e56f5a2680cf252fc4a1e56542b/tests/saitests/py3/sai_qos_tests.py#L2349