Closed rraghav-cisco closed 1 year ago
In the code at https://github.com/Azure/sonic-mgmt/blob/master/tests/qos/qos_sai_base.py#L508 it seems we try to exclude LAG/PortChannel ports from our tests. Does it still select LAG ports in the T1-64-lag topology? It seems LAG ports are removed only for a T0 topology or a Mellanox device:
# LAG ports in T1 TOPO need to be removed in Mellanox devices
if topo in self.SUPPORTED_T0_TOPOS or isMellanoxDevice(duthost):
OK, so the if condition needs to change for platforms other than Mellanox.
@oleksandrKovtunenko I don't understand your comment. T1-64-lag has only LAG ports; there are no other free Ethernet ports.
There is logic in sai_qos_tests.py to pick the correct dst port among the LAG members before the actual test run: https://github.com/sonic-net/sonic-mgmt/blob/55e3bdd9dab07e56f5a2680cf252fc4a1e56542b/tests/saitests/py3/sai_qos_tests.py#L131
Sample usage in one of the tests: https://github.com/sonic-net/sonic-mgmt/blob/55e3bdd9dab07e56f5a2680cf252fc4a1e56542b/tests/saitests/py3/sai_qos_tests.py#L2349
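To illustrate the idea of picking the dst port among the LAG members, here is a minimal sketch. It is not the code from sai_qos_tests.py: `resolve_lag_rx_port` and the `send_probe` callable are hypothetical names, and the sketch assumes the test can send one probe packet and learn which PTF port index received it.

```python
from typing import Callable, Iterable, Optional


def resolve_lag_rx_port(
    send_probe: Callable[[], Optional[int]],
    lag_members: Iterable[int],
    default_port: int,
) -> int:
    """Return the LAG member port that actually received a probe packet.

    send_probe is a hypothetical callable: it sends one probe with the same
    headers as the test traffic and returns the port index where the probe
    was seen, or None if it was not seen at all.
    """
    members = set(lag_members)
    rx_port = send_probe()
    # Only trust the probe result if it landed on a member of the expected LAG.
    if rx_port is not None and rx_port in members:
        return rx_port
    # Otherwise fall back to the originally configured destination port.
    return default_port


# Toy usage: the probe was observed on member 5 of a two-member LAG (4, 5).
resolved = resolve_lag_rx_port(lambda: 5, [4, 5], default_port=4)
```

The point of resolving the port before the test run is that later checks can keep using a single port index, as long as the test traffic hashes the same way as the probe.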
The issue comes from the following code in tests/qos/qos_sai_base.py, lines 443-449:
The dstPort is chosen as the first PortChannel in the case of T1-64-lag. This is passed to the saitests scripts, which run in the PTF container. The saitests use the dstPort to check for traffic returned from the DUT. This works as long as the PortChannel has 1 member.
However, if the PortChannel has 2 members, it can send traffic back on either member port. If the script expects the packets on one port and the traffic actually arrives on another because of the PortChannel's load balancing, the saitests in the PTF container fail. This makes the saitests scripts fail intermittently. The affected tests are:
tests/qos/test_qos_sai.py::testQosSaiDscpToPgMapping
tests/qos/test_qos_sai.py::testQosSaiDwrr
tests/qos/test_qos_sai.py::testQosSaiDwrrWeightChange
For example, testQosSaiDwrrWeightChange does the following:
In the above code, if dutConfig['testPorts']["dst_port_id"] belongs to a multi-link PortChannel, the test might fail, since the PortChannel may send the returning packets on another member port instead. This causes the test to fail intermittently.
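The failure mode can be reproduced with a small simulation. This is only a sketch: the hash in `lag_egress_member` is a toy stand-in (a real ASIC hashes packet header fields with its own algorithm), and the port numbers and flow names are made up for illustration.

```python
def lag_egress_member(members, flow_key):
    """Toy load balancer: pick a LAG member from a byte-sum hash of the flow.

    This is NOT the DUT's hashing algorithm, just a deterministic stand-in.
    """
    return members[sum(flow_key.encode()) % len(members)]


def packet_seen_on(expected_ports, actual_port):
    """Model a test check: did the packet arrive on one of the expected ports?"""
    return actual_port in expected_ports


lag_members = [10, 11]        # hypothetical PTF port indices of one PortChannel
dst_port_id = lag_members[0]  # the test expects traffic on the first member only
flows = [f"flow-{i}" for i in range(8)]

# Check that expects exactly one member port: fails when the hash picks the other.
single_port_results = [
    packet_seen_on([dst_port_id], lag_egress_member(lag_members, flow))
    for flow in flows
]

# Check that accepts any member of the PortChannel: independent of the hash.
any_member_results = [
    packet_seen_on(lag_members, lag_egress_member(lag_members, flow))
    for flow in flows
]

print(f"single-port check passed {sum(single_port_results)}/{len(flows)} flows")
print(f"any-member check passed {sum(any_member_results)}/{len(flows)} flows")
```

With this toy hash, half of the flows land on the second member, so the single-port check passes only part of the time while the any-member check always passes. Accepting the packet on any member, or resolving the receiving member before the test run, are two ways a test could avoid depending on the load-balancing decision.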