testQosSaiLossyQueue fails with the following exception:
FAIL: sai_qos_tests.LossyQueueTest
----------------------------------------------------------------------
Traceback (most recent call last):
File "saitests/py3/sai_qos_tests.py", line 3430, in runTest
assert(recv_counters[cntr] <= recv_counters_base[cntr] + COUNTER_MARGIN)
AssertionError
This indicates that we are receiving RX_DRP packets while filling up the VOQ:
recv_counters_base: 321813, recv_counters: 533016
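The failing assertion compares one RX counter before and after the VOQ fill. Below is a minimal sketch of that comparison using the numbers above. `COUNTER_MARGIN = 2` is a hypothetical placeholder, since the real value lives in sai_qos_tests.py. With a delta of over 200k drops, any small margin fails.

```python
# Minimal sketch of the check at sai_qos_tests.py line 3430.
# COUNTER_MARGIN is a hypothetical placeholder; the real value is defined in
# sai_qos_tests.py and is not reproduced here.
COUNTER_MARGIN = 2


def rx_drop_check(base: int, current: int, margin: int = COUNTER_MARGIN) -> bool:
    """Return True if the RX drop counter stayed within the allowed margin."""
    return current <= base + margin


recv_counters_base = 321813
recv_counters = 533016

delta = recv_counters - recv_counters_base
print(f"RX_DRP delta: {delta}")  # RX_DRP delta: 211203
# False here corresponds to the AssertionError raised by the test.
print(rx_drop_check(recv_counters_base, recv_counters))
```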
We see RX_DRPs because the port-channel goes down while we're sending the packets. The packets then have no destination and are therefore dropped.
The port-channel goes down because this test requires disabling TX on the egress port (a member of a port-channel):
self.sai_thrift_port_tx_disable(self.dst_client, asic_type, [dst_port_id])
https://github.com/sonic-net/sonic-mgmt/blob/202205/tests/saitests/py3/sai_qos_tests.py#L3386
This stops TX LACP packets from egressing, so once 3 LACP packets are missed on the server side (60-90s), the LAG is torn down.
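The 60-90s window follows from the LACP slow rate. The sketch below assumes the partner sends and expects one LACPDU every 30s and times out after 3 missed LACPDUs (90s with no LACPDU received). It also assumes the last LACPDU that got out was sent anywhere from 0 to one period before TX was disabled. `teardown_window` is a hypothetical helper for illustration only.

```python
# Sketch of the LACP teardown window after TX is disabled, assuming the slow
# LACP rate (one LACPDU every 30 s) and a timeout of 3 missed LACPDUs.
SLOW_PERIODIC_S = 30
MISSED_LACPDUS = 3
LACP_TIMEOUT_S = SLOW_PERIODIC_S * MISSED_LACPDUS  # 90 s


def teardown_window(periodic_s: int = SLOW_PERIODIC_S,
                    timeout_s: int = LACP_TIMEOUT_S) -> tuple:
    """Earliest/latest LAG teardown, in seconds after TX is disabled.

    The partner's timeout runs from the last LACPDU it received, which was
    sent somewhere between one full period and zero seconds before the disable.
    """
    earliest = timeout_s - periodic_s  # last LACPDU left a full period earlier
    latest = timeout_s                 # last LACPDU left right before the disable
    return earliest, latest


print(teardown_window())  # (60, 90)
```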
I timed how long it takes the test to send all its packets (2,396,544) to fill up the VOQ:
Sending Packets 2024-02-09 22:49:53.234339
Packets Finished 2024-02-09 22:55:25.925242
Sending these packets takes over 5 minutes (about 5m32s), so the LAG has plenty of time to hit the LACP timeout.
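The arithmetic behind that comparison can be checked directly from the two timestamps above. The sketch computes the elapsed send time and the implied send rate, then compares the elapsed time against the 90s upper bound of the LACP window mentioned earlier.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
start = datetime.strptime("2024-02-09 22:49:53.234339", FMT)
end = datetime.strptime("2024-02-09 22:55:25.925242", FMT)

elapsed_s = (end - start).total_seconds()
packets = 2_396_544
rate_pps = packets / elapsed_s  # average send rate over the whole fill

# Upper bound of the 60-90 s LACP teardown window.
LACP_TIMEOUT_S = 90

print(f"elapsed: {elapsed_s:.3f} s")  # elapsed: 332.691 s
print(f"rate: {rate_pps:.0f} pps")
print(f"send time is {elapsed_s / LACP_TIMEOUT_S:.1f}x the LACP timeout")
```

At this rate, the LAG is already down a little over a quarter of the way through the fill.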
I'm able to see this issue just by disabling TX and waiting:
(Pdb) self.sai_thrift_port_tx_disable(self.dst_client, asic_type, [dst_port_id])
...
Feb 9 22:35:22.592837 cmp314-3 NOTICE swss0#orchagent: :- updatePortOperStatus: Port PortChannel102 oper state set from up to down
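To measure how long after the TX disable the LAG actually goes down, one option is to pull the oper-status transition out of syslog and compare its timestamp with the time of the disable call. Below is a minimal sketch built around the orchagent line above. `parse_oper_change` and the regex are hypothetical; the exact syslog layout may differ between SONiC versions, so treat the pattern as an assumption.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Hypothetical pattern for orchagent lines like the one above; the syslog
# layout may vary between SONiC versions.
OPER_STATUS_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"updatePortOperStatus: Port (?P<port>\S+) oper state set from "
    r"(?P<old>\w+) to (?P<new>\w+)"
)


class OperChange(NamedTuple):
    timestamp: datetime
    port: str
    old: str
    new: str


def parse_oper_change(line: str, year: int) -> Optional[OperChange]:
    """Extract a port oper-status transition from an orchagent syslog line."""
    m = OPER_STATUS_RE.search(line)
    if m is None:
        return None
    # Syslog timestamps omit the year, so the caller supplies it.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
    return OperChange(ts, m["port"], m["old"], m["new"])


line = ("Feb 9 22:35:22.592837 cmp314-3 NOTICE swss0#orchagent: :- "
        "updatePortOperStatus: Port PortChannel102 oper state set from up to down")
change = parse_oper_change(line, 2024)
print(change.port, change.old, "->", change.new, change.timestamp)
```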