sonic-net / sonic-mgmt

Configuration management examples for SONiC

[Snappi]: snappi_tests/multidut: test_global_pause case failure on 8800 cross LC #14807

Open sdszhang opened 1 month ago

sdszhang commented 1 month ago

Issue Description

When running snappi tests in a cross-line-card (cross-LC) scenario on an 8800, packet drops are observed.

Failed test cases: test_global_pause

snappi_tests/multidut/pfc/test_multidut_global_pause_with_snappi.py::test_global_pause[multidut_port_info0]
-------------------------------- live log call ---------------------------------
FAILED                                                                   [100%]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
......
            exp_bg_flow_rx_pkts = bg_flow_config["flow_rate_percent"] / 100.0 * speed_gbps \
                * 1e9 * bg_flow_config["flow_dur_sec"] / 8.0 / bg_flow_config["flow_pkt_size"]
            deviation = (rx_frames - exp_bg_flow_rx_pkts) / float(exp_bg_flow_rx_pkts)

>           pytest_assert(tx_frames == rx_frames,
                          "{} should not have any dropped packet".format(metric.name))
E           Failed: Background Flow Prio 1 should not have any dropped packet

bg_flow_config = {'flow_delay_sec': 4, 'flow_dur_sec': 19, 'flow_name': 'Background Flow', 'flow_pkt_count': None, ...}
deviation  = -0.22155717614035086
exp_bg_flow_rx_pkts = 83496093.75
flow_metrics = <snappi.snappi.FlowMetricIter object at 0x7fa18a1bd580>
metric     = <snappi.snappi.FlowMetric object at 0x7fa18899a780>
rx_frames  = 64996935
snappi_extra_params = <tests.common.snappi_tests.snappi_test_params.SnappiTestParams object at 0x7fa188da3460>
speed_gbps = 400
tolerance  = 0.05
tx_frames  = 81896551
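
For reference, the numbers in the locals dump above can be re-derived with a few lines of Python. This is only a sketch that copies the values from the traceback; `dropped` and `loss_ratio` are names introduced here for illustration and are not part of the test code.

```python
# Values copied verbatim from the traceback locals above.
tx_frames = 81896551
rx_frames = 64996935
exp_bg_flow_rx_pkts = 83496093.75
tolerance = 0.05

# Frames sent by the traffic generator that never arrived at the receiver.
dropped = tx_frames - rx_frames
loss_ratio = dropped / tx_frames

# Same deviation formula as in the failing test.
deviation = (rx_frames - exp_bg_flow_rx_pkts) / float(exp_bg_flow_rx_pkts)

print("dropped frames:", dropped)
print("loss ratio: {:.4f}".format(loss_ratio))
print("deviation: {:.4f} (tolerance {})".format(deviation, tolerance))
```

Roughly a fifth of the background-flow frames were lost, so the `tx_frames == rx_frames` assertion fails by a wide margin rather than by a few packets.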

Results you see

FAILED

Results you expected to see

PASS

Is it platform specific

generic

Relevant log output

Counters on the ingress line card. TX drops are seen on the backplane port (Ethernet-BP640).

admin@xxxx-8800-lc4-1:~$ show interface count -d all | grep -E "256|BP640"
   Ethernet256        U  7,607,604,667  27512.09 MB/s     55.02%         0         1         0            214   2908.91 KB/s      0.01%         0            0         0
Ethernet-BP640        U        730,952   1328.96 KB/s      0.01%         0        99         0  6,884,207,638  23117.84 MB/s     92.47%         0  724,543,303         0
admin@xxxx-8800-lc4-1:~$ show interface count -d all | grep -E "256|BP640"
   Ethernet256        U  7,834,882,007  30227.30 MB/s     60.45%         0         1         0            215   2947.28 KB/s      0.01%         0            0         0
Ethernet-BP640        U        731,908   1329.18 KB/s      0.01%         0        99         0  7,075,999,864  25525.13 MB/s    102.10%         0  755,795,349         0
admin@svcstr2-8800-lc4-1:~$
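
The two samples above can be compared to estimate how much of the backplane traffic is being dropped between polls. The sketch below assumes the usual `show interfaces counters` column order (IFACE, STATE, RX_OK, RX_BPS, RX_UTIL, RX_ERR, RX_DRP, RX_OVR, TX_OK, TX_BPS, TX_UTIL, TX_ERR, TX_DRP, TX_OVR), since the header row was filtered out by `grep`; the helper name `parse_tx` is hypothetical.

```python
# The two Ethernet-BP640 rows copied from the output above.
SAMPLE_1 = ("Ethernet-BP640        U        730,952   1328.96 KB/s      0.01%"
            "         0        99         0  6,884,207,638  23117.84 MB/s"
            "     92.47%         0  724,543,303         0")
SAMPLE_2 = ("Ethernet-BP640        U        731,908   1329.18 KB/s      0.01%"
            "         0        99         0  7,075,999,864  25525.13 MB/s"
            "    102.10%         0  755,795,349         0")


def parse_tx(row):
    """Return (TX_OK, TX_DRP) from one counter row.

    Assumes the column order listed in the lead-in. Each rate field such as
    "1328.96 KB/s" splits into two whitespace-separated tokens, so TX_OK is
    token 9 and TX_DRP is token 14.
    """
    tokens = row.split()
    tx_ok = int(tokens[9].replace(",", ""))
    tx_drp = int(tokens[14].replace(",", ""))
    return tx_ok, tx_drp


tx_ok_1, tx_drp_1 = parse_tx(SAMPLE_1)
tx_ok_2, tx_drp_2 = parse_tx(SAMPLE_2)

delta_tx_ok = tx_ok_2 - tx_ok_1
delta_tx_drp = tx_drp_2 - tx_drp_1
drop_ratio = delta_tx_drp / (delta_tx_ok + delta_tx_drp)

print("TX_OK delta: ", delta_tx_ok)
print("TX_DRP delta:", delta_tx_drp)
print("drop ratio between samples: {:.4f}".format(drop_ratio))
```

Between the two samples the backplane port reports TX utilization at or above line rate while its drop counter keeps growing, which is consistent with the port being oversubscribed.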

Output of show version

No response

Attach files (if any)

No response

sdszhang commented 1 month ago

@rraghav-cisco can you look into this? It seems we need the UDP flow change here too.

rraghav-cisco commented 3 weeks ago

The UDP part is already handled in common/snappi_tests/traffic_generation.py, which this script uses. I suspect the problem is that the backplane port is 200G while the front-panel port is 400G, so the traffic rate needs to be halved for this test.
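
A quick back-of-the-envelope check of this hypothesis, as a sketch only. It assumes the rate percentages are relative to the 400G front-panel speed and that both the test flow and the background flow cross a single 200G backplane link; neither assumption is confirmed in this thread. The 44 and 22 figures are the rate percentages discussed in this comment.

```python
FRONT_PANEL_GBPS = 400  # front-panel port speed from the traceback
BACKPLANE_GBPS = 200    # backplane port speed suggested in this comment


def offered_load_gbps(test_percent, bg_percent, port_gbps=FRONT_PANEL_GBPS):
    """Combined offered load of the test and background flows in Gbps."""
    return port_gbps * (test_percent + bg_percent) / 100.0


current_load = offered_load_gbps(44, 44)  # rates in the current code
halved_load = offered_load_gbps(22, 22)   # rates that passed locally

print("current:", current_load, "Gbps, fits:", current_load <= BACKPLANE_GBPS)
print("halved: ", halved_load, "Gbps, fits:", halved_load <= BACKPLANE_GBPS)
```

Under these assumptions the current rates offer more than a 200G link can carry, while the halved rates stay below it.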

I have had success in this script with the following parameters in snappi_tests/multidut/pfc/files/multidut_helper.py:

dut_port_config = []
PAUSE_FLOW_NAME = 'Pause Storm'
TEST_FLOW_NAME = 'Test Flow'
TEST_FLOW_AGGR_RATE_PERCENT = 22
BG_FLOW_NAME = 'Background Flow'
BG_FLOW_AGGR_RATE_PERCENT = 22
data_flow_pkt_size = 1024
DATA_FLOW_DURATION_SEC = 15
data_flow_delay_sec = 1
SNAPPI_POLL_DELAY_SEC = 2
PAUSE_FLOW_DUR_BASE_SEC = data_flow_delay_sec + DATA_FLOW_DURATION_SEC
TOLERANCE_THRESHOLD = 0.05
CONTINUOUS_MODE = -5
ANSIBLE_POLL_DELAY_SEC = 4

As shown above, both rate-percent values (22) are half of what the current code uses (44). I am trying to figure out how to set these values based on the platform type.
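
One possible direction, as a minimal sketch rather than a proposed patch: scale the base rate by the ratio of backplane speed to front-panel speed. The function `scale_rate_percent` and its parameters are hypothetical and do not exist in sonic-mgmt, and how the backplane speed would be discovered from the DUT is left open here.

```python
BASE_AGGR_RATE_PERCENT = 44  # value in the current code, per this comment


def scale_rate_percent(base_percent, front_panel_gbps, backplane_gbps):
    """Scale a flow rate so it does not exceed the slower backplane link.

    Hypothetical helper: the rate is reduced in proportion to the speed
    ratio and is never increased above the base value.
    """
    ratio = min(1.0, backplane_gbps / float(front_panel_gbps))
    return base_percent * ratio


# 400G front panel with a 200G backplane: the rate is halved.
cross_lc_rate = scale_rate_percent(BASE_AGGR_RATE_PERCENT, 400, 200)

# Backplane at least as fast as the front panel: the rate is unchanged.
unchanged_rate = scale_rate_percent(BASE_AGGR_RATE_PERCENT, 400, 400)

print(cross_lc_rate, unchanged_rate)
```

With a 400G front panel and a 200G backplane this reproduces the value of 22 that worked above, and it leaves the rate untouched on platforms where the backplane is not the bottleneck.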