sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[Broadcom-DNX] Intermittent lossless packet drop seen with Pause storm on egress port and latest MMU buffer settings. #19648

Open amitpawar12 opened 2 months ago

amitpawar12 commented 2 months ago

Description

Intermittent lossless packet drops are seen on the DUT when PFCWD and the credit watchdog are disabled and a pause storm is sent to the egress port of the DUT.

This is seen with the new MMU buffer settings implemented via #18239.

Tests:

  1. tests/snappi_tests/multidut/pfc/test_lossless_response_to_throttling_pause_storms.py
  2. tests/snappi_tests/multidut/pfc/test_m2o_oversubscribe_lossy.py
  3. tests/snappi_tests/multidut/pfc/test_m2o_fluctuating_lossless.py

Steps to reproduce the issue:

Describing the throttling pause storm test case here:

  1. Disable the PFC watchdog (PFCWD) and the credit watchdog on the DUT.
  2. Use 2 ingress ports and 1 egress port on the DUT.
  3. Each ingress sends 35% lossless traffic (priorities 3 and 4) plus 25% lossy traffic (priorities 1 and 2), so each ingress link carries 60% load.
  4. Both lossy and lossless traffic start at time T0 and run for 20 seconds.
  5. Starting 5 seconds in (T5), send pause frames to the DUT on the egress link for a duration of 10 seconds.
  6. Stop the pause frames at T15.
  7. Compare the Tx and Rx frame counts for the lossless and lossy flows.
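
The load and timing in the steps above can be sketched as simple arithmetic. This is an illustration only, not code from the test suite: the constant and function names are hypothetical, and it assumes the percentages are fractions of line rate on equal-speed links.

```python
# Illustrative arithmetic only. Names are hypothetical, and the percentages
# are ASSUMED to be fractions of line rate on equal-speed links.
INGRESS_PORTS = 2
LOSSLESS_SHARE = 0.35   # priorities 3 and 4 combined, per ingress
LOSSY_SHARE = 0.25      # priorities 1 and 2 combined, per ingress

TRAFFIC_START_S, TRAFFIC_STOP_S = 0, 20   # traffic runs T0..T20
PAUSE_START_S, PAUSE_STOP_S = 5, 15       # pause storm runs T5..T15


def offered_egress_load(ports, lossless, lossy):
    """Total load offered to the single egress link, as a fraction of line rate."""
    return ports * (lossless + lossy)


def egress_paused(t_s):
    """True while the pause storm is active on the egress link."""
    return PAUSE_START_S <= t_s < PAUSE_STOP_S


per_ingress = LOSSLESS_SHARE + LOSSY_SHARE
total = offered_egress_load(INGRESS_PORTS, LOSSLESS_SHARE, LOSSY_SHARE)
lossless_total = INGRESS_PORTS * LOSSLESS_SHARE

print(f"per-ingress load:      {per_ingress:.0%}")
print(f"offered egress load:   {total:.0%}")
print(f"lossless egress load:  {lossless_total:.0%}")
print(f"paused at T10? {egress_paused(10)}")
```

Under these assumptions the egress link is oversubscribed by the combined traffic even outside the pause window, while the lossless share alone stays below line rate.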

Tagging @vmittal-msft for the issue.

Describe the results you received:

On 100 Gbps links, a small number of packet drops is seen. The test sometimes passes and sometimes fails, with anywhere between 0 and 100 packets dropped (out of 7M lossless packets).

Describe the results you expected:

No drops for priority 3 and 4 traffic (i.e., Tx == Rx packets).

amitpawar12 commented 2 months ago

@vmittal-msft - These losses are still seen intermittently with PFC testcases.

Testcase: snappi_tests/multidut/pfc/test_multidut_pfc_pause_lossy_with_snappi.py

flow_metrics = <snappi.snappi.FlowMetricIter object at 0x7f179f79d050>, speed_gbps = 400, tolerance = 0.05
snappi_extra_params = <tests.common.snappi_tests.snappi_test_params.SnappiTestParams instance at 0x7f16ffb4cc80>

    def verify_background_flow(flow_metrics,
                               speed_gbps,
                               tolerance,
                               snappi_extra_params):
----------- curtailed output ------------
            pytest_assert(tx_frames == rx_frames,
>                         "{} should not have any dropped packet".format(metric.name))
E           Failed: Background Flow Prio 4 should not have any dropped packet

deviation  = -0.019157099412587413
exp_bg_flow_rx_pkts = 558593750.0
flow_metrics = <snappi.snappi.FlowMetricIter object at 0x7f179f79d050>
metric     = <snappi.snappi.FlowMetric object at 0x7f179d0d2280>
rx_frames  = 547892714 <<< Difference of 6 packets for 547 million packets.
speed_gbps = 400
tolerance  = 0.05
tx_frames  = 547892720  <<<

Testcase: snappi_tests/multidut/pfc/files/m2o_oversubscribe_lossy_helper.py

rows = <snappi.snappi.FlowMetricIter object at 0x7f179c92fcd0>, test_flow_name = 'Test Flow', bg_flow_name = 'Background Flow'
flag = {'Background Flow': {'loss': '0'}, 'Test Flow': {'loss': '16'}}

    def verify_m2o_oversubscribtion_results(rows,
                                            test_flow_name,
                                            bg_flow_name,
                                            flag):
----------- curtailed output --------------
                            pytest_assert(tx_frames == rx_frames,
>                                         '{} should not have any dropped packet'.format(row.name))
E                                         Failed: Background Flow 1 -> 0 should not have any dropped packet

bg_flow_name = 'Background Flow'
criteria   = {'loss': '0'}
flag       = {'Background Flow': {'loss': '0'}, 'Test Flow': {'loss': '16'}}
flow_type  = 'Background Flow'
row        = <snappi.snappi.FlowMetric object at 0x7f179c92fe60>
rows       = <snappi.snappi.FlowMetricIter object at 0x7f179c92fcd0>
rx_frames  = 14966400
test_flow_name = 'Test Flow'
tx_frames  = 14966475
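
For scale, the drop counts in the two failures above can be worked out from the tx_frames and rx_frames values in the output. The sketch below is not part of the test suite: `drop_stats` is a hypothetical helper, and the deviation formula is an assumption (relative difference between received and expected frames) that happens to reproduce the value printed in the first failure.

```python
def drop_stats(tx_frames, rx_frames):
    """Return (dropped frame count, loss ratio relative to transmitted frames)."""
    dropped = tx_frames - rx_frames
    return dropped, dropped / tx_frames


# (tx_frames, rx_frames) copied from the two failure outputs above.
samples = {
    "Background Flow Prio 4": (547892720, 547892714),
    "Background Flow 1 -> 0": (14966475, 14966400),
}

for name, (tx, rx) in samples.items():
    dropped, ratio = drop_stats(tx, rx)
    print(f"{name}: {dropped} dropped, loss ratio {ratio:.2e}")

# ASSUMED formula: relative difference between received and expected frames.
exp_bg_flow_rx_pkts = 558593750.0
tolerance = 0.05
deviation = (547892714 - exp_bg_flow_rx_pkts) / exp_bg_flow_rx_pkts
print(f"deviation {deviation:.6f}, within tolerance: {abs(deviation) < tolerance}")
```

The first flow lost 6 frames and the second 75. The deviation of the first flow is within the 0.05 tolerance, so only the zero-drop assertion fails.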

amitpawar12 commented 2 months ago

This issue is still seen even with the MMU buffer setting changes from PR #19653.

Thanks, -A