The Mellanox bug report mentions a workaround:
"The option we are considering - explicitly disable the CQE compression if rx_burst_vec is engaged"
So I tried to start dp_service by passing the following parameters to the mlx5 DPDK driver:
dp_service -l 0,1 -a 03:00.0,class=rxq_cqe_comp_en=0,rx_vec_en=1 -a 03:00.1,class=rxq_cqe_comp_en=0,rx_vec_en=1 --
With these parameters the VFs are not created in dp_service, so this does not help.
Relevant documentation:
https://doc.dpdk.org/guides/nics/mlx5.html#rx-burst-functions
https://doc.dpdk.org/guides/nics/mlx5.html#driver-options
After a long debugging session in the DPDK mlx5 driver, I figured out how the workaround can be applied to our environment.

For ARM-based BlueField-2 cards, which in our use case do not have real VFs but only a single VF-like representor port of the bare-metal host, the following needs to be added to the dp_service parameters. The special "VF-like" representor port of the BlueField-2 card is indexed in the mlx5 driver as "-1":
-a 0000:03:00.0,class=rxq_cqe_comp_en=0,rx_vec_en=1,representor=pf[0]vf[-1]
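For reference, here is a minimal sketch of how such devargs reach the mlx5 PMD: an allowlist entry is parsed by rte_eal_init() and its key/value options are handed to the PMD when the device is probed. This is not dp_service's actual startup code; the core list and PCI address are just the values from the example above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <rte_eal.h>

int main(void)
{
	char *eal_args[] = {
		"dp_service",
		"-l", "0,1",
		/* Allowlist the BlueField-2 uplink port with the workaround devargs:
		 * CQE compression off, vectorized Rx on, plus the "-1" representor. */
		"-a", "0000:03:00.0,class=rxq_cqe_comp_en=0,rx_vec_en=1,representor=pf[0]vf[-1]",
	};
	int eal_argc = sizeof(eal_args) / sizeof(eal_args[0]);

	/* rte_eal_init() parses the -a entries and passes the per-device
	 * key/value options to the mlx5 PMD during probing. */
	if (rte_eal_init(eal_argc, eal_args) < 0) {
		fprintf(stderr, "EAL init failed\n");
		return EXIT_FAILURE;
	}

	/* ... normal dp_service port setup would continue here ... */

	rte_eal_cleanup();
	return EXIT_SUCCESS;
}
```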
For hypervisor use cases on the Intel platform (depending on the PCI bus and the number of VFs, this line may differ):
-a 0000:8a:00.0,class=rxq_cqe_comp_en=0,rx_vec_en=1,representor=pf[0]vf[1] -a 0000:8a:00.1,class=rxq_cqe_comp_en=0,rx_vec_en=1
After applying these parameters to the mlx5 driver, the crash was no longer observable under the same conditions.
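One way to double-check that the devargs took effect (an assumed verification helper, not something dp_service currently does) is to query which Rx burst mode the PMD selected once the ports are started; with rx_vec_en=1 the mlx5 PMD should report a vectorized mode:

```c
#include <stdio.h>
#include <rte_ethdev.h>

/* Print the Rx burst mode the PMD chose for one queue of a started port.
 * rte_eth_rx_burst_mode_get() is available since DPDK 19.11; not every
 * PMD implements it, in which case it returns a negative value. */
static void print_rx_burst_mode(uint16_t port_id, uint16_t queue_id)
{
	struct rte_eth_burst_mode mode;

	if (rte_eth_rx_burst_mode_get(port_id, queue_id, &mode) == 0)
		printf("port %u queue %u rx burst mode: %s\n",
		       port_id, queue_id, mode.info);
	else
		printf("port %u: rx burst mode not reported by the PMD\n", port_id);
}
```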
The Mellanox workaround was implemented with #70. This cannot be fixed in dp-service itself.
dp_service may crash when iperf3 sends/receives non-offloaded traffic at more than 7 Gbit/s. This is due to a bug in the DPDK Mellanox driver documented here: https://bugs.dpdk.org/show_bug.cgi?id=334
It needs to be investigated whether the workaround provided by Mellanox can be used.