openvswitch / ovs-issues

Issue tracker repo for Open vSwitch

OvS with dpdkvhostuser interfaces crashes while HW Offload is enabled - Intel E810 #267

Open GrzegorzKania-lime opened 1 year ago

GrzegorzKania-lime commented 1 year ago

Hey, during performance testing of my OvS deployment I've stumbled across a problem. The test I am running is RFC2544 with small packet sizes but a large number of L3 flows (e.g. 16k).

Here is my general OvS configuration:

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xFF00000000
ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support="true"
ovs-vsctl set Open_vSwitch . other_config:dpdk-init="true"
ovs-vsctl set Open_vSwitch . other_config:dpdk-alloc-mem=4096,4096
ovs-vsctl set Open_vSwitch . other_config:per-port-memory="true"
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb="true"
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-load-threshold="70"
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-improvement-threshold="10"
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,4096
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x30000000000000
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=8
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-txqs=8
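For reference, the CPU masks above are bitmasks over core IDs: pmd-cpu-mask=0xFF00000000 selects cores 32-39 and dpdk-lcore-mask=0x30000000000000 selects cores 52-53. A minimal shell sketch for deriving such masks (the cores_to_mask helper is hypothetical, for illustration only; it is not part of OvS or DPDK):

```shell
# cores_to_mask: build a hex CPU mask from a list of core IDs.
# Hypothetical helper; shell arithmetic is 64-bit, so this covers
# core IDs 0-62, which is enough for the masks used above.
cores_to_mask() {
  local mask=0 core
  for core in "$@"; do
    mask=$(( mask | (1 << core) ))
  done
  printf '0x%X\n' "$mask"
}

cores_to_mask 32 33 34 35 36 37 38 39   # -> 0xFF00000000 (pmd-cpu-mask)
cores_to_mask 52 53                     # -> 0x30000000000000 (dpdk-lcore-mask)
```

This makes it easy to double-check that the PMD cores and the lcore cores do not overlap.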

Bridge config:

ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 phy0 tag=33 -- set Interface phy0 type=dpdk \
      options:dpdk-devargs=0000:ca:00.0 ofport_request=5
ovs-vsctl add-port br0 phy1 tag=66 -- set Interface phy1 type=dpdk \
      options:dpdk-devargs=0000:ca:00.1 ofport_request=6
ovs-vsctl add-port br0 vhost-user-1 tag=33 -- set Interface vhost-user-1 type=dpdkvhostuser ofport_request=1
ovs-vsctl add-port br0 vhost-user-2 tag=66 -- set Interface vhost-user-2 type=dpdkvhostuser ofport_request=2
ovs-vsctl add-port br0 vhost-user-3 tag=33 -- set Interface vhost-user-3 type=dpdkvhostuser ofport_request=3
ovs-vsctl add-port br0 vhost-user-4 tag=66 -- set Interface vhost-user-4 type=dpdkvhostuser ofport_request=4
ovs-vsctl set Interface phy0 options:n_txq_desc=512 
ovs-vsctl set Interface phy1 options:n_txq_desc=512 
ovs-vsctl set Interface phy0 options:n_rxq_desc=512 
ovs-vsctl set Interface phy1 options:n_rxq_desc=512 
ovs-vsctl set interface phy0 options:n_rxq=4 other_config:pmd-rxq-affinity="0:32,1:33,2:34,3:35"
ovs-vsctl set interface phy1 options:n_rxq=4 other_config:pmd-rxq-affinity="0:36,1:37,2:38,3:39"
ovs-vsctl set interface vhost-user-1 options:n_rxq=4 other_config:pmd-rxq-affinity="0:32,1:33,2:34,3:35"
ovs-vsctl set interface vhost-user-2 options:n_rxq=4 other_config:pmd-rxq-affinity="0:36,1:37,2:38,3:39"
ovs-vsctl set interface vhost-user-3 options:n_rxq=4 other_config:pmd-rxq-affinity="0:32,1:33,2:34,3:35"
ovs-vsctl set interface vhost-user-4 options:n_rxq=4 other_config:pmd-rxq-affinity="0:36,1:37,2:38,3:39"
ovs-ofctl del-flows br0
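The pmd-rxq-affinity strings above pin each rx queue to a PMD core as comma-separated queue:core pairs. A small shell sketch (the print_affinity helper is hypothetical) that expands such a string into readable lines, handy for sanity-checking the pinning before comparing it against the real assignment reported by ovs-appctl dpif-netdev/pmd-rxq-show:

```shell
# print_affinity: expand an OvS pmd-rxq-affinity string ("queue:core,...")
# into one "rxq N -> core M" line per pair. Hypothetical helper for
# illustration; avoids arrays so it runs under plain sh as well as bash.
print_affinity() {
  local rest="$1" pair
  while [ -n "$rest" ]; do
    pair=${rest%%,*}                       # first "queue:core" pair
    [ "$pair" = "$rest" ] && rest='' || rest=${rest#*,}
    printf 'rxq %s -> core %s\n' "${pair%%:*}" "${pair##*:}"
  done
}

print_affinity "0:32,1:33,2:34,3:35"
# rxq 0 -> core 32
# rxq 1 -> core 33
# rxq 2 -> core 34
# rxq 3 -> core 35
```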

All dpdkvhostuser interfaces are connected to an l3fwd application that loops traffic back to OvS and on to the traffic generators. With ovs-vsctl set Open_vSwitch . other_config:hw-offload=false nothing goes wrong, but switching to ovs-vsctl set Open_vSwitch . other_config:hw-offload=true crashes OvS and removes all dpdkvhostuser interfaces from the bridge.
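Note that, per the OvS database documentation, changing other_config:hw-offload only takes effect after restarting ovs-vswitchd, so toggling it for a test run looks roughly like the following (the service name is an assumption for a systemd-based install and varies by distribution):

```shell
# Enable hardware offload; the setting is read at daemon startup.
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

# Restart the daemon so the new value takes effect.
# "openvswitch-switch" is the Debian/Ubuntu service name; on RHEL-family
# systems it is typically "openvswitch".
systemctl restart openvswitch-switch
```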

While running the tests, the first two iterations of the binary search seem to work without problems, but in the third, OvS crashes with the following dump:

2022-10-27T12:50:03.228Z|00001|dpdk(hw_offload15)|ERR|ice_check_fdir_programming_status(): Failed to remove FDIR rule.
2022-10-27T12:50:03.229Z|00002|dpdk(hw_offload15)|ERR|ice_flow_destroy(): Failed to destroy flow
2022-10-27T12:50:03.229Z|00027|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2022-10-27T12:50:03.229Z|00003|netdev_offload_dpdk(hw_offload15)|ERR|Failed flow: phy1/phy1: flow destroy 1 ufid 8f83494e-0ef4-439d-9ee6-04450c6705d3
2022-10-27T12:50:03.229Z|00004|dpdk(hw_offload15)|ERR|ice_check_fdir_programming_status(): Failed to remove FDIR rule.
2022-10-27T12:50:03.229Z|00005|dpdk(hw_offload15)|ERR|ice_flow_destroy(): Failed to destroy flow
2022-10-27T12:50:03.229Z|00028|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2022-10-27T12:50:03.229Z|00006|netdev_offload_dpdk(hw_offload15)|ERR|Failed flow: phy0/phy0: flow destroy 0 ufid 0a439759-0784-4c90-8a02-d0f86e49766a

It looks like the card cannot delete flows from its internal offload table. The issue is not reproducible without dpdkvhostuser interfaces: a simple phy0 <-> phy1 traffic loopback does not trigger it.

igsilya commented 1 year ago

Hi, do you have a coredump or a stack trace for the crash? Otherwise it's hard to tell what went wrong.

If you suspect a driver bug, it's better to report it to DPDK instead.
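Capturing the requested stack trace means enabling core dumps before reproducing the crash. A hedged sketch of one way to do that (paths are illustrative; this assumes you are not already using systemd-coredump, in which case coredumpctl is the right tool instead):

```shell
# Allow unlimited-size core files in the shell that starts the daemon.
ulimit -c unlimited

# Route core files to a known location (illustrative pattern:
# executable name and PID in the file name).
sysctl -w kernel.core_pattern=/var/crash/core.%e.%p

# After reproducing the crash, extract a backtrace from the core file.
# The core file name below is illustrative, not a real path.
gdb -batch -ex 'thread apply all bt' \
    "$(command -v ovs-vswitchd)" /var/crash/core.ovs-vswitchd.12345
```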

wangjun0728 commented 6 months ago

Hello, I have encountered a similar issue on the E810 network card; it appeared after upgrading to the new version. It seems to be caused by the newly introduced checksum offload.

2024-03-04T10:10:45.687Z|00274|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:16:05.137Z|00275|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:16:06.199Z|00276|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:16:08.247Z|00277|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:16:12.279Z|00278|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:16:16.024Z|00483|netdev_linux|WARN|error receiving Ethernet packet on br-int: Invalid argument
2024-03-04T10:16:16.024Z|00484|dpif_netdev|ERR|error receiving data from br-int: Invalid argument

wangjun0728 commented 6 months ago

On Mellanox network cards everything works normally, but the E810 shows these issues. I'm using DPDK version 22.11, so it seems to be a problem on the DPDK side. @igsilya