openvswitch / ovs-issues

Issue tracker repo for Open vSwitch

After using the new version, there are errors with Intel's 82599 and E810 network cards, while Mellanox network cards are functioning normally. #321

Open wangjun0728 opened 4 months ago

wangjun0728 commented 4 months ago

The DPDK version is 22.11. The errors appear to be caused by the new version's checksum offload support. Mellanox network cards seem to be operating normally, while the E810 and 82599 network cards each report a different error.

E810: {bus_info="bus_name=pci, vendor_id=8086, device_id=159b", driver_name=net_ice, if_descr="DPDK 22.11.1 net_ice", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0", max_mac_addrs="64", max_rx_pktlen="1618", max_rx_queues="256", max_tx_queues="256", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="1", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="true", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

error:
2024-03-04T10:57:01.102Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.105Z|00019|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.113Z|00020|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.167Z|00021|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.278Z|00022|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.599Z|00023|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event

82599: {bus_info="bus_name=pci, vendor_id=8086, device_id=10fb", driver_name=net_ixgbe, if_descr="DPDK 22.11.1 net_ixgbe", if_type="6", link_speed="10Gbps", max_hash_mac_addrs="4096", max_mac_addrs="127", max_rx_pktlen="1618", max_rx_queues="128", max_tx_queues="64", max_vfs="0", max_vmdq_pools="64", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="0", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="false", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

error:
2024-03-04T11:04:52.740Z|00384|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-04T11:04:54.449Z|00385|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-04T11:04:55.492Z|00386|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-04T11:04:55.592Z|00387|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-04T11:04:56.644Z|00388|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported

mellanox: {bus_info="bus_name=pci, vendor_id=15b3, device_id=1017", driver_name=mlx5_pci, if_descr="DPDK 22.11.1 mlx5_pci", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0", max_mac_addrs="128", max_rx_pktlen="1618", max_rx_queues="1024", max_tx_queues="1024", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="32", n_rxq="2", n_txq="5", numa_id="3", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="false", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

igsilya commented 4 months ago

Hi, @wangjun0728.

if_descr="DPDK 22.11.1 net_ice"

Please try a newer 22.11 release. There are numerous fixes in drivers between 22.11.1 and 22.11.4.

wangjun0728 commented 4 months ago

Hi @igsilya, I attempted to update DPDK to version 22.11.4, but the same error persists.

E810: {bus_info="bus_name=pci, vendor_id=8086, device_id=159b", driver_name=net_ice, if_descr="DPDK 22.11.4 net_ice", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0", max_mac_addrs="64", max_rx_pktlen="1618", max_rx_queues="256", max_tx_queues="256", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="1", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="true", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

error:
2024-03-05T02:12:53.092Z|00050|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:12:55.112Z|00051|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:13:08.027Z|00052|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:13:14.458Z|00478|connmgr|INFO|br-int<->unix#3: 5 flow_mods 18 s ago (3 adds, 2 deletes)
2024-03-05T02:13:38.871Z|00053|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:14:39.946Z|00054|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:15:05.262Z|00055|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event

82599: {bus_info="bus_name=pci, vendor_id=8086, device_id=10fb", driver_name=net_ixgbe, if_descr="DPDK 22.11.4 net_ixgbe", if_type="6", link_speed="10Gbps", max_hash_mac_addrs="4096", max_mac_addrs="127", max_rx_pktlen="1618", max_rx_queues="128", max_tx_queues="64", max_vfs="0", max_vmdq_pools="64", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="0", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="false", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

error:
2024-03-05T02:16:29.189Z|00414|netdev_dpdk|WARN|Dropped 1 log messages in last 29 seconds (most recently, 29 seconds ago) due to excessive rate
2024-03-05T02:16:29.189Z|00415|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/2 are valid: Operation not supported
2024-03-05T02:17:00.568Z|00023|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:00.568Z|00024|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:05.573Z|00025|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:05.573Z|00026|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:10.578Z|00027|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:20.589Z|00028|netdev_dpdk(pmd-c02/id:87)|WARN|Dropped 3 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2024-03-05T02:17:20.589Z|00029|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:35.604Z|00030|netdev_dpdk(pmd-c02/id:87)|WARN|Dropped 5 log messages in last 15 seconds (most recently, 5 seconds ago) due to excessive rate
2024-03-05T02:17:35.604Z|00031|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:44.560Z|00416|netdev_dpdk|WARN|Dropped 4 log messages in last 9 seconds (most recently, 3 seconds ago) due to excessive rate
2024-03-05T02:17:44.560Z|00417|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported

igsilya commented 4 months ago

OK. I don't really know what could be wrong with the ice driver, and I don't have any hardware to test with. The only suggestion I have is to try updating the firmware on the card, in case you're not running the latest version.

For the other driver we can try to debug, but we need to know what these invalid packets look like. I prepared a small patch that dumps the invalid packets to the OVS log here: https://github.com/igsilya/ovs/commit/3c34e86483941b39b64f831818d05cefd618c8a8 Could you try it in your setup? You'll need to enable debug logging for the netdev_dpdk module (e.g., ovs-appctl vlog/set netdev_dpdk:dbg) in order to see the dump.
The output should look something like this:

2024-03-05T14:18:48.161Z|00012|netdev_dpdk(pmd-c03/id:8)|DBG|ovs-p1: Invalid packet:
dump mbuf at 0x1180bce140, iova=0x2cb7ce400, buf_len=2176
  pkt_len=90, ol_flags=0x2, nb_segs=1, port=65535, ptype=0
  segment at 0x1180bce140, data=0x1180bce580, len=90, off=384, refcnt=1
  Dump data at [0x1180bce580], len=64
00000000: 33 33 00 00 00 16 AA 27 91 F9 4D 96 86 DD 60 00 | 33.....'..M...`.
00000010: 00 00 00 24 00 01 00 00 00 00 00 00 00 00 00 00 | ...$............
00000020: 00 00 00 00 00 00 FF 02 00 00 00 00 00 00 00 00 | ................
00000030: 00 00 00 00 00 16 3A 00 05 02 00 00 01 00 8F 00 | ......:.........

Also, what OVS version are you using? Maybe worth trying to update to the latest stable releases if you're not using them already.

wangjun0728 commented 4 months ago

Hi, @igsilya, thank you very much for your reply. The output log after I tried using your patch is as follows:

2024-03-05T15:42:58.817Z|00012|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192764ec0, iova=0x192765180, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192764ec0, data=0x1927651c2, len=144, off=66, refcnt=1
  Dump data at [0x1927651c2], len=64
00000000: 40 A6 B7 21 92 8C 68 91 D0 65 C6 C3 81 00 00 5C | @..!..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 07 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 36 C6 E1 17 C1 00 6A AD 0D 02 40 | &8..&6.....j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 03 0E 9C | eX..2...........
2024-03-05T15:42:58.817Z|00013|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192764ec0, iova=0x192765180, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192764ec0, data=0x1927651c2, len=144, off=66, refcnt=1
  Dump data at [0x1927651c2], len=64
00000000: 40 A6 B7 21 92 8C 68 91 D0 65 C6 C3 81 00 00 5C | @..!..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 07 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 36 C6 E1 17 C1 00 6A AD 0D 02 40 | &8..&6.....j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 03 0E 9C | eX..2...........
2024-03-05T15:43:03.823Z|00014|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192775d00, iova=0x192775fc0, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192775d00, data=0x192776002, len=144, off=66, refcnt=1
  Dump data at [0x192776002], len=64
00000000: 6C FE 54 2F 0D C0 68 91 D0 65 C6 C3 81 00 00 5C | l.T/..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 04 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 39 8A 56 17 C1 00 6A D6 C0 02 40 | &8..&9.V...j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 04 0A 35 | eX..2..........5
2024-03-05T15:43:03.823Z|00015|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192775d00, iova=0x192775fc0, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192775d00, data=0x192776002, len=144, off=66, refcnt=1
  Dump data at [0x192776002], len=64
00000000: 6C FE 54 2F 0D C0 68 91 D0 65 C6 C3 81 00 00 5C | l.T/..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 04 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 39 8A 56 17 C1 00 6A D6 C0 02 40 | &8..&9.V...j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 04 0A 35 | eX..2..........5
2024-03-05T15:43:08.828Z|00016|netdev_dpdk(pmd-c02/id:87)|WARN|Dropped 3 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2024-03-05T15:43:08.828Z|00017|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T15:43:08.828Z|00018|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192781900, iova=0x192781bc0, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192781900, data=0x192781c02, len=144, off=66, refcnt=1
  Dump data at [0x192781c02], len=64
00000000: 40 A6 B7 21 92 8C 68 91 D0 65 C6 C3 81 00 00 5C | @..!..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 07 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 36 C6 E1 17 C1 00 6A AD 0D 02 40 | &8..&6.....j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 03 0E 9C | eX..2...........

Additionally, the OVS version I'm using is 2.17.5 LTS. The issue only appeared after I backported the changes related to checksum and TSO; it was fine before the merge. The main changes merged are these: https://patchwork.ozlabs.org/project/openvswitch/list/?series=&submitter=82705&state=3&q=&archive=both&delegate= However, it's not easy for me to fully upgrade OVS because I rely on the version of OVN.

igsilya commented 4 months ago

However, it's not easy for me to fully upgrade OVS because I rely on the version of OVN.

This should not be a problem. You should be able to upgrade OVS, and OVN should still work just fine. The version of OVS you build OVN with and the one you're using at runtime don't need to be the same. There is a build-time dependency, because OVN uses some of the OVS libraries, but there is no runtime dependency: communication between OVS and OVN happens over OpenFlow and OVSDB, which are stable protocols. Any version of OVN should be able to work with any version of OVS at runtime.

So, you can build OVN with the version of OVS shipped in a submodule and use a separate newer version of OVS deployed on a host. Assuming you're using static linking, there should be no issues. In fact, that is a recommended way of using OVS with OVN.

The checksum offloading patches had a lot of small issues, so I would not be surprised if some of the fixes got lost in backporting. I'll try to look at the dumps, but I'd still recommend just upgrading OVS on the node instead.

igsilya commented 4 months ago

ol_flags=0x800800000000182

So, these are Geneve packets and the offload is requested for the outer IPv4 checksum.
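For anyone decoding these values by hand, here is a minimal standalone sketch (not OVS code; it only assumes DPDK's rte_get_tx_ol_flag_list() helper from rte_mbuf.h) that prints the Tx flag names packed into such an ol_flags value:

/* decode_ol_flags.c - hedged example; build against libdpdk.
 * rte_get_tx_ol_flag_list() formats the names of all RTE_MBUF_F_TX_*
 * flags/fields present in the mask; Rx bits in the low part of the
 * value do not match any Tx flag and are simply ignored. */
#include <stdint.h>
#include <stdio.h>
#include <rte_mbuf.h>

int main(void)
{
    uint64_t ol_flags = 0x800800000000182ULL; /* value from the dump above */
    char buf[512] = "";

    rte_get_tx_ol_flag_list(ol_flags, buf, sizeof buf);
    printf("TX flags: %s\n", buf);
    return 0;
}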

Tunnel offloads were introduced in OVS 3.3, meaning they were not tested with DPDK older than 23.11. I would not be surprised if drivers are missing some support or fixes. I don't think it makes sense to investigate this issue any further, and I highly recommend just upgrading OVS and using it with a supported version of DPDK.

wangjun0728 commented 4 months ago

Hi @igsilya, I understand the Geneve usage scenario. Currently, the 82599 network card does not support offloading the outer IP checksum and outer UDP checksum. Thank you very much for your suggestion. I will try the latest OVS 3.3 as soon as possible and report back with verification results. Thank you again for your reply.

wangjun0728 commented 4 months ago

Hi @igsilya, I have upgraded OVS to 3.3 and DPDK to 23.11, but the same issue still exists.

E810:

2024-03-07T07:42:56.712Z|00341|dpdk|INFO|VHOST_CONFIG: (/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock) read message VHOST_USER_SET_VRING_ENABLE
2024-03-07T07:42:56.712Z|00342|dpdk|INFO|VHOST_CONFIG: (/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock) set queue enable: 1 to qp idx: 6
2024-03-07T07:42:56.712Z|00343|dpdk|INFO|VHOST_CONFIG: (/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock) read message VHOST_USER_SET_VRING_ENABLE
2024-03-07T07:42:56.712Z|00344|dpdk|INFO|VHOST_CONFIG: (/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock) set queue enable: 1 to qp idx: 7
2024-03-07T07:42:56.722Z|00017|netdev_dpdk(ovs_vhost2)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00018|netdev_dpdk(ovs_vhost2)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'disabled'
2024-03-07T07:42:56.722Z|00019|netdev_dpdk(ovs_vhost2)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00020|netdev_dpdk(ovs_vhost2)|INFO|State of queue 1 ( rx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00021|netdev_dpdk(ovs_vhost2)|INFO|State of queue 1 ( rx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'disabled'
2024-03-07T07:42:56.722Z|00022|netdev_dpdk(ovs_vhost2)|INFO|State of queue 1 ( rx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00023|netdev_dpdk(ovs_vhost2)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00024|netdev_dpdk(ovs_vhost2)|INFO|State of queue 3 ( rx_qid 1 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00025|netdev_dpdk(ovs_vhost2)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00026|netdev_dpdk(ovs_vhost2)|INFO|State of queue 5 ( rx_qid 2 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00027|netdev_dpdk(ovs_vhost2)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00028|netdev_dpdk(ovs_vhost2)|INFO|State of queue 7 ( rx_qid 3 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:59.383Z|00016|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:00.800Z|00017|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:00.803Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:00.810Z|00019|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:00.970Z|00020|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:01.255Z|00021|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:01.426Z|00022|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:01.682Z|00023|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:02.810Z|00024|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:03.272Z|00025|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:04.676Z|00026|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:04.810Z|00027|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:05.291Z|00028|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:07.325Z|00029|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:09.348Z|00030|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:11.351Z|00031|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:12.414Z|00032|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:13.361Z|00033|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:15.371Z|00034|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:27.544Z|00035|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:36.076Z|00504|connmgr|INFO|br-int<->unix#2: 5 flow_mods 32 s ago (2 adds, 3 deletes)
2024-03-07T07:43:57.440Z|00036|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event

ovs-vsctl list open
_uuid : 85f32857-8cfb-4f91-9ffe-e28acb930545
bridges : [442c3a80-1b82-4670-aea5-e03d9d4b8b73, ffc69315-36f9-4dd3-b5f5-1dd2118aca21]
cur_cfg : 62
datapath_types : [netdev, system]
datapaths : {netdev=c2425cab-fc67-47fc-96cc-17cd7675ca91, system=45cef88b-7a8d-4f23-852a-f12131577982}
db_version : "8.5.0"
dpdk_initialized : true
dpdk_version : "DPDK 23.11.0"
external_ids : {hostname=xc03-compute2, ovn-bridge-datapath-type=netdev, ovn-encap-ip="10.253.38.55", ovn-encap-type=geneve, ovn-remote="tcp:[10.253.38.10]:6642,tcp:[10.253.38.9]:6642,tcp:[10.253.38.5]:6642", rundir="/var/run/openvswitch", system-id=xc03-compute2}
iface_types : [afxdp, afxdp-nonpmd, bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, srv6, stt, system, tap, vxlan]
manager_options : []
next_cfg : 62
other_config : {bundle-idle-timeout="3600", dpdk-extra=" -a 0000:af:00.1 -a 0000:af:00.0", dpdk-init="true", dpdk-socket-mem="2048", n-handler-threads="1", pmd-cpu-mask="0xf", vlan-limit="0"}
ovs_version : "3.3.1"
ssl : []
statistics : {}
system_type : cclinux
system_version : "22.09.2"

ovs-vsctl get interface tun_port_p0 status
{bus_info="bus_name=pci, vendor_id=8086, device_id=159b", driver_name=net_ice, if_descr="DPDK 23.11.0 net_ice", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0", max_mac_addrs="64", max_rx_pktlen="1618", max_rx_queues="256", max_tx_queues="256", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="1", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="true", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

ovs-vsctl get interface vh-userclient-8d1fca5d-dc status
{features="0x000000017060a783", mode=client, n_rxq="4", n_txq="4", num_of_vrings="8", numa="0", socket="/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock", status=connected, tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="false", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false", vring_0_size="1024", vring_1_size="1024", vring_2_size="1024", vring_3_size="1024", vring_4_size="1024", vring_5_size="1024", vring_6_size="1024", vring_7_size="1024"}

82599:

2024-03-07T07:46:37.430Z|00002|netdev_dpdk(pmd-c02/id:88)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-07T07:46:45.037Z|00002|netdev_dpdk(pmd-c03/id:86)|WARN|Dropped 21 log messages in last 8 seconds (most recently, 2 seconds ago) due to excessive rate
2024-03-07T07:46:45.037Z|00003|netdev_dpdk(pmd-c03/id:86)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-07T07:46:57.483Z|00002|netdev_dpdk(pmd-c00/id:89)|WARN|Dropped 9 log messages in last 12 seconds (most recently, 5 seconds ago) due to excessive rate
2024-03-07T07:46:57.483Z|00003|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported

ovs-vsctl list open
_uuid : 79b87ec7-4b02-4a77-a2c1-3943a68e8f79
bridges : [ab028efc-5f0a-48d4-a7aa-515681ba1c46, c2ecaf85-9a1b-4f9d-9a51-7e136737e3f7]
cur_cfg : 55
datapath_types : [netdev, system]
datapaths : {netdev=0e62f217-661e-46e3-906d-74a2eef05a3e, system=2a42c035-41fd-4727-b487-ee290a7f7f7c}
db_version : "8.5.0"
dpdk_initialized : true
dpdk_version : "DPDK 23.11.0"
external_ids : {hostname=xc03-compute3, ovn-bridge-datapath-type=netdev, ovn-encap-ip="10.253.38.56", ovn-encap-type=geneve, ovn-remote="tcp:[10.253.38.9]:6642,tcp:[10.253.38.5]:6642,tcp:[10.253.38.10]:6642", rundir="/var/run/openvswitch", system-id=xc03-compute3}
iface_types : [afxdp, afxdp-nonpmd, bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, srv6, stt, system, tap, vxlan]
manager_options : []
next_cfg : 55
other_config : {bundle-idle-timeout="3600", dpdk-extra=" -a 0000:18:00.1 -a 0000:18:00.0", dpdk-init="true", dpdk-socket-mem="2048", n-handler-threads="1", pmd-cpu-mask="0xf", vlan-limit="0"}
ovs_version : "3.3.1"
ssl : []
statistics : {}
system_type : cclinux
system_version : "22.09.2"

ovs-vsctl get interface tun_port_p0 status
{bus_info="bus_name=pci, vendor_id=8086, device_id=10fb", driver_name=net_ixgbe, if_descr="DPDK 23.11.0 net_ixgbe", if_type="6", link_speed="10Gbps", max_hash_mac_addrs="4096", max_mac_addrs="127", max_rx_pktlen="1618", max_rx_queues="128", max_tx_queues="64", max_vfs="0", max_vmdq_pools="64", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="0", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="false", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

wangjun0728 commented 4 months ago

Regarding the E810, the abnormal messages were observed after I created the vhost-user client port. I suspect that the 82599 network card's lack of support for tx_out_udp_csum_offload and tx_out_ip_csum_offload is what causes its issue.

igsilya commented 4 months ago

@wangjun0728 thanks for the info! This looks very similar to what is supposed to be fixed in https://patchwork.ozlabs.org/project/openvswitch/patch/20240226133837.533820-1-mkp@redhat.com/ . Could you confirm that you have this patch in your version of OVS?

CC: @mkp-rh

wangjun0728 commented 4 months ago

@igsilya The patch is included in my code; I've previously discussed this issue with Mike. It resolved the issue with my Mellanox network card, but on the Intel network cards (82599 and E810) there are still anomalies with the Geneve overlay.

Additionally, the latest code I'm using is this: https://github.com/openvswitch/ovs/commits/branch-3.3/

The checksum offload capabilities of Intel network cards do differ from those of Mellanox network cards. I believe this might be the root cause of the issue; it looks more like a problem in the DPDK-side drivers.

mkp-rh commented 4 months ago

I think there's somewhat of a hint provided here:

Output batch contains invalid packets. Only 0/1 are valid: Operation not supported

There are very few places where DPDK will return ENOTSUP. I don't have an E810 card right now, but I will try to investigate the code.

igsilya commented 4 months ago

@mkp-rh note that "Operation not supported" is on the 82599 card. The E810 doesn't reject packets but throws MDD events.

mkp-rh commented 4 months ago

For the MDD issue, I see that the E810 errata page reports:

Some of the Tx Data checks performed as part of the Malicious Driver Detection (MDD) are reported as anti-spoof failures in addition to the actual failures

So it could be the MDD anti-spoofing features, or a general tx data check failure.

In the ixgbe driver, ixgbe_prep_pkts only returns ENOTSUP if the ol_flags are incorrect.

From the log above I see ol_flags=0x800800000000182, which translates into the following Tx offload flags:

RTE_MBUF_F_TX_TUNNEL_GENEVE
RTE_MBUF_F_TX_OUTER_IPV4

ixgbe_rxtx.c contains the supported IXGBE_TX_OFFLOAD_MASK, which doesn't include RTE_MBUF_F_TX_TUNNEL_GENEVE. So that flag shouldn't be included when we send the frame.
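As a rough, hypothetical illustration of the shape of that check (standalone C, not the driver source; SUPPORTED below is a stand-in for IXGBE_TX_OFFLOAD_MASK, and the flag constants mirror the rte_mbuf_core.h bit positions):

/* Demo of a prep-time validation in the style of ixgbe_prep_pkts():
 * any requested Tx offload flag outside the driver's supported mask
 * leads to ENOTSUP, which OVS surfaces as "Output batch contains
 * invalid packets ... Operation not supported". */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define TX_IP_CKSUM      (1ULL << 54)   /* RTE_MBUF_F_TX_IP_CKSUM */
#define TX_OUTER_IPV4    (1ULL << 59)   /* RTE_MBUF_F_TX_OUTER_IPV4 */
#define TX_TUNNEL_GENEVE (0x4ULL << 45) /* RTE_MBUF_F_TX_TUNNEL_GENEVE */

int main(void)
{
    uint64_t supported = TX_IP_CKSUM | TX_OUTER_IPV4;      /* illustrative */
    uint64_t ol_flags = TX_TUNNEL_GENEVE | TX_OUTER_IPV4;  /* as in the log */

    if (ol_flags & ~supported) {
        /* The driver would set rte_errno = ENOTSUP and stop here. */
        printf("packet rejected: ENOTSUP, offending bits 0x%llx\n",
               (unsigned long long) (ol_flags & ~supported));
    }
    return 0;
}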

igsilya commented 4 months ago
RTE_MBUF_F_TX_TUNNEL_GENEVE
RTE_MBUF_F_TX_OUTER_IPV4

ixgbe_rxtx.c contains the supported IXGBE_TX_OFFLOAD_MASK, which doesn't include RTE_MBUF_F_TX_TUNNEL_GENEVE. So that flag shouldn't be included when we send the frame.

So, if we do not request TSO or inner checksumming we must not specify RTE_MBUF_F_TX_TUNNEL_* flags. Right? IIUC, we need https://github.com/openvswitch/ovs/commit/9b7e1a75378f806fcf782e0286d529028e6d62bf but for tunnels.

igsilya commented 4 months ago

@mkp-rh Hmm, also RTE_MBUF_F_TX_OUTER_IPV4 is not set, while it is required for RTE_MBUF_F_TX_OUTER_IP_CKSUM according to the API. And it seems the check from https://github.com/openvswitch/ovs/commit/9b7e1a75378f806fcf782e0286d529028e6d62bf is not really correct, as it doesn't seem to cover all the outer/inner cases.

Edit: Nevermind, wrong flag. But the existing check might still be incomplete.

wangjun0728 commented 4 months ago

Hi @igsilya @mkp-rh, if you have suggestions for modifications, I have environments with E810 and 82599 network cards to verify them.

igsilya commented 4 months ago

@wangjun0728 Could you try this one: https://github.com/igsilya/ovs/commit/00c0a91f89084bf3ac333918c729fc7274f476e4 ? It should fix the 82599 case at least, I think.

wangjun0728 commented 4 months ago

@igsilya This looks great! Applying your modifications resolved the error with the 82599 network card, and I can now communicate without issues using iperf. I've also checked the E810 network card, and the MDD error still persists.

wangjun0728 commented 4 months ago

I also noticed a fix in the DPDK community, but applying it didn't make any difference. I suspect there might be a flaw in the E810 driver's support for tunnel TSO.

https://patches.dpdk.org/project/dpdk/patch/20231207023051.1914021-1-kaiwenx.deng@intel.com/

wangjun0728 commented 4 months ago

After enabling DPDK's PMD logs with --log-level=pmd,debug, I captured a portion of the DPDK startup log. It's currently unclear whether it has any definite correlation with the errors.

2024-03-11T02:38:04.088Z|00007|dpdk|INFO|Using DPDK 23.11.0
2024-03-11T02:38:04.088Z|00008|dpdk|INFO|DPDK Enabled - initializing...
2024-03-11T02:38:04.088Z|00009|dpdk|INFO|dpdk init get port_num:2
2024-03-11T02:38:04.088Z|00010|dpdk|INFO|EAL ARGS: ovs-vswitchd -a 0000:af:00.1 -a 0000:af:00.0 --log-level=pmd,debug --socket-mem 2048 -l 0.
2024-03-11T02:38:04.091Z|00011|dpdk|INFO|EAL: Detected CPU lcores: 80
2024-03-11T02:38:04.091Z|00012|dpdk|INFO|EAL: Detected NUMA nodes: 2
2024-03-11T02:38:04.091Z|00013|dpdk|INFO|EAL: Detected static linkage of DPDK
2024-03-11T02:38:04.096Z|00014|dpdk|INFO|EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
2024-03-11T02:38:04.099Z|00015|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2024-03-11T02:38:04.100Z|00016|dpdk|WARN|EAL: No free 2048 kB hugepages reported on node 0
2024-03-11T02:38:04.100Z|00017|dpdk|WARN|EAL: No free 2048 kB hugepages reported on node 1
2024-03-11T02:38:04.101Z|00018|dpdk|INFO|EAL: VFIO support initialized
2024-03-11T02:38:04.839Z|00019|dpdk|INFO|EAL: Using IOMMU type 1 (Type 1)
2024-03-11T02:38:04.994Z|00020|dpdk|INFO|EAL: Ignore mapping IO port bar(1)
2024-03-11T02:38:04.994Z|00021|dpdk|INFO|EAL: Ignore mapping IO port bar(4)
2024-03-11T02:38:05.120Z|00022|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:af:00.0 (socket 1)
2024-03-11T02:38:05.586Z|00023|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.28.0, ICE OS Default Package (single VLAN mode)
2024-03-11T02:38:05.586Z|00024|dpdk|INFO|ice_dev_init(): FW 5.3.-1521546806 API 1.7
2024-03-11T02:38:05.608Z|00025|dpdk|INFO|ice_flow_init(): Engine 4 disabled
2024-03-11T02:38:05.608Z|00026|dpdk|INFO|ice_fdir_setup(): FDIR HW Capabilities: fd_fltr_guar = 1024, fd_fltr_best_effort = 14336.
2024-03-11T02:38:05.612Z|00027|dpdk|INFO|vsi_queues_bind_intr(): queue 0 is binding to vect 257
2024-03-11T02:38:05.612Z|00028|dpdk|INFO|ice_fdir_setup(): FDIR setup successfully, with programming queue 0.
2024-03-11T02:38:05.736Z|00029|dpdk|INFO|EAL: Ignore mapping IO port bar(1)
2024-03-11T02:38:05.736Z|00030|dpdk|INFO|EAL: Ignore mapping IO port bar(4)
2024-03-11T02:38:05.839Z|00031|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:af:00.1 (socket 1)
2024-03-11T02:38:05.942Z|00032|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.28.0, ICE OS Default Package (single VLAN mode)
2024-03-11T02:38:05.942Z|00033|dpdk|INFO|ice_dev_init(): FW 5.3.-1521546806 API 1.7
2024-03-11T02:38:05.965Z|00034|dpdk|INFO|ice_flow_init(): Engine 4 disabled
2024-03-11T02:38:05.965Z|00035|dpdk|INFO|ice_fdir_setup(): FDIR HW Capabilities: fd_fltr_guar = 1024, fd_fltr_best_effort = 14336.
2024-03-11T02:38:05.968Z|00036|dpdk|INFO|vsi_queues_bind_intr(): queue 0 is binding to vect 257
2024-03-11T02:38:05.968Z|00037|dpdk|INFO|ice_fdir_setup(): FDIR setup successfully, with programming queue 0.
2024-03-11T02:38:05.972Z|00038|dpdk|WARN|TELEMETRY: No legacy callbacks, legacy socket not created
2024-03-11T02:38:05.972Z|00039|dpdk|INFO|DPDK rte_pdump - initializing...
2024-03-11T02:38:05.977Z|00044|dpdk|INFO|DPDK Enabled - initialized
2024-03-11T02:38:06.223Z|00001|dpdk|INFO|ice_interrupt_handler(): OICR: link state change event
2024-03-11T02:38:06.406Z|00089|dpdk|INFO|Device with port_id=1 already stopped
2024-03-11T02:38:06.572Z|00090|dpdk|INFO|ice_set_rx_function(): Using AVX2 OFFLOAD Vector Rx (port 1).
2024-03-11T02:38:06.572Z|00091|dpdk|ERR|ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq
2024-03-11T02:38:06.572Z|00092|dpdk|INFO|vsi_queues_bind_intr(): queue 1 is binding to vect 1
2024-03-11T02:38:06.572Z|00093|dpdk|INFO|vsi_queues_bind_intr(): queue 2 is binding to vect 1
2024-03-11T02:38:07.555Z|00002|dpdk|INFO|ice_interrupt_handler(): OICR: link state change event
2024-03-11T02:38:07.600Z|00102|dpdk|INFO|Device with port_id=0 already stopped
2024-03-11T02:38:07.623Z|00103|dpdk|INFO|ice_set_rx_function(): Using AVX2 OFFLOAD Vector Rx (port 0).
2024-03-11T02:38:07.624Z|00104|dpdk|ERR|ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq
2024-03-11T02:38:07.624Z|00105|dpdk|INFO|vsi_queues_bind_intr(): queue 1 is binding to vect 1
2024-03-11T02:38:07.624Z|00106|dpdk|INFO|vsi_queues_bind_intr(): queue 2 is binding to vect 1

igsilya commented 4 months ago

@wangjun0728 I posted the refined version of the 82599 fix here: https://patchwork.ozlabs.org/project/openvswitch/patch/20240311183231.37253-1-i.maximets@ovn.org/ Could you check with this version? It has some extra checking, but I do not expect it to behave much differently; i.e., it should fix the 82599 case, but should not affect the E810 problem.

igsilya commented 4 months ago

For the E810, I still don't have a lot to suggest. One thing that might help us understand the situation better is to dump some of the mbufs we're trying to send. Maybe you can capture some logs with the following change applied:

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8c52accff..331031035 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2607,6 +2607,17 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
                  (char *) dp_packet_eth(pkt);
         mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
                  (char *) dp_packet_l3(pkt);
+        VLOG_WARN_RL(&rl, "%s: Tunnel offload:"
+                     " outer_l2_len=%d"
+                     " outer_l3_len=%d"
+                     " l2_len=%d"
+                     " l3_len=%d"
+                     " l4_len=%d",
+                     netdev_get_name(&dev->up),
+                     mbuf->outer_l2_len, mbuf->outer_l3_len,
+                     mbuf->l2_len, mbuf->l3_len, mbuf->l4_len);
+        netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up),
+                              "Tunneled packet", mbuf);
     } else {
         mbuf->l2_len = (char *) dp_packet_l3(pkt) -
                (char *) dp_packet_eth(pkt);

? It will spam the packets into the log, so definitely not recommended for a long-running test. But maybe it can shed some light on the problem.

mkp-rh commented 4 months ago

@wangjun0728 Are you able to check if the following patch resolves your issue on E810?

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index df7bf8e6b..046acd8ba 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -597,12 +597,15 @@ dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t flags)
          * support inner checksum offload and an outer UDP checksum is
          * required, then we can't offload inner checksum either. As that would
          * invalidate the outer checksum. */
-        if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM) &&
-                dp_packet_hwol_is_outer_udp_cksum(p)) {
-            flags &= ~(NETDEV_TX_OFFLOAD_TCP_CKSUM |
-                       NETDEV_TX_OFFLOAD_UDP_CKSUM |
-                       NETDEV_TX_OFFLOAD_SCTP_CKSUM |
-                       NETDEV_TX_OFFLOAD_IPV4_CKSUM);
+        if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) {
+            if (dp_packet_hwol_is_outer_udp_cksum(p)) {
+                flags &= ~(NETDEV_TX_OFFLOAD_TCP_CKSUM |
+                           NETDEV_TX_OFFLOAD_UDP_CKSUM |
+                           NETDEV_TX_OFFLOAD_SCTP_CKSUM |
+                           NETDEV_TX_OFFLOAD_IPV4_CKSUM);
+            }
+            *dp_packet_ol_flags_ptr(p) &= ~(DP_PACKET_OL_TX_TUNNEL_GENEVE |
+                                            DP_PACKET_OL_TX_TUNNEL_VXLAN);
         }
     }

wangjun0728 commented 4 months ago

@wangjun0728 I posted the refined version of the 82599 fix here: https://patchwork.ozlabs.org/project/openvswitch/patch/20240311183231.37253-1-i.maximets@ovn.org/ Could you check with this version? It has some extra checking, but I do not expect it to behave much differently; i.e., it should fix the 82599 case, but should not affect the E810 problem.

Thank you very much. I've validated this patch, and it seems everything is fine with 82599. There are no log error messages either, which is great.

wangjun0728 commented 4 months ago

For the E810, I still don't have a lot to suggest. One thing that might help us understand the situation better is to dump some of the mbufs we're trying to send. Maybe you can capture some logs with the following change applied:

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8c52accff..331031035 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2607,6 +2607,17 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
                  (char *) dp_packet_eth(pkt);
         mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
                  (char *) dp_packet_l3(pkt);
+        VLOG_WARN_RL(&rl, "%s: Tunnel offload:"
+                     " outer_l2_len=%d"
+                     " outer_l3_len=%d"
+                     " l2_len=%d"
+                     " l3_len=%d"
+                     " l4_len=%d",
+                     netdev_get_name(&dev->up),
+                     mbuf->outer_l2_len, mbuf->outer_l3_len,
+                     mbuf->l2_len, mbuf->l3_len, mbuf->l4_len);
+        netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up),
+                              "Tunneled packet", mbuf);
     } else {
         mbuf->l2_len = (char *) dp_packet_l3(pkt) -
                (char *) dp_packet_eth(pkt);

? It will spam the packets into the log, so definitely not recommended for a long-running test. But maybe it can shed some light on the problem.

I applied your modification and enabled debug logging mode for netdev_dpdk. Below are some log prints; hopefully, they will be helpful to you.

2024-03-12T06:22:57.262Z|00012|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=38 l3_len=20 l4_len=32
2024-03-12T06:22:57.262Z|00013|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet:
dump mbuf at 0x18f34ee40, iova=0x18f34f100, buf_len=2176
  pkt_len=128, ol_flags=0xc90820000000002, nb_segs=1, port=65535, ptype=0
  segment at 0x18f34ee40, data=0x18f34f142, len=128, off=66, refcnt=1
  Dump data at [0x18f34f142], len=128
00000000: A0 88 C2 20 00 7E B4 96 91 BC 45 7B 81 00 00 5C | ... .~....E{...\
00000010: 08 00 45 00 00 6E 00 00 40 00 40 11 00 00 0A FD | ..E..n..@.@.....
00000020: 26 37 0A FD 26 3B E7 36 17 C1 00 5A FF FF 02 40 | &7..&;.6...Z...@
00000030: 65 58 00 00 2D 00 01 02 80 01 00 08 00 04 40 FE | eX..-.........@.
00000040: 95 EF 85 2C 0A 8B BF 77 86 35 08 00 45 00 00 34 | ...,...w.5..E..4
00000050: 00 00 40 00 3F 06 59 AA 0A 00 00 0B 0A 38 CD D7 | ..@.?.Y......8..
00000060: 00 16 CB BB 0B 9A 8A AE 15 7E 36 20 80 12 FA F0 | .........~6 ....
00000070: E2 40 00 00 02 04 05 B4 01 01 04 02 01 03 03 09 | .@..............
2024-03-12T06:22:58.268Z|00014|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet:
dump mbuf at 0x18f34e280, iova=0x18f34e540, buf_len=2176
  pkt_len=128, ol_flags=0xc90820000000002, nb_segs=1, port=65535, ptype=0
  segment at 0x18f34e280, data=0x18f34e582, len=128, off=66, refcnt=1
  Dump data at [0x18f34e582], len=128
00000000: A0 88 C2 20 00 7E B4 96 91 BC 45 7B 81 00 00 5C | ... .~....E{...\
00000010: 08 00 45 00 00 6E 00 00 40 00 40 11 00 00 0A FD | ..E..n..@.@.....
00000020: 26 37 0A FD 26 3B E7 36 17 C1 00 5A FF FF 02 40 | &7..&;.6...Z...@
00000030: 65 58 00 00 2D 00 01 02 80 01 00 08 00 04 40 FE | eX..-.........@.
00000040: 95 EF 85 2C 0A 8B BF 77 86 35 08 00 45 00 00 34 | ...,...w.5..E..4
00000050: 00 00 40 00 3F 06 59 AA 0A 00 00 0B 0A 38 CD D7 | ..@.?.Y......8..
00000060: 00 16 CB BB 0B 9A 8A AE 15 7E 36 20 80 12 FA F0 | .........~6 ....
00000070: E2 40 00 00 02 04 05 B4 01 01 04 02 01 03 03 09 | .@..............
2024-03-12T06:22:59.320Z|00015|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet:
dump mbuf at 0x18f34d6c0, iova=0x18f34d980, buf_len=2176
  pkt_len=128, ol_flags=0xc90820000000002, nb_segs=1, port=65535, ptype=0
  segment at 0x18f34d6c0, data=0x18f34d9c2, len=128, off=66, refcnt=1
  Dump data at [0x18f34d9c2], len=128
00000000: A0 88 C2 20 00 7E B4 96 91 BC 45 7B 81 00 00 5C | ... .~....E{...\
00000010: 08 00 45 00 00 6E 00 00 40 00 40 11 00 00 0A FD | ..E..n..@.@.....
00000020: 26 37 0A FD 26 3B E7 36 17 C1 00 5A FF FF 02 40 | &7..&;.6...Z...@
00000030: 65 58 00 00 2D 00 01 02 80 01 00 08 00 04 40 FE | eX..-.........@.
00000040: 95 EF 85 2C 0A 8B BF 77 86 35 08 00 45 00 00 34 | ...,...w.5..E..4
00000050: 00 00 40 00 3F 06 59 AA 0A 00 00 0B 0A 38 CD D7 | ..@.?.Y......8..
00000060: 00 16 CB BB 0B 9A 8A AE 15 7E 36 20 80 12 FA F0 | .........~6 ....
00000070: E2 40 00 00 02 04 05 B4 01 01 04 02 01 03 03 09 | .@..............
2024-03-12T06:23:00.278Z|00016|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet:
dump mbuf at 0x18f34cb00, iova=0x18f34cdc0, buf_len=2176
  pkt_len=128, ol_flags=0xc90820000000002, nb_segs=1, port=65535, ptype=0
  segment at 0x18f34cb00, data=0x18f34ce02, len=128, off=66, refcnt=1
  Dump data at [0x18f34ce02], len=128
00000000: A0 88 C2 20 00 7E B4 96 91 BC 45 7B 81 00 00 5C | ... .~....E{...\
00000010: 08 00 45 00 00 6E 00 00 40 00 40 11 00 00 0A FD | ..E..n..@.@.....
00000020: 26 37 0A FD 26 3B E7 36 17 C1 00 5A FF FF 02 40 | &7..&;.6...Z...@
00000030: 65 58 00 00 2D 00 01 02 80 01 00 08 00 04 40 FE | eX..-.........@.
00000040: 95 EF 85 2C 0A 8B BF 77 86 35 08 00 45 00 00 34 | ...,...w.5..E..4
00000050: 00 00 40 00 3F 06 59 AA 0A 00 00 0B 0A 38 CD D7 | ..@.?.Y......8..
00000060: 00 16 CB BB 0B 9A 8A AE 15 7E 36 20 80 12 FA F0 | .........~6 ....
00000070: E2 40 00 00 02 04 05 B4 01 01 04 02 01 03 03 09 | .@..............
2024-03-12T06:23:15.211Z|00031|netdev_dpdk(pmd-c02/id:88)|WARN|Dropped 3 log messages in last 17 seconds (most recently, 15 seconds ago) due to excessive rate
2024-03-12T06:23:15.211Z|00032|netdev_dpdk(pmd-c02/id:88)|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=38 l3_len=20 l4_len=8
2024-03-12T06:23:15.212Z|00033|netdev_dpdk(pmd-c02/id:88)|DBG|tun_port_p0: Tunneled packet:
dump mbuf at 0x18eeda200, iova=0x18eeda4c0, buf_len=2176
  pkt_len=152, ol_flags=0xcb0820000000002, nb_segs=1, port=65535, ptype=0
  segment at 0x18eeda200, data=0x18eeda502, len=152, off=66, refcnt=1
  Dump data at [0x18eeda502], len=152
00000000: 40 A6 B7 21 92 8C B4 96 91 BC 45 7B 81 00 00 5C | @..!......E{...\
00000010: 08 00 45 00 00 86 00 00 40 00 40 11 00 00 0A FD | ..E.....@.@.....
00000020: 26 37 0A FD 26 36 B7 8C 17 C1 00 72 FF FF 02 40 | &7..&6.....r...@
00000030: 65 58 00 00 30 00 01 02 80 01 00 02 00 04 06 75 | eX..0..........u
00000040: CA 23 3F 44 02 81 5E AC BE 89 08 00 45 00 00 4C | .#?D..^.....E..L
00000050: 98 86 00 00 3F 11 CD 00 0A 00 00 0B 0B 0B 01 05 | ....?...........
00000060: DD 51 00 7B 00 38 16 64 23 00 06 20 00 00 00 00 | .Q.{.8.d#.. ....
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00000090: 41 6A 12 2A 8F 80 DD 7A                         | Aj.*...z
2024-03-12T06:23:36.248Z|00595|netdev_dpdk|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=18 l3_len=20 l4_len=0
2024-03-12T06:23:36.248Z|00596|netdev_dpdk|DBG|tun_port_p0: Tunneled packet:
dump mbuf at 0x11d4754c40, iova=0x11d4754f00, buf_len=2176
  pkt_len=132, ol_flags=0xd00820000000002, nb_segs=1, port=65535, ptype=0
  segment at 0x11d4754c40, data=0x11d4754f80, len=132, off=128, refcnt=1
  Dump data at [0x11d4754f80], len=132
00000000: 08 C0 EB AF 0D 3F B4 96 91 BC 45 7B 81 00 00 5C | .....?....E{...\
00000010: 08 00 45 00 00 72 00 00 40 00 40 11 00 00 0A FD | ..E..r..@.@.....
00000020: 26 37 0A FD 26 32 BB 80 17 C1 00 5E FF FF 02 40 | &7..&2.....^...@
00000030: 65 58 00 00 12 00 01 02 80 01 00 0B 80 00 33 33 | eX............33
00000040: 00 00 00 02 0A 90 F1 D7 BB A1 86 DD 60 00 00 00 | ............`...
00000050: 00 10 3A FF FE 80 00 00 00 00 00 00 08 90 F1 FF | ..:.............
00000060: FE D7 BB A1 FF 02 00 00 00 00 00 00 00 00 00 00 | ................
00000070: 00 00 00 02 85 00 0F 1B 00 00 00 00 01 01 0A 90 | ................
00000080: F1 D7 BB A1                                     | ....
2024-03-12T06:23:36.248Z|00597|netdev_dpdk|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=18 l3_len=20 l4_len=0
2024-03-12T06:23:36.248Z|00598|netdev_dpdk|DBG|tun_port_p0: Tunneled packet:
dump mbuf at 0x11d4755800, iova=0x11d4755ac0, buf_len=2176
  pkt_len=132, ol_flags=0xd00820000000002, nb_segs=1, port=65535, ptype=0
  segment at 0x11d4755800, data=0x11d4755b40, len=132, off=128, refcnt=1
  Dump data at [0x11d4755b40], len=132
00000000: 68 91 D0 65 C6 C3 B4 96 91 BC 45 7B 81 00 00 5C | h..e......E{...\
00000010: 08 00 45 00 00 72 00 00 40 00 40 11 00 00 0A FD | ..E..r..@.@.....
00000020: 26 37 0A FD 26 38 BB 80 17 C1 00 5E FF FF 02 40 | &7..&8.....^...@
00000030: 65 58 00 00 12 00 01 02 80 01 00 0B 80 00 33 33 | eX............33
00000040: 00 00 00 02 0A 90 F1 D7 BB A1 86 DD 60 00 00 00 | ............`...
00000050: 00 10 3A FF FE 80 00 00 00 00 00 00 08 90 F1 FF | ..:.............
00000060: FE D7 BB A1 FF 02 00 00 00 00 00 00 00 00 00 00 | ................
00000070: 00 00 00 02 85 00 0F 1B 00 00 00 00 01 01 0A 90 | ................
00000080: F1 D7 BB A1                                     | ....
2024-03-12T06:23:36.248Z|00010|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event

wangjun0728 commented 4 months ago

@wangjun0728 Are you able to check if the following patch resolves your issue on E810?

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index df7bf8e6b..046acd8ba 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -597,12 +597,15 @@ dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t flags)
          * support inner checksum offload and an outer UDP checksum is
          * required, then we can't offload inner checksum either. As that would
          * invalidate the outer checksum. */
-        if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM) &&
-                dp_packet_hwol_is_outer_udp_cksum(p)) {
-            flags &= ~(NETDEV_TX_OFFLOAD_TCP_CKSUM |
-                       NETDEV_TX_OFFLOAD_UDP_CKSUM |
-                       NETDEV_TX_OFFLOAD_SCTP_CKSUM |
-                       NETDEV_TX_OFFLOAD_IPV4_CKSUM);
+        if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) {
+            if (dp_packet_hwol_is_outer_udp_cksum(p)) {
+                flags &= ~(NETDEV_TX_OFFLOAD_TCP_CKSUM |
+                           NETDEV_TX_OFFLOAD_UDP_CKSUM |
+                           NETDEV_TX_OFFLOAD_SCTP_CKSUM |
+                           NETDEV_TX_OFFLOAD_IPV4_CKSUM);
+            }
+            *dp_packet_ol_flags_ptr(p) &= ~(DP_PACKET_OL_TX_TUNNEL_GENEVE |
+                                            DP_PACKET_OL_TX_TUNNEL_VXLAN);
         }
     }

Hi, after applying your modification, the error logs for E810 still persist, and there are also additional error logs stating "ip packet has invalid checksum". Moreover, I've noticed this error log in other network card environments as well.

E810:
2024-03-12T05:40:14.226Z|00021|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:14.229Z|00022|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:14.257Z|00023|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:14.395Z|00024|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:14.435Z|00025|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:14.723Z|00026|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:16.257Z|00027|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:18.238Z|00028|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:18.257Z|00029|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:25.849Z|00030|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:37.289Z|00530|connmgr|INFO|br-int<->unix#2: 5 flow_mods 56 s ago (3 adds, 2 deletes)
2024-03-12T05:40:39.194Z|00008|native_tnl(pmd-c02/id:88)|WARN|ip packet has invalid checksum
2024-03-12T05:40:40.193Z|00009|native_tnl(pmd-c02/id:88)|WARN|ip packet has invalid checksum
2024-03-12T05:40:41.120Z|00031|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:40:42.194Z|00010|native_tnl(pmd-c02/id:88)|WARN|ip packet has invalid checksum
2024-03-12T05:40:46.205Z|00011|native_tnl(pmd-c02/id:88)|WARN|ip packet has invalid checksum
2024-03-12T05:40:54.210Z|00012|native_tnl(pmd-c02/id:88)|WARN|ip packet has invalid checksum
2024-03-12T05:41:10.368Z|00032|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-12T05:42:11.397Z|00033|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event

82599:
2024-03-12T05:43:39.300Z|00030|native_tnl(pmd-c03/id:86)|WARN|ip packet has invalid checksum
2024-03-12T05:43:39.508Z|00001|native_tnl(pmd-c02/id:88)|WARN|ip packet has invalid checksum
2024-03-12T05:43:41.521Z|00002|native_tnl(pmd-c02/id:88)|WARN|ip packet has invalid checksum
2024-03-12T05:43:45.535Z|00003|native_tnl(pmd-c02/id:88)|WARN|ip packet has invalid checksum
2024-03-12T05:43:53.541Z|00004|native_tnl(pmd-c02/id:88)|WARN|ip packet has invalid checksum

igsilya commented 4 months ago

Thanks, @wangjun0728!

The ol_flags for the packet that might be contributing to MDD failures are:

0xd00820000000002

RTE_MBUF_F_TX_OUTER_UDP_CKSUM
RTE_MBUF_F_TX_TUNNEL_GENEVE
RTE_MBUF_F_TX_IPV6
RTE_MBUF_F_TX_OUTER_IP_CKSUM
RTE_MBUF_F_TX_OUTER_IPV4

It is an ICMPv6 packet encapsulated in an IPv4 Geneve tunnel, so the flags seem correct at first glance, but I wonder if the driver gets confused by the mix of IPv6 and IPv4 flags, or simply by the presence of the inner IPv6 mark while inner offloads are not requested. The RTE_MBUF_F_TX_IPV6 flag should technically not be needed here, so we might just clear it?

Maybe something like this on top of the 82599 patch would help with E810 case:

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8c52accff..270d3e11c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2607,6 +2607,15 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
                  (char *) dp_packet_eth(pkt);
         mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
                  (char *) dp_packet_l3(pkt);
+
+        /* If neither inner checksums nor TSO is requested, inner marks
+         * should not be set. */
+        if (!(mbuf->ol_flags & (RTE_MBUF_F_TX_IP_CKSUM |
+                                RTE_MBUF_F_TX_L4_MASK  |
+                                RTE_MBUF_F_TX_TCP_SEG))) {
+            mbuf->ol_flags &= ~(RTE_MBUF_F_TX_IPV4 |
+                                RTE_MBUF_F_TX_IPV6);
+        }
     } else {
         mbuf->l2_len = (char *) dp_packet_l3(pkt) -
                (char *) dp_packet_eth(pkt);

Could you try?

=============================================================

Another packet is a TCP packet in a Geneve tunnel; it has:

0xc90820000000002

RTE_MBUF_F_TX_TCP_CKSUM
RTE_MBUF_F_TX_IPV4

RTE_MBUF_F_TX_OUTER_IP_CKSUM
RTE_MBUF_F_TX_OUTER_IPV4
RTE_MBUF_F_TX_OUTER_UDP_CKSUM
RTE_MBUF_F_TX_TUNNEL_GENEVE

This seems correct; it will also gain RTE_MBUF_F_TX_IP_CKSUM at the end of processing, so it should be fine. I don't see anything that can be wrong with this one.

And one more packet is UDP (NTP) inside the Geneve tunnel:

0xcb0820000000002

RTE_MBUF_F_TX_UDP_CKSUM
RTE_MBUF_F_TX_IPV4

RTE_MBUF_F_TX_OUTER_IP_CKSUM
RTE_MBUF_F_TX_OUTER_IPV4
RTE_MBUF_F_TX_OUTER_UDP_CKSUM
RTE_MBUF_F_TX_TUNNEL_GENEVE

This one also seems fine. However, the mbuf->ol_flags & RTE_MBUF_F_TX_TCP_CKSUM check in the netdev_dpdk_prep_hwol_packet() function is incorrect, because the L4 checksum flags are not individual bits; they are values in a bit field. RTE_MBUF_F_TX_UDP_CKSUM is a two-bit field value that happens to share a bit with the TCP checksum value, so the packet should gain RTE_MBUF_F_TX_IP_CKSUM correctly. However, it will also get tso_segsz initialized with some data, and some other UDP packets may get garbage set in l4_len. So, the correct check should be something like this:

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8c52accff..4e516c3f8 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2625,7 +2634,7 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
         }
     }

-    if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_CKSUM) {
+    if ((mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) == RTE_MBUF_F_TX_TCP_CKSUM) {
         if (!th) {
             VLOG_WARN_RL(&rl, "%s: TCP offloading without L4 header"
                          " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len);
@@ -2652,11 +2661,14 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
                 return false;
             }
         }
+    }

-        if (mbuf->ol_flags & RTE_MBUF_F_TX_IPV4) {
-            mbuf->ol_flags |= RTE_MBUF_F_TX_IP_CKSUM;
-        }
+    /* If L4 checksum offload is requested, IPv4 should be requested as well. */
+    if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK
+        && mbuf->ol_flags & RTE_MBUF_F_TX_IPV4) {
+        mbuf->ol_flags |= RTE_MBUF_F_TX_IP_CKSUM;
     }
+
     return true;
 }

Maybe worth trying this as well.
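To make the bit-field point concrete, here is a tiny hedged demo (standalone C; the constants mirror DPDK's rte_mbuf_core.h values) showing why a plain bitwise AND against RTE_MBUF_F_TX_TCP_CKSUM also matches UDP packets:

/* The L4 checksum request is a two-bit field at bits 52-53:
 * NO_CKSUM=0, TCP=1, SCTP=2, UDP=3. Since UDP (0b11) shares a bit
 * with TCP (0b01), (ol_flags & TCP_CKSUM) is nonzero for UDP too. */
#include <stdint.h>
#include <stdio.h>

#define TX_TCP_CKSUM (1ULL << 52) /* RTE_MBUF_F_TX_TCP_CKSUM */
#define TX_UDP_CKSUM (3ULL << 52) /* RTE_MBUF_F_TX_UDP_CKSUM */
#define TX_L4_MASK   (3ULL << 52) /* RTE_MBUF_F_TX_L4_MASK */

int main(void)
{
    uint64_t ol_flags = TX_UDP_CKSUM; /* a UDP packet */

    printf("bitwise AND says TCP: %s\n",
           (ol_flags & TX_TCP_CKSUM) ? "yes (wrong)" : "no");
    printf("field compare says TCP: %s\n",
           ((ol_flags & TX_L4_MASK) == TX_TCP_CKSUM) ? "yes" : "no (correct)");
    return 0;
}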

igsilya commented 4 months ago

Worth noting that these packets also have a VLAN header in the set of outer headers, but this should not cause any issues, as the offsets seem to be correct.

Another thing that may or may not be an issue is that l2_len is technically incorrect for packets that do not request inner checksum offload. For example: outer_l2_len=18 outer_l3_len=20 l2_len=18 l3_len=20 l4_len=0. Here the l2_len doesn't include the outer L4 length, while it should, since the packet is a tunnel packet. In fact, the l2_len and l3_len look like a direct copy of the outer lengths and not the actual lengths of the inner packet. Since we do not request any offloading on the inner headers, having an incorrect l2_len might be fine, but it may as well not be if the driver sets up something weird in the hardware because of it.
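To spell out what would be consistent with the rte_mbuf documentation here, a hedged sketch (illustrative only, not OVS code) of the length fields for the VLAN-tagged Geneve-in-IPv4 packets from the dumps, where the correctly offloaded packets show l2_len=38:

/* With an RTE_MBUF_F_TX_TUNNEL_* flag set, the mbuf API documents
 * l2_len as covering outer L4 + tunnel header + inner L2, i.e. it is
 * not supposed to be a copy of outer_l2_len. Header sizes below are
 * taken from the dumps above. */
#include <rte_mbuf.h>

static void
set_geneve_hdr_lens(struct rte_mbuf *m)
{
    m->outer_l2_len = 14 + 4;  /* outer Ethernet + VLAN tag = 18 */
    m->outer_l3_len = 20;      /* outer IPv4 */
    m->l2_len = 8 + 16 + 14;   /* outer UDP + Geneve(+options) + inner Eth = 38 */
    m->l3_len = 20;            /* inner IPv4 */
    m->l4_len = 0;             /* no inner L4 offload requested */
}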

wangjun0728 commented 4 months ago

@igsilya Thanks for your reply. I modified it according to your suggestion and added dump printing, but there is still a problem.

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 2ec0f6c6e1459fe3dc0614140c37fdcbdbb228ff..375eb78119c43433aa499a79fe3ff30251d48d13 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2617,6 +2617,26 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
                  (char *) dp_packet_eth(pkt);
         mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
                  (char *) dp_packet_l3(pkt);
+
+        /* If neither inner checksums nor TSO is requested, inner marks
+         * should not be set. */
+        if (!(mbuf->ol_flags & (RTE_MBUF_F_TX_IP_CKSUM |
+                                RTE_MBUF_F_TX_L4_MASK  |
+                                RTE_MBUF_F_TX_TCP_SEG))) {
+            mbuf->ol_flags &= ~(RTE_MBUF_F_TX_IPV4 |
+                                RTE_MBUF_F_TX_IPV6);
+        }
+        VLOG_WARN_RL(&rl, "%s: Tunnel offload:"
+                     " outer_l2_len=%d"
+                     " outer_l3_len=%d"
+                     " l2_len=%d"
+                     " l3_len=%d"
+                     " l4_len=%d",
+                     netdev_get_name(&dev->up),
+                     mbuf->outer_l2_len, mbuf->outer_l3_len,
+                     mbuf->l2_len, mbuf->l3_len, mbuf->l4_len);
+        netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up),
+                              "Tunneled packet", mbuf);
     } else {
         mbuf->l2_len = (char *) dp_packet_l3(pkt) -
                (char *) dp_packet_eth(pkt);
@@ -2635,7 +2655,7 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
         }
     }

-    if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_CKSUM) {
+    if ((mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) == RTE_MBUF_F_TX_TCP_CKSUM) {
         if (!th) {
             VLOG_WARN_RL(&rl, "%s: TCP offloading without L4 header"
                          " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len);
@@ -2662,11 +2682,14 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
                 return false;
             }
         }
+    }

-        if (mbuf->ol_flags & RTE_MBUF_F_TX_IPV4) {
-            mbuf->ol_flags |= RTE_MBUF_F_TX_IP_CKSUM;
-        }
+    /* If L4 checksum offload is requested, IPv4 should be requested as well. */
+    if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK
+        && mbuf->ol_flags & RTE_MBUF_F_TX_IPV4) {
+        mbuf->ol_flags |= RTE_MBUF_F_TX_IP_CKSUM;
     }
+
     return true;
}

2024-03-13T06:05:52.058Z|00025|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=38 l3_len=20 l4_len=32
2024-03-13T06:05:52.058Z|00026|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet: dump mbuf at 0x18eeda200, iova=0x18eeda4c0, buf_len=2176
pkt_len=128, ol_flags=0xc90820000000002, nbsegs=1, port=65535, ptype=0
segment at 0x18eeda200, data=0x18eeda502, len=128, off=66, refcnt=1
Dump data at [0x18eeda502], len=128
00000000: A0 88 C2 20 00 7E B4 96 91 BC 45 7B 81 00 00 5C | ... .~....E{...\
00000010: 08 00 45 00 00 6E 00 00 40 00 40 11 00 00 0A FD | ..E..n..@.@.....
00000020: 26 37 0A FD 26 3B BA 1F 17 C1 00 5A FF FF 02 40 | &7..&;.....Z...@
00000030: 65 58 00 00 2D 00 01 02 80 01 00 08 00 04 40 FE | eX..-.........@.
00000040: 95 EF 85 2C 0A 8B BF 77 86 35 08 00 45 00 00 34 | ...,...w.5..E..4
00000050: 00 00 40 00 3F 06 59 AA 0A 00 00 0B 0A 38 CD D7 | ..@.?.Y......8..
00000060: 00 16 E1 29 11 B7 E7 77 2C 3F 64 5F 80 12 FA F0 | ...)...w,?d....
00000070: E2 40 00 00 02 04 05 B4 01 01 04 02 01 03 03 09 | .@..............
2024-03-13T06:05:53.072Z|00027|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=38 l3_len=20 l4_len=32
2024-03-13T06:05:53.072Z|00028|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet: dump mbuf at 0x18eed9640, iova=0x18eed9900, buf_len=2176
pkt_len=128, ol_flags=0xc90820000000002, nbsegs=1, port=65535, ptype=0
segment at 0x18eed9640, data=0x18eed9942, len=128, off=66, refcnt=1
Dump data at [0x18eed9942], len=128
00000000: A0 88 C2 20 00 7E B4 96 91 BC 45 7B 81 00 00 5C | ... .~....E{...\
00000010: 08 00 45 00 00 6E 00 00 40 00 40 11 00 00 0A FD | ..E..n..@.@.....
00000020: 26 37 0A FD 26 3B BA 1F 17 C1 00 5A FF FF 02 40 | &7..&;.....Z...@
00000030: 65 58 00 00 2D 00 01 02 80 01 00 08 00 04 40 FE | eX..-.........@.
00000040: 95 EF 85 2C 0A 8B BF 77 86 35 08 00 45 00 00 34 | ...,...w.5..E..4
00000050: 00 00 40 00 3F 06 59 AA 0A 00 00 0B 0A 38 CD D7 | ..@.?.Y......8..
00000060: 00 16 E1 29 11 B7 E7 77 2C 3F 64 5F 80 12 FA F0 | ...)...w,?d....
00000070: E2 40 00 00 02 04 05 B4 01 01 04 02 01 03 03 09 | .@..............
2024-03-13T06:05:54.081Z|00029|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=38 l3_len=20 l4_len=32
2024-03-13T06:05:54.081Z|00030|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet: dump mbuf at 0x18eed8a80, iova=0x18eed8d40, buf_len=2176
pkt_len=128, ol_flags=0xc90820000000002, nbsegs=1, port=65535, ptype=0
segment at 0x18eed8a80, data=0x18eed8d82, len=128, off=66, refcnt=1
Dump data at [0x18eed8d82], len=128
00000000: A0 88 C2 20 00 7E B4 96 91 BC 45 7B 81 00 00 5C | ... .~....E{...\
00000010: 08 00 45 00 00 6E 00 00 40 00 40 11 00 00 0A FD | ..E..n..@.@.....
00000020: 26 37 0A FD 26 3B BA 1F 17 C1 00 5A FF FF 02 40 | &7..&;.....Z...@
00000030: 65 58 00 00 2D 00 01 02 80 01 00 08 00 04 40 FE | eX..-.........@.
00000040: 95 EF 85 2C 0A 8B BF 77 86 35 08 00 45 00 00 34 | ...,...w.5..E..4
00000050: 00 00 40 00 3F 06 59 AA 0A 00 00 0B 0A 38 CD D7 | ..@.?.Y......8..
00000060: 00 16 E1 29 11 B7 E7 77 2C 3F 64 5F 80 12 FA F0 | ...)...w,?d....
00000070: E2 40 00 00 02 04 05 B4 01 01 04 02 01 03 03 09 | .@..............
2024-03-13T06:05:55.080Z|00031|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet: dump mbuf at 0x18eed7ec0, iova=0x18eed8180, buf_len=2176
pkt_len=128, ol_flags=0xc90820000000002, nbsegs=1, port=65535, ptype=0
segment at 0x18eed7ec0, data=0x18eed81c2, len=128, off=66, refcnt=1
Dump data at [0x18eed81c2], len=128
00000000: A0 88 C2 20 00 7E B4 96 91 BC 45 7B 81 00 00 5C | ... .~....E{...\
00000010: 08 00 45 00 00 6E 00 00 40 00 40 11 00 00 0A FD | ..E..n..@.@.....
00000020: 26 37 0A FD 26 3B BA 1F 17 C1 00 5A FF FF 02 40 | &7..&;.....Z...@
00000030: 65 58 00 00 2D 00 01 02 80 01 00 08 00 04 40 FE | eX..-.........@.
00000040: 95 EF 85 2C 0A 8B BF 77 86 35 08 00 45 00 00 34 | ...,...w.5..E..4
00000050: 00 00 40 00 3F 06 59 AA 0A 00 00 0B 0A 38 CD D7 | ..@.?.Y......8..
00000060: 00 16 E1 29 11 B7 E7 77 2C 3F 64 5F 80 12 FA F0 | ...)...w,?d....
00000070: E2 40 00 00 02 04 05 B4 01 01 04 02 01 03 03 09 | .@..............
2024-03-13T06:07:07.088Z|00032|netdev_dpdk(pmd-c00/id:89)|WARN|Dropped 1 log messages in last 72 seconds (most recently, 72 seconds ago) due to excessive rate
2024-03-13T06:07:07.088Z|00033|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=18 l3_len=20 l4_len=0
2024-03-13T06:07:07.088Z|00034|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet: dump mbuf at 0x11d2f6ff00, iova=0x11d2f701c0, buf_len=2176
pkt_len=124, ol_flags=0xc00820000000002, nb_segs=1, port=65535, ptype=0
segment at 0x11d2f6ff00, data=0x11d2f70240, len=124, off=128, refcnt=1
Dump data at [0x11d2f70240], len=124
00000000: 40 A6 B7 21 92 8C B4 96 91 BC 45 7B 81 00 00 5C | @..!......E{...\
00000010: 08 00 45 00 00 6A 00 00 40 00 40 11 00 00 0A FD | ..E..j..@.@.....
00000020: 26 37 0A FD 26 36 AE 80 17 C1 00 56 FF FF 02 40 | &7..&6.....V...@
00000030: 65 58 00 00 31 00 01 02 80 01 00 05 80 00 33 33 | eX..1.........33
00000040: 00 00 00 02 06 D8 CE 6A 6F 48 86 DD 60 00 97 93 | .......joH.....
00000050: 00 08 3A FF FE 80 00 00 00 00 00 00 25 8F 09 39 | ..:.........%..9
00000060: 36 02 3D 47 FF 02 00 00 00 00 00 00 00 00 00 00 | 6.=G............
00000070: 00 00 00 02 85 00 DB 25 00 00 00 00 | .......%....
2024-03-13T06:07:07.088Z|00035|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=18 l3_len=20 l4_len=0
2024-03-13T06:07:07.088Z|00036|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet: dump mbuf at 0x11d2f70ac0, iova=0x11d2f70d80, buf_len=2176
pkt_len=124, ol_flags=0xc00820000000002, nb_segs=1, port=65535, ptype=0
segment at 0x11d2f70ac0, data=0x11d2f70e00, len=124, off=128, refcnt=1
Dump data at [0x11d2f70e00], len=124
00000000: 68 91 D0 65 C6 C3 B4 96 91 BC 45 7B 81 00 00 5C | h..e......E{...\
00000010: 08 00 45 00 00 6A 00 00 40 00 40 11 00 00 0A FD | ..E..j..@.@.....
00000020: 26 37 0A FD 26 38 AE 80 17 C1 00 56 FF FF 02 40 | &7..&8.....V...@
00000030: 65 58 00 00 31 00 01 02 80 01 00 05 80 00 33 33 | eX..1.........33
00000040: 00 00 00 02 06 D8 CE 6A 6F 48 86 DD 60 00 97 93 | .......joH.....
00000050: 00 08 3A FF FE 80 00 00 00 00 00 00 25 8F 09 39 | ..:.........%..9
00000060: 36 02 3D 47 FF 02 00 00 00 00 00 00 00 00 00 00 | 6.=G............
00000070: 00 00 00 02 85 00 DB 25 00 00 00 00 | .......%....
2024-03-13T06:07:07.088Z|00037|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=18 l3_len=20 l4_len=0
2024-03-13T06:07:07.088Z|00038|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet: dump mbuf at 0x11d2f71680, iova=0x11d2f71940, buf_len=2176
pkt_len=124, ol_flags=0xc00820000000002, nb_segs=1, port=65535, ptype=0
segment at 0x11d2f71680, data=0x11d2f719c0, len=124, off=128, refcnt=1
Dump data at [0x11d2f719c0], len=124
00000000: 08 C0 EB AF 0D 3F B4 96 91 BC 45 7B 81 00 00 5C | .....?....E{...\
00000010: 08 00 45 00 00 6A 00 00 40 00 40 11 00 00 0A FD | ..E..j..@.@.....
00000020: 26 37 0A FD 26 32 AE 80 17 C1 00 56 FF FF 02 40 | &7..&2.....V...@
00000030: 65 58 00 00 31 00 01 02 80 01 00 05 80 00 33 33 | eX..1.........33
00000040: 00 00 00 02 06 D8 CE 6A 6F 48 86 DD 60 00 97 93 | .......joH.....
00000050: 00 08 3A FF FE 80 00 00 00 00 00 00 25 8F 09 39 | ..:.........%..9
00000060: 36 02 3D 47 FF 02 00 00 00 00 00 00 00 00 00 00 | 6.=G............
00000070: 00 00 00 02 85 00 DB 25 00 00 00 00 | .......%....
2024-03-13T06:07:07.088Z|00039|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p0: Tunnel offload: outer_l2_len=18 outer_l3_len=20 l2_len=0 l3_len=0 l4_len=0
2024-03-13T06:07:07.088Z|00040|netdev_dpdk(pmd-c00/id:89)|DBG|tun_port_p0: Tunneled packet: dump mbuf at 0x18eed7300, iova=0x18eed75c0, buf_len=2176
pkt_len=124, ol_flags=0xc00820000000002, nb_segs=1, port=65535, ptype=0
segment at 0x18eed7300, data=0x18eed7602, len=124, off=66, refcnt=1
Dump data at [0x18eed7602], len=124
00000000: 6C FE 54 2F 0D C0 B4 96 91 BC 45 7B 81 00 00 5C | l.T/......E{...\
00000010: 08 00 45 00 00 6A 00 00 40 00 40 11 00 00 0A FD | ..E..j..@.@.....
00000020: 26 37 0A FD 26 39 AE 80 17 C1 00 56 FF FF 02 40 | &7..&9.....V...@
00000030: 65 58 00 00 31 00 01 02 80 01 00 05 80 00 33 33 | eX..1.........33
00000040: 00 00 00 02 06 D8 CE 6A 6F 48 86 DD 60 00 97 93 | .......joH.....
00000050: 00 08 3A FF FE 80 00 00 00 00 00 00 25 8F 09 39 | ..:.........%..9
00000060: 36 02 3D 47 FF 02 00 00 00 00 00 00 00 00 00 00 | 6.=G............
00000070: 00 00 00 02 85 00 DB 25 00 00 00 00 | .......%....
2024-03-13T06:07:07.088Z|00010|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event

wangjun0728 commented 4 months ago

Hi, are there any other suggested modifications for the E810 network card, or does this need to be fixed in the ice driver?

david-marchand commented 4 months ago

(jumping in the thread) An MDD event can be associated with a "wrong" (from the hw pov) Tx descriptor.

Could you please set --log-level=pmd.net.ice.*:debug ?
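
For example, one way to pass this in an OVS setup is via the dpdk-extra EAL arguments (here based on the other_config shown later in this thread; ovs-vswitchd needs a restart for it to take effect):

    ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="-a 0000:af:00.1 -a 0000:af:00.0 --log-level=pmd.net.ice.*:debug"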

wangjun0728 commented 4 months ago

I captured packets on the E810 sender because it supports inner and outer layer offloading, and the packets look normal.

Then, when I captured packets on the receiving end and inspected them, I found that the outer UDP checksum was incorrect. I suspect this might be causing the issue. The network card I'm using supports outer checksum offload, but the actual packets don't seem to have undergone outer layer offloading.

Furthermore, I applied this patch, but it had no effect; the issue still persists. @david-marchand https://git.dpdk.org/dpdk/commit?id=daac90272857812b3da1db95caf5922f03a83343

wangjun0728 commented 4 months ago

After disabling the outer UDP checksum offload of the E810 network card, I verified that network communication on my end is normal. The 'MDD event' warnings still persist, but the issue I was experiencing is resolved. I believe the problem is that the DPDK ice driver does not actually support outer UDP checksum offload yet still advertises the flag. This modification can be considered a temporary step back, until the DPDK ice driver is fixed and the feature can be re-enabled. @igsilya CC @mkp-rh @david-marchand

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ea18eeb2d6ee1fb8bf9d9bedb95416db4daf5b99..fa8af37cd451576060a24514506ce66a365a4be9 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1364,6 +1364,12 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
         info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_TCP_CKSUM;
     }

+    if (!strcmp(info.driver_name, "net_ice")) {
+        VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a net/ice port.",
+                  netdev_get_name(&dev->up));
+        info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM;
+    }
+
     if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM) {
         dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD;
     } else {

igsilya commented 4 months ago

@wangjun0728 thanks for extra testing!

I think it is reasonable to disable offloading for this driver for now. Do you want to send a proper patch to dev@openvswitch.org (see the contributing guide)? If not, I can pick up the change and send it myself.

wangjun0728 commented 4 months ago

Hi, the Geneve overlay works normally without turning on userspace-tso-enable. However, if I configure userspace-tso-enable=true, there is some traffic when iperf sends TCP messages, but it cannot send a large amount of traffic. I captured packets at the receiving end and checked them, and there was indeed an outer UDP length anomaly. The anomaly only appears when sending TCP packets; sending UDP packets works normally. @igsilya

Additionally, I applied this patch: https://patchwork.ozlabs.org/project/openvswitch/patch/20240221040855.271921-1-mkp@redhat.com/ CC @mkp-rh

TCP: [screenshots] UDP: [screenshot]

wangjun0728 commented 4 months ago

When this patch is not applied, the packet captures show the same behavior: TCP cannot transmit a large amount of traffic. https://patchwork.ozlabs.org/project/openvswitch/patch/20240221040855.271921-1-mkp@redhat.com/

wangjun0728 commented 4 months ago

(jumping in the thread) An MDD event can be associated with a "wrong" (from the hw pov) Tx descriptor.

  • I suspect the vector tx handler does not support tunneling offload, but I am not sure (looking at the logs in this thread) which handler has been selected by the net/ice driver.

Could you please set --log-level=pmd.net.ice.*:debug ?

I set --log-level=pmd.net.ice.*:debug; the resulting log is attached as dpdk.log.

igsilya commented 4 months ago

@wangjun0728 The incorrect length in the outer UDP header might be a bug in https://patchwork.ozlabs.org/project/openvswitch/patch/20240221040855.271921-1-mkp@redhat.com/ . The patch hasn't been reviewed yet, and at a quick glance it might indeed be missing the update of the outer UDP header.

You need a card capable of Tunnel TSO in order to have good performance. The userspace fallback implemented in the patch will not be very fast even if the UDP length is fixed, because it performs way too many operations, including large memory copies. Without this patch, large packets that iperf is trying to send are likely just dropped, so the TCP stack keeps adjusting the maximum packet size it can actually send, and that reflects in the very bad performance. TCP suffers much more than UDP, because UDP packets are simply fragmented on the sender, as we do not advertise support for UFO.

The card capable of Tunnel TSO in your case is E810, but you disabled outer checksum offload, so Tunnel TSO will not work.

wangjun0728 commented 4 months ago

Since the network cards I'm using (E810/82599/CX5) do not support tx_out_udp_csum_offload, does that mean I should not enable userspace-tso-enable?

igsilya commented 4 months ago

Since the network cards I'm using (E810/82599/CX5) do not support tx_out_udp_csum_offload, does that mean I should not enable userspace-tso-enable?

In the current state of OVS development all the tunneled traffic will be dropped: https://github.com/openvswitch/ovs/blob/9d0a40120f9f71ed9ddf32d37d1b03b0fd7f4703/lib/netdev.c#L917-L932

The patch from @mkp-rh that you mentioned adds support for segmenting such packets in software before sending them out in this case, i.e. not just dropping them. It is not going to be very fast, though, so it might be faster to just let the sender segment packets before sending them out, but I didn't test that.
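
In rough pseudocode, the linked check amounts to the following at send time (illustrative helper names, not the exact OVS identifiers):

    /* Illustrative paraphrase of the linked lib/netdev.c lines; helper and
     * flag names here are invented for clarity. */
    if (packet_requests_tso(packet) && packet_is_tunneled(packet)
        && !netdev_supports_tunnel_tso(netdev, packet)) {
        /* The packet is dropped with a rate-limited warning, and a coverage
         * counter such as netdev_geneve_tso_drops is incremented. */
        return false;   /* do not transmit */
    }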

wangjun0728 commented 4 months ago

Hi, I used "ovs-appctl coverage/read-counter netdev_geneve_tso_drops" to check the value; it is zero, and there are no related error logs printed either. Next, I will try with an X710 to see whether its DPDK driver supports the outer UDP checksum and tunnel TSO capabilities.

igsilya commented 4 months ago

Hi, I used "ovs-appctl coverage/read-counter netdev_geneve_tso_drops" to check the value; it is zero, and there are no related error logs printed either.

This is on E810, right? I think we need to extend your patch to disable not only RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM, but also all the dependent offloads like RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO and RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO. With that you should see the log and the counter. Devices should not advertise tunnel TSO if they do not support outer checksums.
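
Something along these lines on top of the earlier quirk might do it (an untested sketch; the three RTE_ETH_TX_OFFLOAD_* names are the standard capability bits from rte_ethdev.h):

    if (!strcmp(info.driver_name, "net_ice")) {
        VLOG_INFO("%s: disabled Tx outer UDP checksum and dependent tunnel"
                  " TSO offloads for a net/ice port.",
                  netdev_get_name(&dev->up));
        info.tx_offload_capa &= ~(RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM |
                                  RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO |
                                  RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO);
    }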

wangjun0728 commented 4 months ago

Hi, I used "ovs-appctl coverage/read-counter netdev_geneve_tso_drops" to check the value; it is zero, and there are no related error logs printed either.

This is on E810, right? I think we need to extend your patch to disable not only RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM, but also all the dependent offloads like RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO and RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO. With that you should see the log and the counter. Devices should not advertise tunnel TSO if they do not support outer checksums.

I understand your point, but currently, when TSO is enabled, tx_geneve_tso_offload and tx_vxlan_tso_offload are indeed already disabled on the E810. So my validation was done with tx_out_udp_csum_offload/tx_geneve_tso_offload/tx_vxlan_tso_offload all disabled. However, your suggestion to explicitly disable them in the code is also reasonable, and I will make the necessary modifications accordingly.

    ovs-vsctl get open . other_config
    {bundle-idle-timeout="3600", dpdk-extra=" -a 0000:af:00.1 -a 0000:af:00.0", dpdk-init="true", dpdk-socket-mem="2048", n-handler-threads="1", pmd-cpu-mask="0xf", userspace-tso-enable="true", vlan-limit="0"}

    ovs-vsctl get interface tun_port_p0 status
    {bus_info="bus_name=pci, vendor_id=8086, device_id=159b", driver_name=net_ice, if_descr="DPDK 23.11.0 net_ice", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0", max_mac_addrs="64", max_rx_pktlen="1618", max_rx_queues="256", max_tx_queues="256", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="1", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="true", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

wangjun0728 commented 4 months ago

For the X710, there is an issue similar to the E810's. Although tx_out_udp_csum_offload is true, I observed incorrect outer UDP checksums when inspecting packets at the receiving end. I have made some modifications and revalidated; the issue now appears to be resolved. I will submit a patch shortly.

wangjun0728 commented 4 months ago

From what I can see in the DPDK code, it appears that the i40e driver does not handle the outer UDP checksum logic. https://github.com/DPDK/dpdk/blob/main/drivers/net/i40e/i40e_rxtx.c#L301

igsilya commented 4 months ago

@wangjun0728 thanks. Yeah, it looks like it just advertises the feature but doesn't do anything about it... Could you open a bug on https://bugs.dpdk.org for both i40e and ice, if you haven't already?

wangjun0728 commented 4 months ago

@igsilya Thank you very much. I have filed a bug and will keep tracking it. https://bugs.dpdk.org/show_bug.cgi?id=1406

igsilya commented 4 months ago

@wangjun0728 Thanks!

As for the issue where TCP does not work well while UDP does: it's still a bit puzzling, but maybe related to this: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053015.html ?

Could you show the output of ovs-appctl dpctl/dump-flows while TCP traffic is (not) flowing? Specifically, I'm interested in the flow that performs tnl_push action.

wangjun0728 commented 4 months ago

@wangjun0728 Thanks!

As for the issue where TCP does not work well while UDP does: it's still a bit puzzling, but maybe related to this: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053015.html ?

Could you show the output of ovs-appctl dpctl/dump-flows while TCP traffic is (not) flowing? Specifically, I'm interested in the flow that performs tnl_push action.

I think you are right. The discussion you posted may be related to the issue I encountered. Here is the flow information when I use iperf to send TCP traffic after enabling TSO. Additionally, I have disabled tx_geneve_tso_offload/tx_vxlan_tso_offload/tx_out_udp_csum_offload.

    [root@compute]# ovs-appctl dpctl/dump-flows
    flow-dump from pmd on cpu core: 3
    recirc_id(0xf0),tunnel(tun_id=0x31,src=10.253.38.54,dst=10.253.38.55,geneve({}),flags(-df+csum+key)),in_port(4),ct_state(-new+est-rel+rpl-inv+trk),ct_label(0/0x1),packet_type(ns=0,id=0),eth(dst=0e:a0:1b:9e:ca:04/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:2390, bytes:138104, used:0.167s, flags:., actions:7
    recirc_id(0),tunnel(tun_id=0x31,src=10.253.38.54,dst=10.253.38.55,geneve({class=0x102,type=0x80,len=4,0x60005}),flags(-df+csum+key)),in_port(4),skb_mark(0/0x4),ct_state(-trk),packet_type(ns=0,id=0),eth(src=0a:c8:e1:5c:84:0e,dst=0e:a0:1b:9e:ca:04/01:00:00:00:00:00),eth_type(0x0800),ipv4(src=10.0.0.5/128.0.0.0,proto=6,frag=no), packets:2390, bytes:138104, used:0.167s, flags:., actions:ct(zone=10),recirc(0xf0)
    recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=0e:a0:1b:9e:ca:04/01:00:00:00:00:00,dst=0a:c8:e1:5c:84:0e),eth_type(0x0800),ipv4(src=10.0.0.3/128.0.0.0,dst=10.0.0.5/128.0.0.0,proto=6,frag=no), packets:4611, bytes:7858514, used:0.167s, flags:P., actions:ct(zone=10),recirc(0xed)
    recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=40:a6:b7:21:92:8c,dst=6c:fe:54:2f:7e:b0),eth_type(0x8100),vlan(vid=92,pcp=0),encap(eth_type(0x0800),ipv4(dst=10.253.38.55,proto=17,frag=no),udp(dst=6081)), packets:2390, bytes:286284, used:0.167s, actions:pop_vlan,tnl_pop(4)
    recirc_id(0xed),in_port(7),skb_mark(0),ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),packet_type(ns=0,id=0),eth(src=0e:a0:1b:9e:ca:04,dst=0a:c8:e1:5c:84:0e),eth_type(0x0800),ipv4(dst=10.0.0.5/255.255.255.252,tos=0/0x3,frag=no), packets:4611, bytes:7858514, used:0.167s, flags:P., actions:set(skb_mark(0x1)),tnl_push(tnl_port(4),header(size=58,type=5,eth(dst=40:a6:b7:21:92:8c,src=6c:fe:54:2f:7e:b0,dl_type=0x0800),ipv4(src=10.253.38.55,dst=10.253.38.54,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=6081,csum=0xffff),geneve(crit,vni=0x31,options({class=0x102,type=0x80,len=4,0x50006}))),out_port(1)),push_vlan(vid=92,pcp=0),lb_output(2)
    flow-dump from pmd on cpu core: 2
    recirc_id(0xed),in_port(7),skb_mark(0),ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),packet_type(ns=0,id=0),eth(src=0e:a0:1b:9e:ca:04,dst=0a:c8:e1:5c:84:0e),eth_type(0x0800),ipv4(dst=10.0.0.5/255.255.255.252,tos=0/0x3,frag=no), packets:118, bytes:196172, used:0.435s, flags:P., actions:set(skb_mark(0x1)),tnl_push(tnl_port(4),header(size=58,type=5,eth(dst=40:a6:b7:21:92:8c,src=6c:fe:54:2f:7e:b0,dl_type=0x0800),ipv4(src=10.253.38.55,dst=10.253.38.54,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=6081,csum=0xffff),geneve(crit,vni=0x31,options({class=0x102,type=0x80,len=4,0x50006}))),out_port(1)),push_vlan(vid=92,pcp=0),lb_output(2)
    recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=0e:a0:1b:9e:ca:04/01:00:00:00:00:00,dst=0a:c8:e1:5c:84:0e),eth_type(0x0800),ipv4(src=10.0.0.3/128.0.0.0,dst=10.0.0.5/128.0.0.0,proto=6,frag=no), packets:118, bytes:196172, used:0.435s, flags:P., actions:ct(zone=10),recirc(0xed)
    recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=40:a6:b7:23:6a:90,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=91),encap(eth_type(0x0806),arp(sip=10.253.38.27,tip=10.253.38.27,op=1)), packets:0, bytes:0, used:never, actions:drop
    recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=40:a6:b7:23:6a:90,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=91),encap(eth_type(0x0806),arp(sip=10.253.38.26,tip=10.253.38.26,op=1)), packets:0, bytes:0, used:never, actions:drop
    recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=40:a6:b7:23:6a:90,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=91),encap(eth_type(0x0806),arp(sip=10.253.38.25,tip=10.253.38.25,op=1)), packets:0, bytes:0, used:never, actions:drop

    [root@compute]# ovs-vsctl get interface tun_port_p0 status 
    {bus_info="bus_name=pci, vendor_id=8086, device_id=1572", driver_name=net_i40e, if_descr="DPDK 23.11.0 net_i40e", if_type="6", link_speed="10Gbps", max_hash_mac_addrs="0", max_mac_addrs="64", max_rx_pktlen="1618", max_rx_queues="320", max_tx_queues="320", max_vfs="0", max_vmdq_pools="64", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="0", port_no="0", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="true", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}