Others can probably provide more authoritative answers, but I believe that regarding the disabling of rx/tx checksum offload, the basic answer is as follows:
If a NIC driver tells the Linux kernel that rx and tx checksum offload are enabled, then the Linux kernel saves some CPU cycles while processing each packet, because the NIC driver is telling the kernel "you don't have to calculate these checksums, because the NIC will do them for you".
If a NIC driver tells the Linux kernel that rx and tx checksum offload are disabled, then the Linux kernel goes to the extra effort of calculating TCP and UDP checksums itself for each such packet. The extra computation is not terribly large -- it becomes most noticeable at higher network data rates, which should not be an issue in your testing.
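To make "calculating TCP and UDP checksums itself" concrete, here is a minimal Python sketch of the RFC 1071 Internet checksum applied to a UDP datagram over IPv4. It is only an illustration of the arithmetic, not the Linux kernel's implementation, and the function names are hypothetical.

```python
import struct

def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum of big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # odd length: pad with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF

def udp_ipv4_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus UDP header and payload.

    `udp_segment` must have its checksum field (bytes 6..7) set to zero.
    """
    # Pseudo-header: source IP, destination IP, zero byte, protocol 17, UDP length
    pseudo_header = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    csum = internet_checksum(pseudo_header + udp_segment)
    return csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF in UDP/IPv4
```

With offload enabled, this per-packet work is left to the NIC; with it disabled, the kernel does the equivalent for every TCP and UDP packet it sends.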
I believe that with the virtual NICs used in the kind of setup that you have, e.g. veth pairs, the veth implementation does not implement these checksum offload features. So if the driver tells the Linux kernel that rx/tx offload are enabled, that is actually incorrect: they are not enabled. It is more truthful to disable them, so that the Linux kernel will calculate these checksums.
I do not know the reason for the high latency and retransmissions in your setup. Have you also tried disabling rx/tx checksum offload for the interfaces in the VM where the BMv2 simple_switch or simple_switch_grpc process is running?
I would recommend trying to disable scatter-gather (sg) on enp7s0 as well. After that, you can try capturing the traffic at each interface to see if an issue shows up.
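After changing offload settings with `ethtool -K`, it helps to confirm what actually took effect, since some features are reported as `[fixed]` and cannot be changed. The sketch below parses the text printed by `ethtool -k IFACE` and lists which of the offloads discussed in this thread are still on and changeable. It assumes the usual `name: on|off [fixed]` line format; the function names and the `SHORT_NAMES` table are hypothetical helpers for this example.

```python
def parse_ethtool_features(text: str) -> dict[str, tuple[bool, bool]]:
    """Parse `ethtool -k IFACE` output into {feature: (enabled, fixed)}."""
    features = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "Features for IFACE:" header
        if not line or line.startswith("Features for") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        value = value.strip()
        features[name.strip()] = (value.startswith("on"), "[fixed]" in value)
    return features

# Long names printed by `ethtool -k`, mapped to the short keywords
# accepted by `ethtool -K` (rx, tx, sg, gro, lro).
SHORT_NAMES = {
    "rx-checksumming": "rx",
    "tx-checksumming": "tx",
    "scatter-gather": "sg",
    "generic-receive-offload": "gro",
    "large-receive-offload": "lro",
}

def still_enabled(text: str) -> list[str]:
    """Short names of the offloads above that are on and not [fixed]."""
    feats = parse_ethtool_features(text)
    return [short for long_name, short in SHORT_NAMES.items()
            if feats.get(long_name, (False, False)) == (True, False)]
```

For example, feeding it the saved output of `ethtool -k enp7s0` would show whether `sg` or `gro` still needs to be turned off on that interface.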
Do the packets processed by bmv2 pass through the network protocol stack? Can bmv2 use dpdk?
No and no
But when I used a Linux bridge instead of BMv2 in the virtual machine to connect enp7s0 and enp8s0, they communicated normally without turning off checksum offloading, and there were no packet retransmissions. I tried disabling tx and rx checksum offload in the virtual machine where simple_switch_grpc is running, and also disabled sg, but it did not solve the problem.
The following figure shows the communication status when connected through a bridge
But I used a bridge instead of BMv2 in the virtual machine to connect ENP7S0 and ENP8S0 together
That's comparing apples to oranges. When you use a bridge, the traffic is handled by the Linux kernel. When you use bmv2, all packets are sent to a userspace process (simple_switch_grpc) using raw sockets.
Antonin (or anyone reading this who knows), I know that there is a reliable way to see the full contents of any packet received or transmitted by the BMv2 software switch: just add a command-line option like --dump-packet-data 10000 to the simple_switch or simple_switch_grpc command line (10000 is the maximum number of bytes of each packet to print in the log).
I know you can use tcpdump or wireshark on veth interfaces to see packets going across them, but it is not clear to me when you do that whether the packet contents are shown before or after checksum calculations are done in the kernel (if they are done in the kernel at all, which they will not be if the NIC tx checksum offloading is enabled).
Having a reliable way to know the contents of the packet at multiple places along the path in a scenario like the one described in this issue would go a long way to understanding if checksumming is the problem.
Note: Even if the checksums of such packets are questionable, the presence or absence of packets shown by tcpdump/wireshark for a veth interface should be 100% accurate, at least when the packet rates are low enough that the CPU load is low.
Given this topology, enp7s0--bmv2--enp8s0, I would run a separate packet capture on each of these virtual interfaces to see if anything interesting shows up.
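One concrete thing to look for in such captures is frames whose size exceeds the interface MTU, which a switch forwarding through raw sockets may be unable to send. The sketch below reads a classic pcap file (for example one written with `tcpdump -i enp7s0 -w FILE`) using only the standard library and reports the indices of oversized frames. It assumes untagged Ethernet (link type 1) and a 1500-byte MTU, does not handle pcapng, and the function names are hypothetical.

```python
import struct

ETH_HEADER_LEN = 14  # untagged Ethernet II header (assumption: no VLAN tags)

def read_pcap_frame_lengths(data: bytes) -> list[int]:
    """Return the original on-wire length of each frame in a classic pcap file."""
    magic = data[:4]
    # Microsecond (a1b2c3d4) and nanosecond (a1b23c4d) magics, either byte order
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    linktype = struct.unpack_from(endian + "I", data, 20)[0]
    if linktype != 1:
        raise ValueError(f"expected Ethernet link type 1, got {linktype}")
    lengths = []
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(data):
        _sec, _frac, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        lengths.append(orig_len)
        offset += 16 + incl_len  # skip record header and captured bytes
    return lengths

def frames_over_mtu(data: bytes, mtu: int = 1500) -> list[int]:
    """Indices of frames whose L3 packet is larger than the interface MTU."""
    return [i for i, n in enumerate(read_pcap_frame_lengths(data))
            if n - ETH_HEADER_LEN > mtu]
```

Running it on captures from both enp7s0 and enp8s0 and comparing the results can show whether oversized frames arrive on one side without ever leaving on the other.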
The checksum settings shouldn't really matter on enp7s0 and enp8s0, given that bmv2 uses raw sockets.
sudo ethtool -K enp7s0 gro off lro off
sudo ethtool -K enp8s0 gro off lro off
I found the cause of the packet retransmissions. The network card's receive offload (GRO/LRO) automatically reassembled incoming segments into larger packets, so the packets received by the switch exceeded the MTU and were lost, which resulted in retransmissions. Turning off gro and lro with the commands above fixed it.
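As a back-of-the-envelope illustration of why coalesced packets overflow the MTU, the sketch below computes the IPv4 packet length when several full-size TCP segments are merged into one, keeping a single IP and TCP header as receive offload does. It assumes a 1500-byte MTU, IPv4 without options, and TCP timestamps enabled (32-byte TCP header, hence a 1448-byte MSS); these numbers and the function names are illustrative assumptions, not measurements from this setup.

```python
IPV4_HEADER = 20     # IPv4 header without options (assumption)
TCP_HEADER_TS = 32   # 20-byte TCP header + 12 bytes of timestamp option (assumption)

def coalesced_ip_length(segments: int, mss: int = 1448) -> int:
    """IP packet length after merging `segments` in-order full-size TCP segments.

    The merged packet carries one IP header and one TCP header followed by
    the concatenated payloads.
    """
    return IPV4_HEADER + TCP_HEADER_TS + segments * mss

def exceeds_mtu(segments: int, mtu: int = 1500, mss: int = 1448) -> bool:
    """True if the merged packet no longer fits in one MTU-sized packet."""
    return coalesced_ip_length(segments, mss) > mtu
```

Under these assumptions a single segment is exactly 1500 bytes, while merging just two already gives a 2948-byte packet, which cannot be sent out of a 1500-byte-MTU interface unchanged.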
Thank you so much
Environment: three kvm virtual machines
So I have the following questions
I see there is an existing issue about checksum offload (#1186). What should I do to get a normal network connection? I'm looking forward to your reply!
Below is my p4 code: