Open simonhf opened 4 years ago
Simon Hardy-Francis notifications@github.com writes:
Questions:
- The best PPS above is 3.7M. Is there anything I can tweak with `af_xdp_user.c` or the environment to get better PPS results?
- Should veth be considered faster than hardware NICs, or might a hardware NIC (with an equivalent single receive queue) be faster? And why?
XDP gets its high performance by bypassing the networking stack. But when you're using veths, things go through the stack anyway, so you're not really getting the benefits. So I'm not sure how much sense it makes to benchmark this...
- Why would using poll not result in 100% CPU usage for `af_xdp_user` regardless of how many `udpsender` threads are used?
- Why would using poll with two senders halve CPU usage for `af_xdp_user` while increasing PPS? This seems counter-intuitive, or?
IDK, scheduling issues?
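
One possible piece of the picture, offered only as a sketch: in poll mode the receiver blocks in `poll()` until the kernel reports the socket readable, so any time it spends waiting for packets shows up as idle time rather than as CPU usage. The snippet below is not taken from `af_xdp_user.c`; it uses an ordinary pipe instead of an AF_XDP socket, and `wait_readable` is a hypothetical helper name, but the blocking behaviour of `poll()` is the same.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not part of af_xdp_user.c): sleep in poll() until
 * fd is readable or timeout_ms expires.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    assert(fd >= 0);

    /* The calling thread is descheduled here until the kernel has data
     * for us; unlike a busy loop, no CPU is burned while waiting. */
    int ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;              /* poll() itself failed */
    if (ret == 0)
        return 0;               /* timed out, nothing to read */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

With an empty pipe, `wait_readable(fd, 10)` returns 0 after sleeping for roughly 10 ms; once a byte has been written it returns 1 immediately. A receiver that drains packets faster than the senders produce them would spend part of its time in that sleeping state, which would be consistent with seeing less than 100% CPU.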
- According to the above results, when testing performance with veth, skb mode is much faster than native mode, and auto mode chooses native mode. Why is skb mode faster? And if veth is the reason, couldn't auto mode somehow detect that and choose skb?
'auto' just means 'try native mode, and if it fails fall-back to skb-mode'. Since veth always builds skbs anyway, 'native' mode just adds overhead (unless the packet was redirected from a physical nic). But that's a specific 'feature' of veth, so not generally applicable...
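
The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tutorial's or libbpf's code: `attach_auto`, the `attach_fn` callback type, the `attach_mode` enum, and the two fake drivers are all hypothetical names made up for the example. The real kernel flags (`XDP_FLAGS_DRV_MODE`, `XDP_FLAGS_SKB_MODE` in `<linux/if_link.h>`) are deliberately not used so the snippet stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the attach modes; NOT the kernel's flags. */
enum attach_mode { MODE_NATIVE, MODE_SKB };

/* Hypothetical attach callback: 0 on success, negative on failure. */
typedef int (*attach_fn)(int ifindex, enum attach_mode mode);

/* 'auto': try native mode first and fall back to skb mode only if the
 * native attach fails. Stores the mode actually used in *used. */
int attach_auto(attach_fn attach, int ifindex, enum attach_mode *used)
{
    assert(attach != NULL && used != NULL);

    if (attach(ifindex, MODE_NATIVE) == 0) {
        *used = MODE_NATIVE;
        return 0;
    }

    int err = attach(ifindex, MODE_SKB);
    if (err == 0)
        *used = MODE_SKB;
    return err;
}

/* Fake device that accepts native mode (as veth does). */
int fake_attach_native_ok(int ifindex, enum attach_mode mode)
{
    (void)ifindex;
    (void)mode;
    return 0;
}

/* Fake device whose driver only supports skb mode. */
int fake_attach_skb_only(int ifindex, enum attach_mode mode)
{
    (void)ifindex;
    return mode == MODE_SKB ? 0 : -1;
}
```

Because the decision is based purely on whether the native attach succeeds, a device like veth that accepts native mode ends up in native mode even when skb mode would be faster for that particular setup.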
So using the tutorial code [1] and the one-liner XDP program which passes all packets through, and the `af_xdp_user.c` code which just loops through the received packets and then frees the UMEM frame for XDP to use again (i.e. almost no business logic and the tightest loop in user land dealing with the packets, with no packet inspection, re-writing, or sending), and using the `udpsender` code from this article [2], I got the following performance results using veth:
I monitored the CPU usage with the following command:
I ran `af_xdp_user` with the following command variations:
Note: `udpsender` causes 74-byte packets to be received.
Note: Increased threads for `udpsender` until no PPS improvement was seen.
Note: The veth is configured with the default single receive queue.
Note: The last column is the CPU usage of `af_xdp_user`.
Questions:
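- The best PPS above is 3.7M. Is there anything I can tweak with `af_xdp_user.c` or the environment to get better PPS results?
- Should veth be considered faster than hardware NICs, or might a hardware NIC (with an equivalent single receive queue) be faster? And why?
- Why would using poll not result in 100% CPU usage for `af_xdp_user` regardless of how many `udpsender` threads are used?
- Why would using poll with two senders halve CPU usage for `af_xdp_user` while increasing PPS? This seems counter-intuitive, or?
- According to the above results, when testing performance with veth, skb mode is much faster than native mode, and auto mode chooses native mode. Why is skb mode faster? And if veth is the reason, couldn't auto mode somehow detect that and choose skb?
[1] https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
[2] https://blog.cloudflare.com/how-to-receive-a-million-packets/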