xdp-project / xdp-tutorial

XDP tutorial

advanced03-AF_XDP: performance #110

Open simonhf opened 4 years ago

simonhf commented 4 years ago

I used the tutorial code [1] with the one-liner XDP program that passes all packets through, plus af_xdp_user.c, which just loops through the received packets and then frees each UMEM frame for XDP to reuse. That means almost no business logic and the tightest possible user-space loop: no packet inspection, rewriting, or sending. With the udpsender code from this article [2] generating traffic over veth, I got the following performance results:

| udpsender threads | udpsender CPU | poll mode | XDP mode |  PPS | af_xdp_user CPU |
|------------------:|--------------:|:---------:|:--------:|-----:|----------------:|
|                 1 |          100% |        no |      skb | 995k |            100% |
|                 2 |          200% |        no |      skb | 1.8M |            100% |
|                 3 |          300% |        no |      skb | 2.5M |            100% |
|                 4 |          400% |        no |      skb | 3.2M |            100% |
|                 5 |          500% |        no |      skb | 3.7M |            100% |
|                 5 |          500% |       yes |      skb | 2.8M |             92% |
|                 1 |          100% |        no |   native | 830k |            100% |
|                 2 |          123% |        no |   native | 1.3M |            100% |
|                 1 |          100% |        no |     auto | 815k |            100% |
|                 2 |          123% |        no |     auto | 1.3M |            100% |
|                 1 |          100% |       yes |     auto | 650k |             84% |
|                 2 |          126% |       yes |     auto | 980k |             39% |

I monitored the CPU usage with the following command:

```shell
top -d 1 -b | egrep "udpsender|af_xdp_user"
```

I ran af_xdp_user with the following command variations:

```shell
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode              --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1         ./udpsender 10.11.11.1:4321                                                                )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode              --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2       ./udpsender 10.11.11.1:4321 10.11.11.1:4321                                                )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode              --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2,3     ./udpsender 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321                                )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode              --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2,3,4   ./udpsender 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321                )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode              --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2,3,4,5 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321)"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode  --poll-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2,3,4,5 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321)"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --native-mode           --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1         ./udpsender 10.11.11.1:4321                                                                )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --native-mode           --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2       ./udpsender 10.11.11.1:4321 10.11.11.1:4321                                                )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --auto-mode             --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1         ./udpsender 10.11.11.1:4321                                                                )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --auto-mode             --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2       ./udpsender 10.11.11.1:4321 10.11.11.1:4321                                                )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --auto-mode --poll-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1         ./udpsender 10.11.11.1:4321                                                                )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --auto-mode --poll-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2       ./udpsender 10.11.11.1:4321 10.11.11.1:4321                                                )"
```

Notes:

  • udpsender causes 74-byte packets to be received.
  • I increased the number of udpsender threads until no further PPS improvement was seen.
  • The veth is configured with the default single receive queue.
  • The last column is the CPU usage of af_xdp_user.

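Questions:

  • The best PPS above is 3.7M. Is there anything I can tweak with af_xdp_user.c or the environment to get better PPS results?
  • Should veth be considered faster than hardware NICs, or might a hardware NIC (with an equivalent single receive queue) be faster? And why?
  • Why would using poll not result in 100% CPU usage for af_xdp_user regardless of how many udpsender threads are used?
  • Why would using poll with two senders halve CPU usage for af_xdp_user while increasing PPS? This seems counter-intuitive.
  • According to the above results, it seems that when testing performance with veth, skb mode is much faster than native mode, and auto mode chooses native mode. Why is skb mode faster? And if veth is the reason, couldn't auto mode somehow detect that and choose skb mode?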

[1] https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP

[2] https://blog.cloudflare.com/how-to-receive-a-million-packets/

tohojo commented 4 years ago

Simon Hardy-Francis notifications@github.com writes:

> Questions:
>
>   • The best PPS above is 3.7M. Is there anything I can tweak with af_xdp_user.c or the environment to get better PPS results?
>
>   • Should veth be considered faster than hardware NICs, or might a hardware NIC (with an equivalent single receive queue) be faster? And why?

XDP gets its high performance by bypassing the networking stack. But with veths, packets go through the stack anyway, so you're not really getting that benefit. So I'm not sure how much sense it makes to benchmark this...

>   • Why would using poll not result in 100% CPU usage for af_xdp_user regardless of how many udpsender threads are used?
>   • Why would using poll with two senders halve CPU usage for af_xdp_user while increasing PPS? This seems counter-intuitive.

IDK, scheduling issues?

>   • According to the above results, it seems that when testing performance with veth, skb mode is much faster than native mode, and auto mode chooses native mode. Why is skb mode faster? And if veth is the reason, couldn't auto mode somehow detect that and choose skb mode?

'auto' just means 'try native mode, and if that fails, fall back to skb mode'. Since veth always builds skbs anyway, native mode just adds overhead (unless the packet was redirected from a physical NIC). But that's a specific 'feature' of veth, so it isn't generally applicable...