simonhf opened this issue 4 years ago
@magnus-karlsson @bjoto any thoughts? :)
When you get packet loss, it is usually because the driver has not gotten enough buffers to work with from user space. Try increasing the fill ring size and be more aggressive in putting buffers in this ring. Fill it up to its maximum. No reason to be shy :-).
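(For anyone wanting to act on this advice, below is a minimal sketch of topping the fill ring up to its maximum on every pass through the RX loop. It assumes the tutorial's xsk_socket_info layout and its xsk_umem_free_frames() / xsk_alloc_umem_frame() helpers from af_xdp_user.c; the function name is made up, and xsk_prod_nb_free() / xsk_ring_prod__reserve() / __fill_addr() / __submit() are the libbpf xsk.h accessors.)

static void fill_fq_to_max(struct xsk_socket_info *xsk)
{
    uint32_t idx_fq, n;
    uint32_t stock = (uint32_t)xsk_umem_free_frames(xsk);

    /* clamp the request to the space actually left in the fill ring */
    n = xsk_prod_nb_free(&xsk->umem->fq, stock);
    if (n > stock)
        n = stock;
    if (n == 0)
        return;

    /* reserve() is all-or-nothing, but after the clamp above it succeeds */
    xsk_ring_prod__reserve(&xsk->umem->fq, n, &idx_fq);
    for (uint32_t i = 0; i < n; i++)
        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = xsk_alloc_umem_frame(xsk);
    xsk_ring_prod__submit(&xsk->umem->fq, n);
}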
Is this simply the case that the tutorial default of 4,096 frames is not enough?
I tried increasing the number of frames from 4,096 to 131,072... which times 4KB is 512 MB of UMEM in total. But it seems to make no difference to the packet loss. Any ideas?
Increasing the memory for packet buffers does not change the number of buffers available for the driver to fill. You might also want to increase the *__DEFAULT_NUM_DESCS values and see if it gets better.
@magnus-karlsson Thanks very much for commenting!
"When you get packet loss, it is usually because the driver has not gotten enough buffers to work with from user space."
In this case, tcpreplay is sending the packets to 'veth-advanced03', and at the end of the reproduction steps above, ifconfig reports that all packets were received by veth-advanced03 and that there were zero "RX dropped". If the driver receive buffer were too small, wouldn't it be this ifconfig "RX dropped" counter which gets incremented?
Instead of the ifconfig "RX dropped" counter getting incremented, the above reproduction steps show the 'xdp:xdp_redirect_map_err' counter getting incremented. Under which circumstances does this counter get incremented? I always assumed it was when there is no UMEM frame for XDP to copy into, but I'm guessing there must be other circumstances too?
And also, if the "driver has not gotten enough buffers" then why would this result in the 'xdp:xdp_redirect_map_err' counter getting incremented? And wouldn't the driver itself have its own counter to increment? And if that counter is not the ifconfig "RX dropped" counter, then where can the counter be found?
@chaudron Thanks very much for commenting!
$ cat ../libbpf/src/xsk.h | egrep --context 3 DEFAULT_NUM_DESCS
LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
#define XSK_RING_CONS__DEFAULT_NUM_DESCS 2048
#define XSK_RING_PROD__DEFAULT_NUM_DESCS 2048
#define XSK_UMEM__DEFAULT_FRAME_SHIFT 12 /* 4096 bytes */
#define XSK_UMEM__DEFAULT_FRAME_SIZE (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
#define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
So I found two *__DEFAULT_NUM_DESCS values in xsk.h in libbpf. Is the only way to change them to hack the hard-coded value directly in libbpf's xsk.h? Or can I just ignore those values and use different values in af_xdp_user.c? And do you know what the max value would be?
I tried doubling the value from 2,048 to 4,096 and, interestingly, the xdp:xdp_redirect_map_err counter then stayed at zero.
BUT... I noticed that there seemed to be two problems in the reproduction steps above: 1. the non-zero xdp:xdp_redirect_map_err counts, and 2. the non-zero xdp:xdp_redirect_map_err counts do not add up to the total number of missing packets.
The math: ifconfig says that 4,000,006 packets were received. af_xdp_user says that 3,999,279 packets were received. That is 727 missing packets. However, the perf xdp:xdp_redirect_map_err counts only account for 219 + 139 = 358 packets. How to account for the remaining 727 - 358 = 369 packets? Which counter might be counting those? @magnus-karlsson ?
Likewise, if I -- as per your suggestion -- compile and run with the *__DEFAULT_NUM_DESCS values in libbpf's xsk.h raised to 4,096, then I get the output below. ifconfig says it received 4,000,007 packets but af_xdp_user says it only received 3,999,639 packets. xdp:xdp_redirect_map_err shows no non-zero values, but still 368 packets are missing...
$ make clean ; make && time sudo sh -s << 'EOF' 2>&1
sudo ../testenv/testenv.sh teardown --name veth-advanced03 ; sudo ../testenv/testenv.sh setup --legacy-ip --name veth-advanced03
timeout --signal=SIGINT 7 taskset -c 1 perf stat --interval-print 1000 --all-cpus --event='xdp:xdp_redirect_map*' ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --force & PIDS=$!
ip netns exec veth-advanced03 sh -c "sleep 1 ; \
taskset -c 2 tcpreplay -i veth0 --topspeed --stats=10 --loop 0 --limit 2000000 --preload-pcap /home/simon/web-browser-toolbars.trace & \
taskset -c 3 tcpreplay -i veth0 --topspeed --stats=10 --loop 0 --limit 2000000 --preload-pcap /home/simon/web-browser-toolbars.trace & \
"
wait $PIDS ; echo "- af_xdp_user process ended"
ifconfig veth-advanced03 ; ethtool -S veth-advanced03
EOF
Tearing down environment 'veth-advanced03'
Setting up new environment 'veth-advanced03'
...
# time counts unit events
1.001704336 1 xdp:xdp_redirect_map
1.001704336 0 xdp:xdp_redirect_map_err
File Cache is enabled
File Cache is enabled
Test start: 2020-03-30 12:45:41.372698 ...
Test start: 2020-03-30 12:45:41.404251 ...
AF_XDP RX: 35788 pkts ( 35757 pps) 33847 Kbytes ( 271 Mbits/s) period:1.000868
2.004805352 1,212,234 xdp:xdp_redirect_map
2.004805352 0 xdp:xdp_redirect_map_err
AF_XDP RX: 1379527 pkts ( 1343038 pps) 1288472 Kbytes ( 10032 Mbits/s) period:1.000522
3.007149079 1,350,473 xdp:xdp_redirect_map
3.007149079 0 xdp:xdp_redirect_map_err
AF_XDP RX: 2721044 pkts ( 1341102 pps) 2540617 Kbytes ( 10014 Mbits/s) period:1.000309
4.009439933 1,305,987 xdp:xdp_redirect_map
4.009439933 0 xdp:xdp_redirect_map_err
Test complete: 2020-03-30 12:45:44.391110
Actual: 2000000 packets (1867492361 bytes) sent in 3.01 seconds
Rated: 618784745.1 Bps, 4950.27 Mbps, 662690.52 pps
Statistics for network device: veth0
Successful packets: 2000000
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
Test complete: 2020-03-30 12:45:44.433027
Actual: 2000000 packets (1867492361 bytes) sent in 3.02 seconds
Rated: 616741202.4 Bps, 4933.92 Mbps, 660501.98 pps
Statistics for network device: veth0
Successful packets: 2000000
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
AF_XDP RX: 3999639 pkts ( 1278177 pps) 3734831 Kbytes ( 9551 Mbits/s) period:1.000327
5.011754843 130,944 xdp:xdp_redirect_map
5.011754843 0 xdp:xdp_redirect_map_err
AF_XDP RX: 3999639 pkts ( 0 pps) 3734831 Kbytes ( 0 Mbits/s) period:1.000329
6.014121650 0 xdp:xdp_redirect_map
6.014121650 0 xdp:xdp_redirect_map_err
AF_XDP RX: 3999639 pkts ( 0 pps) 3734831 Kbytes ( 0 Mbits/s) period:1.000322
6.956703411 0 xdp:xdp_redirect_map
6.956703411 0 xdp:xdp_redirect_map_err
INFO: xdp_link_detach() removed XDP prog ID:591 on ifindex:271
6.993699159 0 xdp:xdp_redirect_map
6.993699159 0 xdp:xdp_redirect_map_err
- af_xdp_user process ended
veth-advanced03: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.11.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fc00:dead:cafe:1::1 prefixlen 64 scopeid 0x0<global>
inet6 fe80::c0b3:c5ff:fe40:a3a8 prefixlen 64 scopeid 0x20<link>
ether c2:b3:c5:40:a3:a8 txqueuelen 1000 (Ethernet)
RX packets 4000007 bytes 3734985456 (3.7 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 31 bytes 5468 (5.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
NIC statistics:
peer_ifindex: 270
rx_queue_0_xdp_packets: 0
rx_queue_0_xdp_bytes: 0
rx_queue_0_xdp_drops: 0
real 0m7.396s
On Mon, Mar 30, 2020 at 10:01 PM Simon Hardy-Francis <notifications@github.com> wrote:
"So I found two *__DEFAULT_NUM_DESCS values in xsk.h in libbpf. Is the only way to change them to hack the hard-coded value directly in libbpf's xsk.h? Or can I just ignore those values and use different values in af_xdp_user.c? And do you know what the max value would be?"
No reason to hack the xsk.h file. Fill in the config struct that can be supplied with the __create() calls. The size has to be a power of 2 and the max value is large and architecture dependent.
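(Illustrating Magnus's point, not code from the thread: the overrides go into struct xsk_umem_config and struct xsk_socket_config, passed as the last argument of the two __create() calls. The ring sizes of 8192 and the variable names around the calls are assumptions modeled on the tutorial's af_xdp_user.c.)

struct xsk_umem_config umem_cfg = {
    .fill_size = 8192, /* instead of XSK_RING_PROD__DEFAULT_NUM_DESCS (2048); must be a power of 2 */
    .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
    .frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
    .frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
    .flags = 0,
};
struct xsk_socket_config xsk_cfg = {
    .rx_size = 8192, /* per-socket RX ring; also a power of 2 */
    .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
    .libbpf_flags = 0,
    .xdp_flags = 0,
    .bind_flags = 0,
};

ret = xsk_umem__create(&umem->umem, packet_buffer, packet_buffer_size, &umem->fq, &umem->cq, &umem_cfg);
...
ret = xsk_socket__create(&xsk->xsk, cfg->ifname, cfg->xsk_if_queue, umem->umem, &xsk->rx, &xsk->tx, &xsk_cfg);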
"I tried doubling the value from 2,048 to 4,096 and, interestingly, the xdp:xdp_redirect_map_err counter then stayed at zero."
Perfect!
"The math: ifconfig says that 4,000,006 packets were received. af_xdp_user says that 3,999,279 packets were received. That is 727 missing packets. However, the perf xdp:xdp_redirect_map_err counts only account for 219 + 139 = 358 packets. How to account for the remaining 727 - 358 = 369 packets?"
Good question, but unfortunately I do not know the answer. Can veth drop it and account for it in some way? In any case,
I will add a new work item in my backlog to expose a generic stat (using the existing getsockopt) around missed allocations due to an empty fill ring. If we had that, you would have seen the problem right away. It is clearly missing.
/Magnus
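(For later readers: the statistics that the existing getsockopt already exposes can be read as sketched below; at the time of this thread they did not yet include the empty-fill-ring counter Magnus describes, which is exactly his point. The wrapper function is hypothetical; XDP_STATISTICS and struct xdp_statistics come from <linux/if_xdp.h>.)

#include <stdio.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

/* fd is the AF_XDP socket fd, e.g. from xsk_socket__fd(xsk->xsk) */
static void dump_xsk_stats(int fd)
{
    struct xdp_statistics stats;
    socklen_t optlen = sizeof(stats);

    if (getsockopt(fd, SOL_XDP, XDP_STATISTICS, &stats, &optlen) == 0)
        fprintf(stderr, "rx_dropped:%llu rx_invalid_descs:%llu tx_invalid_descs:%llu\n",
                (unsigned long long)stats.rx_dropped,
                (unsigned long long)stats.rx_invalid_descs,
                (unsigned long long)stats.tx_invalid_descs);
}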
I wrote the original Stack Overflow question and also this question to the xdp-newbies kernel mailing list: https://marc.info/?l=xdp-newbies&m=158472686211835&w=2
This is my xdp_create_socket function:
struct xsk_socket_config xsk_socket_cfg;
struct xsk_socket_info *xsk;
struct xsk_ring_cons *rxr;
struct xsk_ring_prod *txr;
int ret;

xsk = calloc(1, sizeof(*xsk));
if (!xsk) {
    fprintf(stderr, "xsk `calloc` failed: %s\n", strerror(errno));
    exit(1);
}

xsk->umem = umem;
xsk_socket_cfg.rx_size = XSK_CONS_AMOUNT;
xsk_socket_cfg.tx_size = XSK_PROD_AMOUNT;
xsk_socket_cfg.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
xsk_socket_cfg.xdp_flags = cfg->xdp_flags | XDP_FLAGS_DRV_MODE;
xsk_socket_cfg.bind_flags = cfg->xsk_bind_flags | XDP_USE_NEED_WAKEUP | XDP_ZEROCOPY;

rxr = rx ? &xsk->rx : NULL;
txr = tx ? &xsk->tx : NULL;
ret = xsk_socket__create(&xsk->xsk, cfg->ifname_buf, cfg->xsk_if_queue, umem->umem, rxr, txr, &xsk_socket_cfg);
if (ret) {
    /* xsk_socket__create() returns a negative errno; errno itself is not set */
    fprintf(stderr, "`xsk_socket__create` returned error: %s\n", strerror(-ret));
    exit(-ret);
}

return xsk;
and xsk_configure_umem:
struct xsk_umem_info *umem;
struct xsk_umem_config cfg = {
    .fill_size = XSK_PROD_AMOUNT,
    .comp_size = XSK_CONS_AMOUNT,
    .frame_size = FRAME_SIZE,
    .frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
    .flags = 0
};

umem = calloc(1, sizeof(*umem));
if (!umem) {
    fprintf(stderr, "Error while allocating umem: %s\n", strerror(errno));
    exit(1);
}

const int ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq, &cfg);
if (ret) {
    fprintf(stderr, "`xsk_umem__create` returned: %s\n", strerror(-ret));
    exit(1);
}

umem->buffer = buffer;
return umem;
With:
#define XSK_CONS_AMOUNT 32768
#define XSK_PROD_AMOUNT 32768
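(One side note on the XDP_USE_NEED_WAKEUP bind flag used in xdp_create_socket above: with that flag set, the driver may pause until the application kicks it, so the RX loop should check the fill ring's wakeup flag, along the lines of the kernel's xdpsock sample. A fragment, assuming the umem/xsk structs above:)

/* inside the RX loop: if the driver asked to be woken, any syscall on
 * the socket does it; recvfrom() is the cheap conventional choice */
if (xsk_ring_prod__needs_wakeup(&umem->fq))
    recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);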
In my kernel mailing list question, I observed that packets smaller than 400 bytes (372 bytes payload) are not received by the AF_XDP kernel program. I also used nping with a rate of 250 packets per second. I am sorry, but how would 250 packets per second be a problem?
Is someone able to reproduce my example using a very basic XDP kernel program which just prints out the length of the received packet, and nping:
nping --udp -p <port> --dest-ip <ip> --data-length 371 -c 50000000 --rate 250 -N
to ping the interface the XDP program is loaded on with UDP packets?
"Is someone able to reproduce my example using a very basic XDP kernel program"
@Bresenham: Using the same steps as detailed above (i.e. the standard tutorial XDP kernel code which passes all packets to AF_XDP, and the af_xdp_user example which simply reports how many packets were read per second) and using a modified version of your nping line which works with the veth interface (I had to use the options --dest-mac and --send-eth, otherwise nping says the packets are undeliverable because the test runs everything locally via veth...), I could reproduce using the one-liner below.
Your nping line said count 50,000,000 at 250 per second... which is about 55 hours, which is too long for me... so I reduced the count to 2,500, which runs for a few seconds. You can play around with that on your own system...
In the example below, nping says it sent 2,500 packets. tcpdump says it received 2,500 nping packets. ifconfig says it received 2,510 packets. And af_xdp_user says it received 2,504 packets. Obviously there are a few extra packets floating around. But it doesn't look like many hundreds of packets are going missing like in my example above in this issue.
I did notice that although the nping rate is set to 250 per second, both perf xdp and af_xdp_user only report about 210-ish per second. I've never used nping before, and although it has an impressive number of options, this rate discrepancy makes me wonder about the quality of the nping code. But maybe it's designed for higher PPS?
I also tried running nping at a higher PPS of 1M but it only managed to send about 180k PPS. So I guess it's not designed for higher PPS either. Sending at the max rate for nping also didn't result in packet loss. As a comparison, on my laptop, sending with tcpreplay and the preload option, then I get about 800k PPS... which I thought was on the slow side... so 180k PPS for nping seems really slow...
Sorry I couldn't reproduce the packet loss you described. But on the plus side, you could now compare your code with the tutorial code working with nping in order to figure out where the packet loss issue is? Or you could test out your code on a veth and see if the packet loss is somehow related to testing on a real NIC? Hope this helps!
$ make && time sudo sh -s << 'EOF' 2>&1
../testenv/testenv.sh teardown --name veth-advanced03 ; ../testenv/testenv.sh setup --legacy-ip --name veth-advanced03
timeout --signal=SIGINT 20 taskset -c 1 perf stat --interval-print 1000 --all-cpus --event='xdp:xdp_redirect_map*' ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --force & PIDS=$!
ip netns exec veth-advanced03 sh -c " \
tcpdump -i veth0 -w veth-advanced03.pcap & \
sleep 1 ; \
taskset -c 2 nping -v-1 --udp --dest-port 12345 --dest-ip 1.2.3.4 --dest-mac 00:01:02:03:04:05 --send-eth --data-length 371 --count 2500 --rate 250 --no-capture & \
"
wait $PIDS ; echo "- af_xdp_user process ended"
sleep 1 ; pkill tcpdump ; sleep 1 ; echo "- tcpdump captured packets from nping" ; tcpdump -n -r veth-advanced03.pcap | egrep "1.2.3.4.12345" | wc -l
ifconfig veth-advanced03 ; ethtool -S veth-advanced03
EOF
Tearing down environment 'veth-advanced03'
Setting up new environment 'veth-advanced03'
...
tcpdump: listening on veth0, link-type EN10MB (Ethernet), capture size 262144 bytes
Starting Nping 0.7.80 ( https://nmap.org/nping ) at 2020-03-31 16:28 PDT
# time counts unit events
1.001499226 1 xdp:xdp_redirect_map
1.001499226 0 xdp:xdp_redirect_map_err
AF_XDP RX: 3 pkts ( 3 pps) 0 Kbytes ( 0 Mbits/s) period:1.000325
2.008877312 40 xdp:xdp_redirect_map
2.008877312 0 xdp:xdp_redirect_map_err
AF_XDP RX: 67 pkts ( 64 pps) 26 Kbytes ( 0 Mbits/s) period:1.000353
3.012104284 211 xdp:xdp_redirect_map
3.012104284 0 xdp:xdp_redirect_map_err
AF_XDP RX: 277 pkts ( 210 pps) 112 Kbytes ( 1 Mbits/s) period:1.000289
4.015470826 209 xdp:xdp_redirect_map
4.015470826 0 xdp:xdp_redirect_map_err
AF_XDP RX: 486 pkts ( 209 pps) 199 Kbytes ( 1 Mbits/s) period:1.000324
5.017740755 212 xdp:xdp_redirect_map
5.017740755 0 xdp:xdp_redirect_map_err
AF_XDP RX: 697 pkts ( 211 pps) 286 Kbytes ( 1 Mbits/s) period:1.000339
6.020110484 212 xdp:xdp_redirect_map
6.020110484 0 xdp:xdp_redirect_map_err
AF_XDP RX: 910 pkts ( 213 pps) 374 Kbytes ( 1 Mbits/s) period:1.000356
7.022473657 211 xdp:xdp_redirect_map
7.022473657 0 xdp:xdp_redirect_map_err
AF_XDP RX: 1120 pkts ( 210 pps) 461 Kbytes ( 1 Mbits/s) period:1.000326
8.024771194 212 xdp:xdp_redirect_map
8.024771194 0 xdp:xdp_redirect_map_err
AF_XDP RX: 1331 pkts ( 211 pps) 548 Kbytes ( 1 Mbits/s) period:1.000332
9.027071800 210 xdp:xdp_redirect_map
9.027071800 0 xdp:xdp_redirect_map_err
AF_XDP RX: 1540 pkts ( 209 pps) 634 Kbytes ( 1 Mbits/s) period:1.000310
10.029381957 210 xdp:xdp_redirect_map
10.029381957 0 xdp:xdp_redirect_map_err
AF_XDP RX: 1750 pkts ( 210 pps) 721 Kbytes ( 1 Mbits/s) period:1.000298
11.031720519 211 xdp:xdp_redirect_map
11.031720519 0 xdp:xdp_redirect_map_err
AF_XDP RX: 1960 pkts ( 210 pps) 808 Kbytes ( 1 Mbits/s) period:1.000340
12.034996404 210 xdp:xdp_redirect_map
12.034996404 0 xdp:xdp_redirect_map_err
AF_XDP RX: 2171 pkts ( 211 pps) 895 Kbytes ( 1 Mbits/s) period:1.000390
13.037346539 210 xdp:xdp_redirect_map
13.037346539 0 xdp:xdp_redirect_map_err
AF_XDP RX: 2380 pkts ( 209 pps) 981 Kbytes ( 1 Mbits/s) period:1.000293
Max rtt: N/A | Min rtt: N/A | Avg rtt: N/A
Raw packets sent: 2500 (1.032MB) | Rcvd: 0 (0B) | Lost: 2500 (100.00%)
Nping done: 1 IP address pinged in 12.76 seconds
14.039874547 145 xdp:xdp_redirect_map
14.039874547 0 xdp:xdp_redirect_map_err
AF_XDP RX: 2504 pkts ( 124 pps) 1032 Kbytes ( 0 Mbits/s) period:1.000520
15.042196027 0 xdp:xdp_redirect_map
15.042196027 0 xdp:xdp_redirect_map_err
AF_XDP RX: 2504 pkts ( 0 pps) 1032 Kbytes ( 0 Mbits/s) period:1.000324
16.044502744 0 xdp:xdp_redirect_map
16.044502744 0 xdp:xdp_redirect_map_err
AF_XDP RX: 2504 pkts ( 0 pps) 1032 Kbytes ( 0 Mbits/s) period:1.000327
17.047826959 0 xdp:xdp_redirect_map
17.047826959 0 xdp:xdp_redirect_map_err
AF_XDP RX: 2504 pkts ( 0 pps) 1032 Kbytes ( 0 Mbits/s) period:1.000328
18.050143899 0 xdp:xdp_redirect_map
18.050143899 0 xdp:xdp_redirect_map_err
AF_XDP RX: 2504 pkts ( 0 pps) 1032 Kbytes ( 0 Mbits/s) period:1.000283
19.052431510 0 xdp:xdp_redirect_map
19.052431510 0 xdp:xdp_redirect_map_err
AF_XDP RX: 2504 pkts ( 0 pps) 1032 Kbytes ( 0 Mbits/s) period:1.000371
19.954678525 0 xdp:xdp_redirect_map
19.954678525 0 xdp:xdp_redirect_map_err
INFO: xdp_link_detach() removed XDP prog ID:799 on ifindex:359
19.986299157 0 xdp:xdp_redirect_map
19.986299157 0 xdp:xdp_redirect_map_err
- af_xdp_user process ended
2526 packets captured
2526 packets received by filter
0 packets dropped by kernel
- tcpdump captured packets from nping
reading from file veth-advanced03.pcap, link-type EN10MB (Ethernet)
2500
veth-advanced03: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.11.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fc00:dead:cafe:1::1 prefixlen 64 scopeid 0x0<global>
inet6 fe80::d414:ff:fe96:db2b prefixlen 64 scopeid 0x20<link>
ether d6:14:00:96:db:2b txqueuelen 1000 (Ethernet)
RX packets 2510 bytes 1033360 (1.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 33 bytes 5854 (5.8 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
NIC statistics:
peer_ifindex: 358
rx_queue_0_xdp_packets: 0
rx_queue_0_xdp_bytes: 0
rx_queue_0_xdp_drops: 0
real 0m22.409s
@magnus-karlsson thanks for the comments!
"Good question, but unfortunately I do not know the answer. Can veth drop it and account for it in some way?"
I managed to simplify the packet loss example using veth down to sending just 8,000 packets.
In the example below 8,000 packets are sent but af_xdp_user only says it received 7,644 packets.
I increased the veth txqueuelen from the default 1,000 to 10,000... but I'm not sure whether it's actually used or makes a difference.
Both ifconfig and tcpdump see the 8,000 packets from tcpreplay.
I have searched high and low for tracepoints for veth which might show packet loss but cannot find anything.
It's interesting that the number of packets lost seems to be similar whether sending 8,000 or 4M packets... so does that indicate that there might be some kind of initialization bug where the packet loss only happens very early on?
If I reduce the number of packets further then the packet loss drops down from 350-ish to 200-ish...
@magnus-karlsson Any ideas what else I can try? How would you debug this further?
$ make && time sudo sh -s << 'EOF' 2>&1
../testenv/testenv.sh teardown --name veth-advanced03 ; ../testenv/testenv.sh setup --legacy-ip --name veth-advanced03
ip link set veth-advanced03 txqueuelen 10000
timeout --signal=SIGINT 3 taskset -c 1 perf stat --interval-print 500 --all-cpus --event='xdp:xdp_redirect_map*' ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --force & PIDS=$!
ip netns exec veth-advanced03 sh -c "
tcpdump -i veth0 -w veth-advanced03.pcap & \
sleep 1 ; \
taskset -c 2 tcpreplay -i veth0 --topspeed --stats=10 --loop 0 --limit 4000 --preload-pcap /home/simon/web-browser-toolbars.trace & \
taskset -c 3 tcpreplay -i veth0 --topspeed --stats=10 --loop 0 --limit 4000 --preload-pcap /home/simon/web-browser-toolbars.trace & \
"
wait $PIDS ; echo "- af_xdp_user process ended"
ifconfig veth-advanced03 ; ethtool -S veth-advanced03
sleep 1 ; pkill tcpdump ; sleep 1 ; echo "- tcpdump captured packets" ; tcpdump -n -r veth-advanced03.pcap | wc -l
EOF
Tearing down environment 'veth-advanced03'
Setting up new environment 'veth-advanced03'
...
tcpdump: listening on veth0, link-type EN10MB (Ethernet), capture size 262144 bytes
# time counts unit events
0.502066753 0 xdp:xdp_redirect_map
0.502066753 0 xdp:xdp_redirect_map_err
1.004742302 0 xdp:xdp_redirect_map
1.004742302 0 xdp:xdp_redirect_map_err
File Cache is enabled
File Cache is enabled
Test start: 2020-03-31 18:19:42.551781 ...
Test start: 2020-03-31 18:19:42.559246 ...
Test complete: 2020-03-31 18:19:42.562535
Actual: 4000 packets (3675306 bytes) sent in 0.010754 seconds
Rated: 341761763.0 Bps, 2734.09 Mbps, 371954.62 pps
Statistics for network device: veth0
Successful packets: 4000
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
Test complete: 2020-03-31 18:19:42.568619
Actual: 4000 packets (3675306 bytes) sent in 0.009373 seconds
Rated: 392116291.4 Bps, 3136.93 Mbps, 426757.70 pps
Statistics for network device: veth0
Successful packets: 4000
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
AF_XDP RX: 7644 pkts ( 7639 pps) 7199 Kbytes ( 58 Mbits/s) period:1.000631
1.507363877 7,644 xdp:xdp_redirect_map
1.507363877 0 xdp:xdp_redirect_map_err
2.009021565 0 xdp:xdp_redirect_map
2.009021565 0 xdp:xdp_redirect_map_err
AF_XDP RX: 7644 pkts ( 0 pps) 7199 Kbytes ( 0 Mbits/s) period:1.000248
2.511711925 0 xdp:xdp_redirect_map
2.511711925 0 xdp:xdp_redirect_map_err
2.953628049 0 xdp:xdp_redirect_map
2.953628049 0 xdp:xdp_redirect_map_err
INFO: xdp_link_detach() removed XDP prog ID:959 on ifindex:439
2.987439874 0 xdp:xdp_redirect_map
2.987439874 0 xdp:xdp_redirect_map_err
- af_xdp_user process ended
veth-advanced03: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.11.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fc00:dead:cafe:1::1 prefixlen 64 scopeid 0x0<global>
inet6 fe80::6015:9fff:fe78:af23 prefixlen 64 scopeid 0x20<link>
ether 62:15:9f:78:af:23 txqueuelen 10000 (Ethernet)
RX packets 8006 bytes 7351256 (7.3 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 27 bytes 4654 (4.6 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
NIC statistics:
peer_ifindex: 438
rx_queue_0_xdp_packets: 0
rx_queue_0_xdp_bytes: 0
rx_queue_0_xdp_drops: 0
8019 packets captured
8019 packets received by filter
0 packets dropped by kernel
- tcpdump captured packets
reading from file veth-advanced03.pcap, link-type EN10MB (Ethernet)
8019
real 0m5.410s
Thank you so much, Simon. I tracked down the problem to my packet parsing. Can someone just take a quick glimpse and tell me what is wrong with this parsing?
SEC("xdp_sock")
int xdp_sock_prog(struct xdp_md *ctx) {
const void *data = (void*)(unsigned long)ctx->data;
const void *data_end = (void*)(unsigned long)ctx->data_end;
const struct ethhdr *eth = (struct ethhdr*)(data);
const int len = data_end - data;
bpf_printk("STAGE 0 LEN: %d\n", len);
if(eth + sizeof(struct ethhdr) <= data_end) {
bpf_printk("\t->STAGE 1\n");
if(bpf_ntohs(eth->h_proto) == ETH_P_IP) {
const struct iphdr *iph = (struct iphdr*)(data + sizeof(struct ethhdr));
const int remaining = data_end - (void*)(iph + sizeof(struct iphdr));
bpf_printk("\t->STAGE 2 LEN IPHDR: %d\n", remaining);
if(iph + sizeof(struct iphdr) <= data_end) {
bpf_printk("\t->STAGE 3\n");
Because for nping --data-length 371, the variable remaining drops to -1, which causes the following if to be false and therefore the packet is not passed to user-space? How can that be, though? Not enough space for the IP header?
Maximilian Gaul <notifications@github.com> writes:
"Because for nping --data-length 371, the variable remaining drops to -1? How can that be?"
I think it's this line:
const int remaining = data_end - (void*)(iph + sizeof(struct iphdr));
The cast to void* is outside the parentheses, so you're doing pointer arithmetic on an 'iph'-typed pointer. I.e., you'll be advancing it 400 bytes (20 * 20) instead of 20.
So change that to either
const int remaining = data_end - ((void *)iph + sizeof(struct iphdr));
or just
const int remaining = data_end - (void*)(iph + 1);
Oh my god! Thank you!
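(To see the scaling at work outside of BPF, here is a tiny userspace illustration; the 20-byte struct is a stand-in for struct iphdr, and the second printf uses GNU void-pointer arithmetic as kernel-style C does:)

#include <stdio.h>

struct iphdr20 { unsigned char bytes[20]; }; /* stand-in for struct iphdr */

int main(void)
{
    struct iphdr20 *iph = (struct iphdr20 *)0x1000;

    /* pointer arithmetic scales by the size of the pointed-to type */
    printf("%p\n", (void *)(iph + sizeof(struct iphdr20))); /* 0x1000 + 20*20 = 0x1190 */
    printf("%p\n", (void *)iph + sizeof(struct iphdr20));   /* 0x1000 + 20    = 0x1014 */
    printf("%p\n", (void *)(iph + 1));                      /* 0x1000 + 20    = 0x1014 */
    return 0;
}

(And for completeness, a sketch of the same parse using the conventional 'pointer + 1' bounds-check idiom; note that the Ethernet-header check in the original code has the same scaling problem as the IP one. The redirect-to-XSKMAP tail is elided, so this skeleton just passes:)

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp_sock")
int xdp_sock_prog(struct xdp_md *ctx)
{
    void *data = (void *)(unsigned long)ctx->data;
    void *data_end = (void *)(unsigned long)ctx->data_end;
    struct ethhdr *eth = data;
    struct iphdr *iph;

    /* eth + 1 advances by exactly sizeof(struct ethhdr) bytes */
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;
    if (bpf_ntohs(eth->h_proto) != ETH_P_IP)
        return XDP_PASS;

    iph = (struct iphdr *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return XDP_DROP;

    /* header fields are now provably in bounds for the verifier */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";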
@magnus-karlsson I tried out some more things:
Even concurrently tcpreplaying the first 150 packets of two pcap files provokes the bug. Please see below.
I also used unique pcap files for each tcpreplay session in case using the same pcap file caused some kind of pathological condition. But whether the pcap files are different or the same, I get the same results.
In the example below, 2x 150 packets are tcpreplayed but af_xdp_user only sees 180 of them, meaning that 300 - 180 = 120 packets go missing between veth and XDP. But ifconfig and tcpdump see all of them in the veth namespace.
In addition, after tcpreplaying the two sets of 150 packets, 100k packets from a single tcpreplay are injected, and then an additional two sets of 150 packets. So 100,600 packets are expected in total, and that's what both ifconfig and tcpdump report. However, af_xdp_user always fails to see hundreds of packets whenever concurrent tcpreplay sessions are started, regardless of how many packets have been seen before.
Since fewer than 300 total packets are enough to cause this, it seems like this must be some type of concurrency bug, no? Looking forward to your thoughts on how to debug this further.
$ make && time sudo sh -s << 'EOF' 2>&1
../testenv/testenv.sh teardown --name veth-advanced03 ; ../testenv/testenv.sh setup --legacy-ip --name veth-advanced03
ip link set veth-advanced03 txqueuelen 10000
timeout --signal=SIGINT 8 taskset -c 1 perf stat --interval-print 500 --all-cpus --event='xdp:xdp_redirect_map*' ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --force & PIDS=$!
ip netns exec veth-advanced03 sh -c "
taskset -c 2 tcpdump -i veth0 -w veth-advanced03.pcap & \
sleep 1 ; \
taskset -c 3 tcpreplay -i veth0 --topspeed --stats=10 --loop 0 --limit 150 --preload-pcap ./test.trace1 & \
taskset -c 4 tcpreplay -i veth0 --topspeed --stats=10 --loop 0 --limit 150 --preload-pcap ./test.trace2 & \
sleep 2 ; \
taskset -c 3 tcpreplay -i veth0 --topspeed --stats=10 --loop 0 --limit 100000 --preload-pcap ./test.trace3 & \
sleep 2 ; \
taskset -c 3 tcpreplay -i veth0 --topspeed --stats=10 --loop 0 --limit 150 --preload-pcap ./test.trace4 & \
taskset -c 4 tcpreplay -i veth0 --topspeed --stats=10 --loop 0 --limit 150 --preload-pcap ./test.trace5 & \
"
wait $PIDS ; echo "- af_xdp_user process ended"
ifconfig veth-advanced03 ; ethtool -S veth-advanced03
sleep 1 ; pkill tcpdump ; sleep 1 ; echo "- tcpdump captured packets" ; tcpdump -n -r veth-advanced03.pcap | wc -l
EOF
Tearing down environment 'veth-advanced03'
Setting up new environment 'veth-advanced03'
...
tcpdump: listening on veth0, link-type EN10MB (Ethernet), capture size 262144 bytes
- af_xdp_user entering while loop for read
# time counts unit events
0.501447814 1 xdp:xdp_redirect_map
0.501447814 0 xdp:xdp_redirect_map_err
1.005998560 0 xdp:xdp_redirect_map
1.005998560 0 xdp:xdp_redirect_map_err
File Cache is enabled
Test start: 2020-04-01 14:18:57.351094 ...
Test complete: 2020-04-01 14:18:57.351401
Actual: 150 packets (152056 bytes) sent in 0.000307 seconds
Rated: 495296416.9 Bps, 3962.37 Mbps, 488599.34 pps
Statistics for network device: veth0
Successful packets: 150
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
File Cache is enabled
Test start: 2020-04-01 14:18:57.375903 ...
Test complete: 2020-04-01 14:18:57.376475
Actual: 150 packets (158994 bytes) sent in 0.000572 seconds
Rated: 277961538.4 Bps, 2223.69 Mbps, 262237.76 pps
Statistics for network device: veth0
Successful packets: 150
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
AF_XDP RX: 180 pkts ( 180 pps) 189 Kbytes ( 2 Mbits/s) period:1.001102
1.510657935 181 xdp:xdp_redirect_map
1.510657935 0 xdp:xdp_redirect_map_err
2.013492422 0 xdp:xdp_redirect_map
2.013492422 0 xdp:xdp_redirect_map_err
AF_XDP RX: 182 pkts ( 2 pps) 190 Kbytes ( 0 Mbits/s) period:1.000410
2.517108882 0 xdp:xdp_redirect_map
2.517108882 0 xdp:xdp_redirect_map_err
3.019483118 0 xdp:xdp_redirect_map
3.019483118 0 xdp:xdp_redirect_map_err
File Cache is enabled
Test start: 2020-04-01 14:18:59.363262 ...
AF_XDP RX: 35420 pkts ( 35198 pps) 35881 Kbytes ( 285 Mbits/s) period:1.001123
Test complete: 2020-04-01 14:18:59.565788
Actual: 100000 packets (101274431 bytes) sent in 0.202526 seconds
Rated: 500056442.1 Bps, 4000.45 Mbps, 493763.76 pps
Statistics for network device: veth0
Successful packets: 100000
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
3.522168561 100,000 xdp:xdp_redirect_map
3.522168561 0 xdp:xdp_redirect_map_err
4.024844579 0 xdp:xdp_redirect_map
4.024844579 0 xdp:xdp_redirect_map_err
AF_XDP RX: 100182 pkts ( 64739 pps) 101464 Kbytes ( 524 Mbits/s) period:1.000357
4.528459970 0 xdp:xdp_redirect_map
4.528459970 0 xdp:xdp_redirect_map_err
5.031112673 0 xdp:xdp_redirect_map
5.031112673 0 xdp:xdp_redirect_map_err
File Cache is enabled
Test start: 2020-04-01 14:19:01.363636 ...
Test complete: 2020-04-01 14:19:01.363951
Actual: 150 packets (155390 bytes) sent in 0.000315 seconds
Rated: 493301587.3 Bps, 3946.41 Mbps, 476190.47 pps
Statistics for network device: veth0
Successful packets: 150
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
File Cache is enabled
Test start: 2020-04-01 14:19:01.385617 ...
Test complete: 2020-04-01 14:19:01.386261
Actual: 150 packets (152640 bytes) sent in 0.000644 seconds
Rated: 237018633.5 Bps, 1896.14 Mbps, 232919.25 pps
Statistics for network device: veth0
Successful packets: 150
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
AF_XDP RX: 100365 pkts ( 183 pps) 101650 Kbytes ( 1 Mbits/s) period:1.000334
5.533809128 183 xdp:xdp_redirect_map
5.533809128 0 xdp:xdp_redirect_map_err
6.036487028 0 xdp:xdp_redirect_map
6.036487028 0 xdp:xdp_redirect_map_err
AF_XDP RX: 100365 pkts ( 0 pps) 101650 Kbytes ( 0 Mbits/s) period:1.000322
6.539349738 0 xdp:xdp_redirect_map
6.539349738 0 xdp:xdp_redirect_map_err
7.041976355 0 xdp:xdp_redirect_map
7.041976355 0 xdp:xdp_redirect_map_err
AF_XDP RX: 100365 pkts ( 0 pps) 101650 Kbytes ( 0 Mbits/s) period:1.000498
7.544623892 0 xdp:xdp_redirect_map
7.544623892 0 xdp:xdp_redirect_map_err
7.953617169 0 xdp:xdp_redirect_map
7.953617169 0 xdp:xdp_redirect_map_err
INFO: xdp_link_detach() removed XDP prog ID:1239 on ifindex:611
7.995917840 0 xdp:xdp_redirect_map
7.995917840 0 xdp:xdp_redirect_map_err
- af_xdp_user process ended
veth-advanced03: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.11.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fc00:dead:cafe:1::1 prefixlen 64 scopeid 0x0<global>
inet6 fe80::90bd:54ff:fe8a:cc66 prefixlen 64 scopeid 0x20<link>
ether 92:bd:54:8a:cc:66 txqueuelen 10000 (Ethernet)
RX packets 100607 bytes 101894245 (101.8 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 31 bytes 5468 (5.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
NIC statistics:
peer_ifindex: 610
rx_queue_0_xdp_packets: 0
rx_queue_0_xdp_bytes: 0
rx_queue_0_xdp_drops: 0
100625 packets captured
100625 packets received by filter
0 packets dropped by kernel
- tcpdump captured packets
reading from file veth-advanced03.pcap, link-type EN10MB (Ethernet)
100625
real 0m10.567s
user 0m8.004s
sys 0m0.693s
I'm reaching out to @netoptimizer, @tmakita, and @borkmann for help with the above veth and/or XDP packet loss issue [1]. How did I come to list you three?
Well I'm running this Linux kernel:
$ uname -r
5.3.0-42-generic
I looked up the release here [2] and saw that the release date was 2019-Sep-15.
Then I looked up the last change to veth.c before that date here [3] and found your email addresses :-)
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
That version of veth.c [4] appears to support the 3 stats which I also see, which do not report any drops:
$ ethtool -S veth-advanced03
NIC statistics:
peer_ifindex: 610
rx_queue_0_xdp_packets: 0
rx_queue_0_xdp_bytes: 0
rx_queue_0_xdp_drops: 0
Looking forward to your comments and hope my reproduction steps help!
[1] https://github.com/xdp-project/xdp-tutorial/issues/116#issuecomment-607501798 [2] https://github.com/torvalds/linux/releases/tag/v5.3 [3] https://github.com/torvalds/linux/commit/9cda7807ee1e25a3771b5357d9fb12991b2550f9?diff=unified [4] https://github.com/torvalds/linux/blob/9cda7807ee1e25a3771b5357d9fb12991b2550f9/drivers/net/veth.c#L43
@simonhf Why not try XDP native mode?
After experiencing AF_XDP packet loss myself, and reading about others experiencing packet loss too [1], I tried to reproduce the packet loss with barely modified XDP tutorial code.
Below I reproduce the packet loss issue (on my box) using a PCAP file I have laying around. I modify the advanced03-AF_XDP tutorial code to redirect all packets to the AF_XDP user-land code, which is modified to only display the RX packets every second. Then I use tcpreplay to inject packets. Mysteriously, the packet loss only seems to occur if I use two or more concurrent tcpreplay processes.
This snippet shows the following:
Each tcpreplay sends 2,000,000 packets, and ifconfig confirms that all 4,000,000 packets were received.
af_xdp_user shows that only 3,999,279 packets were received.
perf shows small but continual xdp:xdp_redirect_map_err counts, i.e. no UMEM frames available.
Is this simply the case that the tutorial default of 4,096 frames is not enough?
I tried increasing the number of frames from 4,096 to 131,072... which times 4KB is 512 MB of UMEM in total. But it seems to make no difference to the packet loss. Any ideas?
[1] https://stackoverflow.com/questions/60694304/af-xdp-socket-vs-linux-sockets-why-does-my-af-xdp-socket-lose-packets-whereas-a