Open gtedesco-r7 opened 4 months ago
At least one issue appears to be that linux_if_drops() does not report the value of rx_dropped, which indicates the number of packets not forwarded to the upper layers for processing.
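For reference, here is a minimal sketch (not libpcap's actual code) of reading one of these per-interface counters from sysfs; the read_net_counter() helper is hypothetical, introduced here only for illustration:

```c
/* A minimal sketch, not libpcap's actual implementation, of reading
 * one of the per-interface counters discussed here.  The sysfs layout
 * /sys/class/net/<dev>/statistics/<counter> is standard on Linux;
 * drivers that do not maintain a given counter typically just leave
 * the reported value at 0. */
#include <stdio.h>

static long long
read_net_counter(const char *dev, const char *counter)
{
    char path[256];
    long long value;
    FILE *f;

    snprintf(path, sizeof(path), "/sys/class/net/%s/statistics/%s",
        dev, counter);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* no such interface or counter */
    if (fscanf(f, "%lld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Example: read_net_counter("eth0", "rx_dropped") */
```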
That page (the kernel's sysfs network statistics documentation) lists several different receive errors. rx_dropped is one of the more vaguely described ones: "Indicates the number of packets received by the network device but dropped, that are not forwarded to the upper layers for packet processing."
Some others, even though they say "See the network driver for the exact meaning of this value.", are a bit more specific, such as:

- rx_crc_errors, which "Indicates the number of packets received with a CRC (FCS) error by this network device." Though it states that "the specific meaning might depend on the MAC layer used by the interface.", a CRC error is a CRC error; it isn't a "lack of resources", it's a separate problem.
- rx_fifo_errors, which "Indicates the number of receive FIFO errors seen by this network device.", and could be considered a "lack of resources" in the sense of "lack of room in the receive FIFO".
- rx_frame_errors, which "Indicates the number of received frames with error, such as alignment errors.", and is an error similar in character to rx_crc_errors.
- rx_length_errors, which "Indicates the number of received error packet with a length error, oversized or undersized.", and is an error similar in character to rx_crc_errors.
- rx_missed_errors, which "Indicates the number of received packets that have been missed due to lack of capacity in the receive side.", and is another "lack of resources" error.
- rx_nohandler, which "Indicates the number of received packets that were dropped on an inactive device by the network core." I think I'd have to dive into the networking stack to see what that means.
- rx_over_errors, which "Indicates the number of received packets that are oversized compared to what the network device is configured to accept (e.g: larger than MTU).", and is similar to, but not the same as, rx_length_errors.
- tx_aborted_errors, which "Indicates the number of packets that have been aborted during transmission by a network device (e.g: because of a medium collision).", and isn't a "lack of resources" error (resources are generally adapter and networking stack resources, not "network not being used right now" resources).

ps_ifdrop is the "number of packets dropped by the network interface or its driver", which is not limited to resource errors, and thus could include all of the above, although it probably shouldn't include packets discarded due to link-layer errors (bad CRC, frame errors, length errors), "too big for the MTU" errors, and "receive aborted" errors.
So that'd count rx_fifo_errors and rx_missed_errors, which are what it counts now. I suspect it shouldn't count rx_nohandler, although, again, I'd have to figure out what that one means.

rx_dropped isn't obviously not a "dropped by the network interface or its driver" error, so it would probably make sense to count it.
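To make that concrete, here is a hedged sketch of the accounting being discussed, reusing the hypothetical read_net_counter() helper from the earlier sketch; this is not the actual linux_if_drops() code, it only illustrates which counters would be summed into ps_ifdrop:

```c
/* Sketch of the proposed accounting, not libpcap's actual
 * linux_if_drops().  read_net_counter() is the hypothetical sysfs
 * helper from the earlier sketch. */
#include <stddef.h>

long long read_net_counter(const char *dev, const char *counter);

static long long
if_drops_proposed(const char *dev)
{
    static const char *counters[] = {
        "rx_fifo_errors",       /* counted today */
        "rx_missed_errors",     /* counted today */
        "rx_dropped",           /* the proposed addition */
    };
    long long total = 0;
    size_t i;

    for (i = 0; i < sizeof(counters) / sizeof(counters[0]); i++) {
        long long v = read_net_counter(dev, counters[i]);
        if (v > 0)              /* treat missing counters as zero */
            total += v;
    }
    return total;
}
```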
Although, when we added the rx_dropped stats in, we still saw the problem on vmxnet3, so maybe that's a red herring, or maybe there are multiple issues here.
That sounds like packets dropped for some other reason, whether it's one of the above or a reason for which there is no statistic.
Yes, I think you're right about the vagueness there.
Would a patch adding one or more of the above be welcomed? I'd like to do a bit more investigation on this to figure out what might be going on with various popular drivers before coming up with such a patch.
Adding one or more Linux-specific Rx error counters to the libpcap API would not solve the problem reliably, because at least some Linux network interfaces do not report at least some counters correctly, depending on the particular combination of network hardware, driver, and kernel.
Moreover, after you pinpoint a particular counter bug and want it fixed, the responsible party may decline to fix it on the grounds of backward compatibility, or not respond at all. If you would prefer to focus on your project instead, let me recommend picking a particular combination of network hardware and driver that experiences the least problems with the counters, and using that for the debugging.
Right now, our assumption is that while drivers may report stats to different places, none of them double-count the same packet in multiple counters, so that adding any missing counters into libpcap's calculations, while it may not solve the problem 100%, can only get the results returned by libpcap closer to the truth. But yes, we need to do more investigation to confirm that that is the case.
We performed a set of tests in laboratory conditions and we have confirmed that adding rx_dropped to the ifstats calculations is the correct thing to do. When this is done, the stats all add up:

pkts_received_by_pcap_app + dropped + ifdropped = total_received_by_nic
We had erroneously concluded before that there were still unaccounted-for packets after this fix, because the site we tested it on had an unusually large average packet size (due to jumbo frames plus unusual traffic patterns), so we were simply getting better-than-normal performance.
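For illustration, here is a minimal sketch of the libpcap side of that cross-check, using the real pcap_stats() API; reading the NIC-side totals (e.g. rx_packets from sysfs) and taking deltas since capture start are assumed and left out for brevity:

```c
/* Minimal sketch of checking the identity above: ask libpcap for its
 * three counters via pcap_stats() and compare their sum against what
 * the NIC reports having received. */
#include <pcap/pcap.h>
#include <stdio.h>

static void
print_capture_stats(pcap_t *p)
{
    struct pcap_stat ps;

    if (pcap_stats(p, &ps) != 0) {
        fprintf(stderr, "pcap_stats: %s\n", pcap_geterr(p));
        return;
    }
    /* With the fix, these three should account for everything the
     * NIC received during the capture. */
    printf("received by pcap app: %u\n", ps.ps_recv);
    printf("dropped (buffer):     %u\n", ps.ps_drop);
    printf("dropped (interface):  %u\n", ps.ps_ifdrop);
}
```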
I shall prepare a PR for your consideration.
Okay, so we started testing on a wider array of NICs and we ran into some weird stuff. The NIC rx_dropped stats do seem correct, but it looks like the kernel is adding in some core rx_dropped stats: that covers stuff like internal queues filling up, which is what we want here, but it also seems to include some other kinds of drops. We're not sure what it is yet, but on some systems we observe a constant 1 or 2 packets per second dropped even at very low traffic, regardless of traffic rate. We are investigating this further...
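As an illustration of the kind of probe this investigation involves, here is a sketch that samples rx_dropped once per second and prints the per-second delta, independent of any capture; it reuses the hypothetical read_net_counter() helper from the first sketch:

```c
/* Sample rx_dropped once a second and print the delta, to observe
 * the constant trickle of drops described above.  read_net_counter()
 * is the hypothetical sysfs helper from the first sketch. */
#include <stdio.h>
#include <unistd.h>

long long read_net_counter(const char *dev, const char *counter);

int
main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "eth0";
    long long prev = read_net_counter(dev, "rx_dropped");

    for (;;) {
        sleep(1);
        long long cur = read_net_counter(dev, "rx_dropped");
        if (prev >= 0 && cur >= 0)
            printf("%s: rx_dropped +%lld in the last second\n",
                dev, cur - prev);
        prev = cur;
    }
}
```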
It would be great if any Linux networking folks could comment? @davem330 :)
Many drivers do not report anything into rx_missed_errors or rx_fifo_errors; see vmxnet for example.

The impact of this problem is that if I do a controlled test sending fixed-size packets at line rate, I know a system can only handle 2-3 Gbps of traffic on a single core. But when I start capturing with libpcap and look at the drop rates, they stay at zero (or unbelievably low), even when the traffic is in excess of 15 Gbps...