the-tcpdump-group / libpcap

the LIBpcap interface to various kernel packet capture mechanisms
https://www.tcpdump.org/

pcap_stats woefully under-reports ifdropped packets on Linux #1328

Open gtedesco-r7 opened 4 months ago

gtedesco-r7 commented 4 months ago

Many drivers do not report anything in rx_missed_errors or rx_fifo_errors, which are the counters libpcap currently uses for ifdropped; see vmxnet, for example.

At least one issue appears to be that linux_if_drops() does not report the value of rx_dropped, which indicates the number of packets not forwarded to the upper layers for processing.
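For illustration, here is a minimal sketch of an interface-drop total that also includes rx_dropped. It assumes the per-interface counters are read from the standard Linux sysfs files under /sys/class/net/<ifname>/statistics/. The function names `read_if_stat()` and `if_drops_with_rx_dropped()` are hypothetical. This is not libpcap's `linux_if_drops()` or the eventual patch. It only shows the idea of summing rx_missed_errors, rx_fifo_errors and rx_dropped.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one counter from
 * /sys/class/net/<ifname>/statistics/<stat>.
 * Returns -1 if the file is missing or does not contain a number.
 */
static long long read_if_stat(const char *ifname, const char *stat)
{
    char path[512];
    long long value;
    FILE *fp;

    snprintf(path, sizeof(path), "/sys/class/net/%s/statistics/%s",
             ifname, stat);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%lld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/*
 * Hypothetical "interface drops" total: the two counters libpcap
 * uses today plus rx_dropped. Unreadable counters count as 0.
 */
static long long if_drops_with_rx_dropped(const char *ifname)
{
    static const char *const stats[] = {
        "rx_missed_errors", /* currently counted */
        "rx_fifo_errors",   /* currently counted */
        "rx_dropped",       /* the proposed addition */
    };
    long long total = 0;

    for (size_t i = 0; i < sizeof(stats) / sizeof(stats[0]); i++) {
        long long v = read_if_stat(ifname, stats[i]);
        if (v > 0)
            total += v;
    }
    return total;
}
```

On a live system, `if_drops_with_rx_dropped("eth0")` (interface name hypothetical) would return the summed total. What rx_dropped actually contains still depends on the driver and kernel.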

The impact of this problem: in a controlled test sending fixed-size packets at line rate, I know a system can only handle 2-3 Gbps of traffic on a single core. But when I start capturing with libpcap and look at the drop rates, they stay at zero (or are unbelievably low), even when the traffic exceeds 15 Gbps...

That said, when we added the rx_dropped stats, we still saw the problem on vmxnet3, so maybe that's a red herring, or maybe there are multiple issues here.

guyharris commented 4 months ago

> At least one issue appears to be that linux_if_drops() does not report the value of rx_dropped, which indicates the number of packets not forwarded to the upper layers for processing.

That page lists several different receive errors. rx_dropped is one of the more vaguely described ones:

> Indicates the number of packets received by the network device but dropped, that are not forwarded to the upper layers for packet processing.

Some others, even though they say "See the network driver for the exact meaning of this value.", are a bit more specific, such as:

ps_ifdrop is the "number of packets dropped by the network interface or its driver". That is not limited to resource errors, so it could include all of the above. However, it probably shouldn't include packets discarded due to link-layer errors (bad CRC, frame errors, length errors), "too big for the MTU" errors, or "receive aborted" errors.

So that'd count rx_fifo_errors and rx_missed_errors, which are what it counts now. I suspect it shouldn't count rx_nohandler, although, again, I'd have to figure out what that one means.

Nothing obviously rules out rx_dropped as a "dropped by the network interface or its driver" error, so it would probably make sense to count it.
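The reasoning above amounts to a per-counter policy for ps_ifdrop. Here is a sketch of that policy as a lookup table. It assumes the sysfs counter names rx_crc_errors, rx_frame_errors and rx_length_errors for the bad-CRC, frame and length errors mentioned. The "too big for the MTU" and "receive aborted" cases are left out because the comment does not name their counters. rx_nohandler is marked undecided, matching the uncertainty expressed about it. The enum, struct and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical classification of Linux rx counters for ps_ifdrop. */
enum ifdrop_policy {
    IFDROP_COUNT,     /* include in ps_ifdrop */
    IFDROP_EXCLUDE,   /* link-layer error, not an interface/driver drop */
    IFDROP_UNDECIDED, /* meaning unclear; not counted for now */
    IFDROP_UNKNOWN    /* counter not listed in the table */
};

struct counter_policy {
    const char *name;
    enum ifdrop_policy policy;
};

static const struct counter_policy policies[] = {
    { "rx_fifo_errors",   IFDROP_COUNT },     /* counted today */
    { "rx_missed_errors", IFDROP_COUNT },     /* counted today */
    { "rx_dropped",       IFDROP_COUNT },     /* "probably make sense" */
    { "rx_nohandler",     IFDROP_UNDECIDED }, /* "probably shouldn't" */
    { "rx_crc_errors",    IFDROP_EXCLUDE },   /* bad CRC */
    { "rx_frame_errors",  IFDROP_EXCLUDE },   /* frame errors */
    { "rx_length_errors", IFDROP_EXCLUDE },   /* length errors */
};

/* Look up how a counter should be treated; unknown names are flagged. */
static enum ifdrop_policy ifdrop_policy_for(const char *name)
{
    for (size_t i = 0; i < sizeof(policies) / sizeof(policies[0]); i++) {
        if (strcmp(policies[i].name, name) == 0)
            return policies[i].policy;
    }
    return IFDROP_UNKNOWN;
}
```

Keeping the policy in one table makes it easy to change the decision for a single counter, such as rx_nohandler, once its meaning is pinned down.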

> That said, when we added the rx_dropped stats, we still saw the problem on vmxnet3, so maybe that's a red herring, or maybe there are multiple issues here.

That sounds like packets being dropped for some other reason, whether one of the above or a reason for which there is no statistic.

gtedesco-r7 commented 4 months ago

Yes, I think you're right about the vagueness there.

Would a patch adding one or more of the above be welcome? I'd like to do a bit more investigation to figure out what might be going on with various popular drivers before coming up with such a patch.

infrastation commented 4 months ago

Adding one or more Linux-specific Rx error counters to the libpcap API would not solve the problem reliably, because at least some Linux network interfaces do not report at least some counters correctly, depending on:

Moreover, after you pinpoint a particular counter bug and want it fixed, the responsible party may decline to fix it on the grounds of backward compatibility, or not respond at all. If you would prefer to focus on your project instead, let me recommend picking a particular combination of network hardware and driver that experiences the least problems with the counters, and using that for the debugging.

gtedesco-r7 commented 4 months ago

Right now, our assumption is that, while drivers may report stats in different counters, none double-count a drop in more than one counter. If so, adding the missing counters to libpcap's calculation may not solve the problem 100%, but it can only bring the results returned by libpcap closer to the truth. But yes, we need to do more investigation to confirm that that is the case.
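Written out, the argument relies on two assumptions. Let $A_i$ be the set of packets counted by counter $c_i$ (so $c_i = |A_i|$), and let $\mathcal{D}$ be the set of packets that ps_ifdrop should report, with $D = |\mathcal{D}|$. Assume (1) $A_i \subseteq \mathcal{D}$ for every $i$, so every counted packet is a relevant drop. Assume (2) $A_i \cap A_j = \emptyset$ for $i \neq j$, so there is no double accounting. Then for any set $K$ of counters that libpcap sums:

```latex
\mathrm{ifdrop}_K \;=\; \sum_{i \in K} c_i \;=\; \Bigl|\bigcup_{i \in K} A_i\Bigr| \;\le\; |\mathcal{D}| \;=\; D,
\qquad
K \subseteq K' \;\implies\; \mathrm{ifdrop}_K \;\le\; \mathrm{ifdrop}_{K'} \;\le\; D .
```

The equality uses assumption (2), and the inequality uses assumption (1). If a counter also includes packets outside $\mathcal{D}$, adding it can overshoot $D$. That is the kind of case the later rx_dropped observations in this thread raise.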

gtedesco-r7 commented 4 months ago

We performed a set of tests under laboratory conditions and confirmed that adding rx_dropped to the ifstats calculation is the correct thing to do. With that change, the stats all add up:

pkts_received_by_pcap_app + dropped + ifdropped = total_received_by_nic
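Here is a small sketch of the consistency check behind this identity. It assumes the four quantities have already been collected over the same test window. The struct and field names are hypothetical and simply mirror the names in the equation. The sketch deliberately does not assume how they map onto `struct pcap_stat` fields. For example, what a platform's ps_recv includes is platform-dependent.

```c
#include <stdint.h>

/* Hypothetical snapshot of the four quantities in the identity. */
struct capture_accounting {
    uint64_t pkts_received_by_pcap_app; /* packets delivered to the app */
    uint64_t dropped;                   /* dropped before reaching the app */
    uint64_t ifdropped;                 /* dropped by the interface or driver */
    uint64_t total_received_by_nic;     /* reference total seen by the NIC */
};

/*
 * Returns total_received_by_nic minus the accounted-for packets.
 *   0  -> the stats "add up"
 *   >0 -> packets are unaccounted for
 *   <0 -> something was counted more than once
 */
static int64_t unaccounted_packets(const struct capture_accounting *a)
{
    uint64_t accounted =
        a->pkts_received_by_pcap_app + a->dropped + a->ifdropped;
    return (int64_t)a->total_received_by_nic - (int64_t)accounted;
}
```

Returning a signed difference distinguishes under-reporting (positive) from double counting (negative). The latter is the failure mode the no-double-accounting assumption rules out.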

We had erroneously concluded earlier that some packets were still unaccounted for after this fix. In fact, the site where we tested it had an unusually large average packet size (due to jumbo frames and unusual traffic patterns). Fewer packets per second arrived at the same bit rate, so we were simply getting better than normal performance.
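The packet-size effect is easy to quantify. At a fixed bit rate, the packet rate falls as frames get larger, so the same link speed is far less per-packet work with jumbo frames. Here is a sketch of the arithmetic. It assumes the standard Ethernet per-frame wire overhead of 20 bytes (8-byte preamble and start-of-frame delimiter plus a 12-byte inter-frame gap) on top of the frame size. The frame sizes used below are illustrative, not the sizes seen at the test site.

```c
#include <stdint.h>

/* Preamble + start-of-frame delimiter (8) + inter-frame gap (12). */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/*
 * Maximum packets per second on a link of link_bps bits/s when every
 * frame is frame_bytes long (Ethernet header through FCS).
 */
static uint64_t max_packets_per_second(uint64_t link_bps, uint32_t frame_bytes)
{
    uint64_t bits_per_frame =
        (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;
    return link_bps / bits_per_frame;
}
```

Consider a 10 Gbit/s link. With 64-byte frames, `max_packets_per_second(10000000000ULL, 64)` is 14880952 (about 14.9 Mpps). With 1518-byte frames, it is 812743. With 9018-byte frames (a 9000-byte MTU plus 18 bytes of header and FCS), it is 138304. The jumbo case is roughly 100 times fewer packets to process than minimum-size frames at the same bit rate.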

I shall prepare a PR for your consideration.

gtedesco-r7 commented 3 months ago

Okay, so we started testing on a wider array of NICs and ran into some weird stuff. The NICs' own rx_dropped stats seem correct, but it looks like the kernel is adding some core rx_dropped stats on top. Those cover things like internal queues filling up, which is what we want here, but they also seem to include some other kinds of drops. We're not sure what they are yet, but on some systems we observe a constant 1 or 2 packets per second dropped even at very low traffic, regardless of traffic rate. We are investigating this further...
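One way to characterize a constant baseline like this is to sample the interface's rx_dropped counter twice at very low traffic and compute the per-second rate. Here is a minimal sketch. It assumes the counter is read from /sys/class/net/<ifname>/statistics/rx_dropped. That file gives a single number, so this cannot by itself separate the NIC's own drops from anything the kernel adds. The function names are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: read rx_dropped for ifname; -1 on error. */
static long long read_rx_dropped(const char *ifname)
{
    char path[512];
    long long value = -1;
    FILE *fp;

    snprintf(path, sizeof(path),
             "/sys/class/net/%s/statistics/rx_dropped", ifname);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%lld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/* Average drop rate between two samples taken interval_s seconds apart. */
static double drop_rate_per_second(long long before, long long after,
                                   unsigned interval_s)
{
    /* Invalid interval, read error, or counter went backwards (reset). */
    if (interval_s == 0 || before < 0 || after < before)
        return -1.0;
    return (double)(after - before) / (double)interval_s;
}

/* Sample rx_dropped twice, interval_s seconds apart, and return the rate. */
static double sample_rx_drop_rate(const char *ifname, unsigned interval_s)
{
    long long before = read_rx_dropped(ifname);
    sleep(interval_s);
    long long after = read_rx_dropped(ifname);
    return drop_rate_per_second(before, after, interval_s);
}
```

For example, `sample_rx_drop_rate("eth0", 60)` (interface name hypothetical) on an otherwise idle link would show whether the 1 or 2 packets per second baseline is present. Repeating it at several traffic rates would show whether that baseline stays constant.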

It would be great if any of the Linux networking folks could comment. @davem330 :)