Open Ext3h opened 1 year ago
The issue appears to have been exacerbated by not activating blocking mode, and thereby calling `PacketReceivePacket` with a large buffer excessively often. The overhead is significantly lower in blocking mode, where the majority of calls do actual work.
This is an interesting idea, and I'm reopening it for further investigation. I would be interested in what your bottlenecks turn out to be once you've optimized transfers using the existing tools: `pcap_setmintocopy()` and `pcap_getevent()` or `pcap_set_timeout()`, as well as managing buffer sizes with `pcap_setuserbuffer()` and `pcap_set_buffer_size()`.
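For reference, the tuning knobs mentioned above combine roughly like this (a minimal sketch, not tested against this workload; the device name and the sizes are placeholders, and `pcap_setmintocopy()` is a Windows-only Npcap/WinPcap extension):

```c
#include <pcap.h>
#include <stdio.h>

int main(void) {
    char errbuf[PCAP_ERRBUF_SIZE];
    /* Placeholder device name; enumerate real ones with pcap_findalldevs(). */
    pcap_t *p = pcap_create("\\Device\\NPF_{placeholder}", errbuf);
    if (p == NULL) { fprintf(stderr, "%s\n", errbuf); return 1; }

    /* Kernel-side ring buffer: 8 MB, matching the figures in this issue.
     * Must be set before pcap_activate(). */
    pcap_set_buffer_size(p, 8 * 1024 * 1024);
    /* Read timeout bounds how long a read may block waiting for traffic. */
    pcap_set_timeout(p, 100);

    if (pcap_activate(p) != 0) {
        fprintf(stderr, "%s\n", pcap_geterr(p));
        pcap_close(p);
        return 1;
    }

    /* Npcap/WinPcap extension: don't complete a read until at least this
     * much data has accumulated, which lowers the syscall rate. */
    pcap_setmintocopy(p, 1024 * 1024);

    /* ... capture loop ... */
    pcap_close(p);
    return 0;
}
```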
I do think we could avoid some of this overhead, but `VirtualLock` may not be the right tool. Here are some links I found so far on the topic of locking a buffer to be used for DMA between user mode and kernel mode:
**Describe the bug**
Technically a performance bug in the combination of npcap with libpcap, when trying to tune npcap for 10 GBit+ operation. This assumes 8 MB user-space and kernel-side buffers for npcap, in order to get the rate of syscalls down to a manageable level in the first place.
For context, have a look at the opposing side in libpcap: https://github.com/the-tcpdump-group/libpcap/blob/fbcc461fbc2bd3b98de401cc04e6a4a10614e99f/pcap-npf.c#L542 https://github.com/the-tcpdump-group/libpcap/blob/fbcc461fbc2bd3b98de401cc04e6a4a10614e99f/pcap-npf.c#L432
When using `PacketReceivePacket` with the above buffer sizes, a significant share of time (70%+) is spent in `MmProbeAndLockPages` at the level of `NtReadFile` prior to `NPF_Read`, and correspondingly in `MmUnlockPages` at `IoCompleteRequest`.

As a user of libpcap, the memory for the user-space buffer is allocated privately within libpcap and is never exposed in raw form to the user of the API. Only `PacketInitPacket` or `PacketReceivePacket` on the npcap end is guaranteed to see the raw buffer.

Given the design of these two APIs, as a user I can't do anything to speed up `MmProbeAndLockPages` from the outside: I can't choose large pages, and I can't `VirtualLock` the buffer.

What's worse, the overhead scales linearly with the size of the user-space buffer, not with the actual amount of data transferred.
**Expected behavior**
Specifying a large user space buffer on libpcap side doesn't result in excessive overhead on npcap side.
Either npcap optimizes the buffer for repeated use (e.g. by explicitly applying `VirtualLock` in `PacketInitPacket`), or libpcap becomes smarter when allocating the memory.
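A hypothetical sketch of the first option, for illustration only — this is not npcap's current implementation, and `LockPacketBuffer` is an invented helper name. Note also that `VirtualLock` only guarantees working-set residency; it does not pin pages the way `MmProbeAndLockPages` does, so any saving would have to be measured:

```c
#include <windows.h>

/* Hypothetical: lock the user buffer once when it is registered with the
 * driver, so that repeated probe-and-lock cycles on each read find the
 * pages already resident. Something like this could run inside
 * PacketInitPacket. */
static BOOL LockPacketBuffer(void *buffer, SIZE_T length)
{
    /* Large locks can fail against the default working-set quota, so
     * grow the quota by the buffer size first. */
    SIZE_T min_ws, max_ws;
    HANDLE self = GetCurrentProcess();
    if (GetProcessWorkingSetSize(self, &min_ws, &max_ws)) {
        SetProcessWorkingSetSize(self, min_ws + length, max_ws + length);
    }
    return VirtualLock(buffer, length);
}
```

The corresponding `VirtualUnlock` (and quota restore) would belong in the buffer's teardown path.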