nmap / npcap

Nmap Project's Windows packet capture and transmission library
https://npcap.com

Npcap buffer sizes are not well-documented #30

Open dmiller-nmap opened 4 years ago

dmiller-nmap commented 4 years ago

Npcap Guide does not have good documentation of the interaction between the sizes of the kernel buffer, the user buffer, and the number of CPU cores. This information is vital to proper tuning of performance. A brief description follows, which could be used to begin the appropriate documentation:

The Npcap driver stores captured packets in a circular buffer until they are retrieved (pcap_next_ex(), pcap_dispatch(), pcap_loop(), PacketReceivePacket()) by the user program. Each adapter handle (pcap_t or ADAPTER) has its own buffer, the size of which is set via pcap_set_buffer_size(), pcap_setbuff() (deprecated WinPcap extension), or PacketSetBuff(), and which defaults to 1MB. The buffer is split into a number of independent segments according to the number of processors on the system. When a packet is received by the driver, it is put into the buffer segment corresponding to the driver thread's current processor. If there is not enough room in that segment, the packet is dropped.
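For illustration, a minimal sketch of setting the kernel buffer size through the standard libpcap create/activate sequence (the device name, snaplen, timeout, and 4 MB size are arbitrary examples, not recommendations):

```c
#include <pcap.h>
#include <stdio.h>

/* Sketch: open an adapter with a 4 MB kernel (driver) buffer instead of
 * the 1 MB default. All values here are illustrative only. */
int open_with_kernel_buffer(const char *device, pcap_t **out)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t *p = pcap_create(device, errbuf);
    if (p == NULL) {
        fprintf(stderr, "pcap_create: %s\n", errbuf);
        return -1;
    }
    pcap_set_snaplen(p, 65535);
    pcap_set_timeout(p, 100);                 /* read timeout in ms */
    pcap_set_buffer_size(p, 4 * 1024 * 1024); /* kernel buffer: 4 MB */
    if (pcap_activate(p) < 0) {
        fprintf(stderr, "pcap_activate: %s\n", pcap_geterr(p));
        pcap_close(p);
        return -1;
    }
    *out = p;
    return 0;
}
```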

A user program receives packets into a "user buffer" which is passed to PacketReceivePacket(). The driver empties packets from its own buffer segments in the order they were received until it runs out of space in the user buffer or runs out of packets to return. A large user buffer can ensure the kernel buffer is emptied more quickly, preventing packet drops. The libpcap API (wpcap.dll) configures the size of this user buffer via pcap_setuserbuffer(), which defaults to 256KB.
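Similarly, a sketch of enlarging the user buffer with the Npcap/WinPcap-specific pcap_setuserbuffer() extension (the 1 MB value is just an example):

```c
#include <pcap.h>
#include <stdio.h>

/* Sketch: raise the user buffer from the 256 KB default so a single
 * read can drain more of the kernel buffer per call.
 * pcap_setuserbuffer() is an Npcap/WinPcap-specific extension. */
int enlarge_user_buffer(pcap_t *p)
{
    if (pcap_setuserbuffer(p, 1024 * 1024) != 0) {   /* 1 MB, illustrative */
        fprintf(stderr, "pcap_setuserbuffer: %s\n", pcap_geterr(p));
        return -1;
    }
    return 0;
}
```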

The current behavior of splitting the buffer into number_of_processors segments is flawed, as described in nmap/nmap#1967, but it needs to be documented anyway, since it's how WinPcap did things, minus the mapping of processor numbers greater than 63 to 63. We can rewrite the documentation if and when we change the behavior.

dmiller-nmap commented 4 years ago

Npcap 0.9991 introduces a change in how the kernel buffer is managed. The buffer is no longer split into segments by CPU, but is instead managed by a dedicated worker thread that services a work order queue. This should simplify the documentation.

dmiller-nmap commented 4 years ago

Npcap 0.9992 did away with the worker thread, which had performance problems. The kernel "buffer" is now an interlocked queue of packet capture objects, but it obeys the same basic rules regarding buffer size, so documentation should be the same. We should include a section on changes from WinPcap's method of segmenting the buffer, for developers porting WinPcap applications that may be able to use a smaller buffer size than previously.

fyodor commented 3 years ago

Great idea! Perhaps we could even add a performance subsection to the "Developing software with Npcap" section of the guide. For example, @dmiller-nmap recently had some great notes from a user who wanted to capture a ton of tiny UDP packets (~100,000 per second) for an audio processing application. I'm just going to quote this good stuff here:

I would need to know what their latency/jitter requirements are. If they need to process these in anything resembling real-time, like to deliver smooth audio, then that puts a bigger constraint on it. But if they're willing to process things in batches as long as they don't truly fall behind, then Npcap can be tuned to support some very high throughputs.

First, they need to ensure they can process the packets as fast as possible from the libpcap API standpoint. That means using pcap_getevent() to get a handle for each pcap descriptor and using WaitForMultipleObjects to ensure they only do a Read when it's appropriate.

Read = pcap_dispatch or pcap_next_ex, etc.
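Roughly, that event-driven read loop might look like the sketch below (error handling trimmed; handle_packet is a placeholder pcap_handler, and ncaps is assumed to be at most MAXIMUM_WAIT_OBJECTS):

```c
#include <pcap.h>
#include <windows.h>

/* Sketch: wait on the Npcap event handles so we only do a Read when the
 * driver signals that data is available, instead of polling. */
void event_driven_loop(pcap_t *caps[], int ncaps, pcap_handler handle_packet)
{
    HANDLE events[MAXIMUM_WAIT_OBJECTS];
    for (int i = 0; i < ncaps; i++)
        events[i] = pcap_getevent(caps[i]);   /* Npcap/WinPcap extension */

    for (;;) {
        DWORD r = WaitForMultipleObjects((DWORD)ncaps, events, FALSE, INFINITE);
        if (r == WAIT_FAILED)
            break;
        int idx = (int)(r - WAIT_OBJECT_0);
        if (idx >= 0 && idx < ncaps) {
            /* "Read" = pcap_dispatch()/pcap_next_ex(); -1 processes
             * everything currently available on that handle. */
            pcap_dispatch(caps[idx], -1, handle_packet, NULL);
        }
    }
}
```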

Then, they need to make sure they're using appropriate values for the various buffers, timeouts, and the mintocopy value. This will be determined by their latency/jitter requirement. They mention pcap_set_immediate_mode(), but that is performance-intensive compared to letting the driver buffer a bunch of packets to transfer in fewer calls. Immediate mode is a libpcap concept that translates into different implementations, but the buffer sizes and mintocopy value are Npcap-specific tunings described here: https://nmap.org/npcap/guide/npcap-devguide.html#npcap-api-extensions
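As a sketch, the batching knob mentioned there can be set like this (the 256 KB threshold is only an example; the right value depends on the latency budget):

```c
#include <pcap.h>
#include <stdio.h>

/* Sketch: batch-oriented tuning instead of pcap_set_immediate_mode().
 * mintocopy is the minimum amount of captured data the driver accumulates
 * before signalling the read event; 256 KB is purely illustrative. */
int tune_for_batching(pcap_t *p)
{
    if (pcap_setmintocopy(p, 256 * 1024) != 0) {  /* Npcap/WinPcap extension */
        fprintf(stderr, "pcap_setmintocopy: %s\n", pcap_geterr(p));
        return -1;
    }
    return 0;
}
```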

Then they need to make sure their user buffer size is large enough to accommodate at least that much. They can count the number of bytes processed in each call to pcap_dispatch, and if it's really close to the user buffer size, then they're probably leaving packets behind in the kernel buffer that could have been transferred in a single call. Solution: increase the user buffer size.
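A rough way to check that (USER_BUF_SIZE and the 90% threshold are placeholder choices, assuming the same size was passed to pcap_setuserbuffer()):

```c
#include <pcap.h>
#include <stdio.h>

#define USER_BUF_SIZE (1024 * 1024)   /* must match pcap_setuserbuffer() */

static size_t batch_bytes;

/* Sketch: tally the bytes delivered in one pcap_dispatch() batch. */
static void count_cb(u_char *user, const struct pcap_pkthdr *h, const u_char *bytes)
{
    (void)user; (void)bytes;
    batch_bytes += h->caplen;
    /* ... real packet processing goes here ... */
}

void read_one_batch(pcap_t *p)
{
    batch_bytes = 0;
    pcap_dispatch(p, -1, count_cb, NULL);
    /* If a batch repeatedly comes close to the user buffer size, packets
     * are likely being left behind in the kernel buffer. */
    if (batch_bytes > (size_t)USER_BUF_SIZE * 9 / 10)
        fprintf(stderr, "batch nearly filled the user buffer (%zu bytes); "
                        "consider a larger pcap_setuserbuffer() size\n", batch_bytes);
}
```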

After all that is done, they should watch the values from pcap_stats. If they start seeing the dropped packet count go up, then they are not reading from Npcap fast enough. They can increase the kernel buffer size, but unless their traffic is really bursty, that won't help; they need to process the data faster.
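For example, a periodic check along these lines:

```c
#include <pcap.h>
#include <stdio.h>

/* Sketch: poll the driver's counters. A rising ps_drop means the kernel
 * buffer filled before the application read it. */
void report_drops(pcap_t *p)
{
    struct pcap_stat st;
    if (pcap_stats(p, &st) == 0)
        printf("received=%u dropped=%u\n", st.ps_recv, st.ps_drop);
    else
        fprintf(stderr, "pcap_stats: %s\n", pcap_geterr(p));
}
```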

That means either improving performance in their own user code, or increasing the mintocopy value and user buffer size in order to reduce the overhead of calls into the driver to get packets. Oh, and using pcap_setfilter can help because it allows the driver to ignore packets that the user isn't interested in.
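A filter can be installed with the standard pcap_compile()/pcap_setfilter() pair; the filter expression below is just an example for the UDP audio scenario, not something from the original notes:

```c
#include <pcap.h>
#include <stdio.h>

/* Sketch: install a BPF filter so the driver discards uninteresting traffic
 * before it reaches the kernel buffer. "udp dst port 5004" is illustrative. */
int install_filter(pcap_t *p)
{
    struct bpf_program prog;
    if (pcap_compile(p, &prog, "udp dst port 5004", 1, PCAP_NETMASK_UNKNOWN) < 0) {
        fprintf(stderr, "pcap_compile: %s\n", pcap_geterr(p));
        return -1;
    }
    if (pcap_setfilter(p, &prog) < 0) {
        fprintf(stderr, "pcap_setfilter: %s\n", pcap_geterr(p));
        pcap_freecode(&prog);
        return -1;
    }
    pcap_freecode(&prog);
    return 0;
}
```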

fyodor commented 1 year ago

Dan did some research and gave some valuable details here which are worth formally documenting: https://seclists.org/nmap-dev/2022/q4/1. That post also goes a bit further into performance recommendations beyond just buffer sizes.