guyharris commented 6 months ago

libpcap should have full support added for pcapng files.

The current libpcap APIs for capturing packets and reading from files were designed before pcapng existed; they were designed for pcap, which was the file format created for tcpdump and supported by libpcap when the libpcap code was extracted from tcpdump and put into a separate library.

Thus, they have no notion of a pcapng block type, and provide only packet block information to the callback, as that's the only type of record pcap has.

To support capturing and reading, new APIs, designed to fully support pcapng, should be added; those APIs should also support pcap files by, for example, providing a fake Section Header Block synthesized from the byte-order in the pcap file header, a fake Interface Description Block synthesized from the time stamp resolution indicated by the magic number in the pcap file header and a link-layer type and snapshot length using the values from the pcap file header (it can't provide an interface name as that's not stored in pcap files), and fake Enhanced Packet Blocks synthesized from packet headers in the packet records in the pcap file.

The APIs for writing files are also pcap-only; new APIs should be added to support writing pcapng files. Those APIs should also support, for example, reporting errors when writing, and when closing a file, as the last part of the file might not be written until the file is closed. That applies all the way down to the lowest-level OS call to close the file, such as close() on UN*Xes: for NFS, and possibly other remote file systems, a write to the underlying file system isn't necessarily done immediately; it may be buffered on the client side and later sent asynchronously to the server, with errors returned by the server, such as "file system full", "quota exceeded", or, rarely, a real I/O error, reported in subsequent writes, which might be done as part of a close operation.

mcr commented 6 months ago

To support capturing and reading, new APIs, designed to fully support pcapng, should be added; those APIs should also support pcap files by, for example, providing a fake Section Header Block synthesized from the byte-order in the pcap file header, a fake Interface Description Block synthesized from the time stamp resolution indicated by the magic number in the pcap file header and a link-layer type and snapshot length using the values from the pcap file header (it can't provide an interface name as that's not stored in pcap files), and fake Enhanced Packet Blocks synthesized from packet headers in the packet records in the pcap file.

Exactly. I don't really know what the right model is going to be. We basically need to greenfield this, and I think it requires a non-trivial amount of collaborative brainstorming. I wonder if this could happen at a Sharkfest or something like that.

guyharris commented 6 months ago

Some initial requirements for pcapng support

Support for all defined pcapng blocks, as well as vendor and custom blocks

All blocks should be made available to readers and should be writable by writers. This includes locally-defined block types and custom blocks.

Readers should not have to byte-swap fields in officially-defined blocks or the officially-defined fields in custom blocks. However, they might have to provide code to byte-swap fields in locally-defined blocks and, if necessary, subfields of the Custom Data field in custom blocks, although we could provide that code ourselves for some such blocks.
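As a rough illustration of the byte-order handling the library would take on for officially-defined fields, the sketch below decides whether a section was written in the opposite byte order from the host by looking at the SHB Byte-Order Magic (0x1A2B3C4D), and then loads 32-bit fields in host order. The helper names `section_byte_order()` and `load32()` are hypothetical; they are not existing libpcap functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BYTE_ORDER_MAGIC 0x1A2B3C4DU   /* from the pcapng SHB definition */

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00U) |
           ((v << 8) & 0x00FF0000U) | (v << 24);
}

/*
 * Hypothetical helper: inspect the 4-byte Byte-Order Magic field of an SHB.
 * Sets *swapped to true if the section was written in the opposite byte
 * order from the host. Returns 0 on success, -1 if the magic is not valid.
 */
static int section_byte_order(const uint8_t bom_field[4], bool *swapped)
{
    uint32_t bom;

    assert(bom_field != NULL && swapped != NULL);
    memcpy(&bom, bom_field, sizeof bom);   /* host-order load */
    if (bom == BYTE_ORDER_MAGIC) {
        *swapped = false;
        return 0;
    }
    if (bom == swap32(BYTE_ORDER_MAGIC)) {
        *swapped = true;
        return 0;
    }
    return -1;
}

/* Hypothetical helper: load a 32-bit block field in host byte order. */
static uint32_t load32(const uint8_t *p, bool swapped)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return swapped ? swap32(v) : v;
}
```

A reader for a locally-defined block could be handed the `swapped` flag and use the same kind of helper for its own fields.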

A program using these APIs should be able to read both pcapng and pcap files

There should not be separate "open a pcap file" and "open a pcapng file" calls, unlike what Apple did in their internal (and not publicly available) APIs in libpcap. Having to try opening the file as pcapng and, if that fails, try opening it as pcap is clumsy.
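A single open call can tell the two formats apart from the first four bytes of the file. The following is a minimal sketch of that check; `sniff_format()` and `enum capture_format` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type; not part of libpcap. */
enum capture_format { FORMAT_UNKNOWN, FORMAT_PCAP, FORMAT_PCAPNG };

/* Classify a capture file from its first four bytes. */
static enum capture_format sniff_format(const uint8_t first4[4])
{
    uint32_t magic;

    assert(first4 != NULL);
    memcpy(&magic, first4, sizeof magic);   /* host-order load */

    switch (magic) {
    case 0x0A0D0D0AU:   /* pcapng SHB block type; reads the same in both byte orders */
        return FORMAT_PCAPNG;
    case 0xA1B2C3D4U:   /* pcap, microsecond time stamps, host byte order */
    case 0xD4C3B2A1U:   /* pcap, microsecond time stamps, swapped */
    case 0xA1B23C4DU:   /* pcap, nanosecond time stamps, host byte order */
    case 0x4D3CB2A1U:   /* pcap, nanosecond time stamps, swapped */
        return FORMAT_PCAP;
    default:
        return FORMAT_UNKNOWN;
    }
}
```

Because both byte orders of each pcap magic number are listed, the classification does not depend on the endianness of the host doing the check.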

If a pcap file is opened, the program should receive a simulated Section Header Block (SHB), followed by a simulated Interface Description Block (IDB) (without if_name or if_description options, as that information is not available in pcap files), followed by a sequence of Enhanced Packet Blocks (EPBs).
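The simulated IDB can be derived entirely from the 24-byte pcap file header. The sketch below shows one way that might look; `struct fake_idb` and `idb_from_pcap_header()` are hypothetical, and the sketch keeps only the low 16 bits of the pcap link-type field because the pcapng IDB LinkType field is 16 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of a synthesized IDB; not a libpcap type. */
struct fake_idb {
    uint16_t linktype;   /* LINKTYPE_ value */
    uint32_t snaplen;    /* snapshot length */
    uint8_t  tsresol;    /* if_tsresol: 6 = 10^-6 s, 9 = 10^-9 s */
};

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00U) |
           ((v << 8) & 0x00FF0000U) | (v << 24);
}

/*
 * Fill in *idb from a pcap file header (magic at offset 0, snapshot length
 * at offset 16, link-layer type at offset 20). Returns false if the magic
 * number is not a known pcap magic number.
 */
static bool idb_from_pcap_header(const uint8_t hdr[24], struct fake_idb *idb)
{
    uint32_t magic, snaplen, linktype;
    bool swapped;

    assert(hdr != NULL && idb != NULL);
    memcpy(&magic, hdr, 4);
    memcpy(&snaplen, hdr + 16, 4);
    memcpy(&linktype, hdr + 20, 4);

    switch (magic) {
    case 0xA1B2C3D4U: swapped = false; idb->tsresol = 6; break;
    case 0xD4C3B2A1U: swapped = true;  idb->tsresol = 6; break;
    case 0xA1B23C4DU: swapped = false; idb->tsresol = 9; break;
    case 0x4D3CB2A1U: swapped = true;  idb->tsresol = 9; break;
    default:
        return false;
    }
    if (swapped) {
        snaplen = swap32(snaplen);
        linktype = swap32(linktype);
    }
    idb->snaplen = snaplen;
    idb->linktype = (uint16_t)(linktype & 0xFFFFU);
    return true;
}
```

No if_name or if_description is produced, matching the point above that pcap files do not carry that information.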

Capturing and reading should have calls similar to `pcap_dispatch()` and `pcap_loop()`

The scheme used by those routines, with callbacks, is, in some cases, inconvenient relative to pcap_next_ex(). However, memory-mapped capture on Linux requires that, once a captured packet, or a batch of captured packets, has been processed, the packet or batch be returned to the kernel. The callback scheme works better there, as returning from the callback can be treated as a "has been processed" indication. pcap_next_ex() works around this by providing its own callback that makes a copy of every packet, so that its data remains available after pcap_next_ex() returns.

Thus, we should not solely provide a routine similar to pcap_next_ex(). If we do provide such a routine, we need not and should not bother providing a routine similar to pcap_next(), as that routine is broken by design - it cannot return different indications for "end of file" and "an error occurred" when reading from a capture file.

Support for reporting errors and warnings from capture/file reading callbacks

Callbacks for all new "capture/read packet" callback APIs should return an int value that is either:

- 0, on success;
- a PCAP_ERROR_ value on an error;
- a PCAP_WARNING_ value on a warning condition.

Non-zero return values should cause the loop to terminate, with the return value returned by the routine.

This would, for example, allow errors when writing a capture file to be reported, as per issue #1047.

It would also permit a callback to slightly more conveniently request that the loop terminate, by returning PCAP_ERROR_BREAK. This would allow the "stop after seeing this many packets" functionality of pcap_dispatch() and pcap_loop() to be implemented by the callback. Leaving it up to the callback, rather than having a count argument to the equivalent of pcap_dispatch() and pcap_loop(), means we don't have to decide what the count means - is it a count of packet blocks, of packet blocks plus some types of non-packet blocks such as system events, or of all blocks? Different programs might have different requirements, and "return PCAP_ERROR_BREAK" leaves that up to the program.
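A minimal sketch of this convention follows. The loop routine `ng_loop()`, the block view `struct ng_block`, and the callback type are all hypothetical stand-ins for whatever the new API ends up providing; `PCAP_ERROR_BREAK` is redefined with the same value as in pcap.h only so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCAP_ERROR_BREAK (-2)          /* same value as in pcap.h */
#define BLOCK_TYPE_EPB   0x00000006U   /* pcapng Enhanced Packet Block */

/* Hypothetical block view and callback type. */
struct ng_block { uint32_t type; };
typedef int (*ng_block_cb)(void *user, const struct ng_block *block);

/*
 * Hypothetical loop: hands each block to the callback and stops at the
 * first non-zero return value, which becomes the loop's return value.
 */
static int ng_loop(const struct ng_block *blocks, size_t nblocks,
                   ng_block_cb cb, void *user)
{
    assert(cb != NULL);
    for (size_t i = 0; i < nblocks; i++) {
        int status = cb(user, &blocks[i]);
        if (status != 0)
            return status;
    }
    return 0;
}

/* Callback state: this program chooses to count packet blocks only. */
struct packet_limit { unsigned seen, limit; };

static int stop_after_n_packets(void *user, const struct ng_block *block)
{
    struct packet_limit *pl = user;

    if (block->type == BLOCK_TYPE_EPB && ++pl->seen >= pl->limit)
        return PCAP_ERROR_BREAK;
    return 0;
}
```

A program that wanted to count every block, or packet blocks plus some event blocks, would change only the condition in its own callback.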

An error or warning returned from the capture/read routine does not necessarily mean that the program cannot continue. A warning indicates that some event worthy of note occurred, but does not indicate that the capture or read process failed, so the program can call the capture/read routine again after handling the warning condition. The same applies to PCAP_ERROR_BREAK.

We might want to add PCAP_ERROR_ values for some file system errors, such as "out of space", "disk quota exceeded", and "I/O error", as those are the most likely errors that a "write a packet" callback would report, and, on Windows, there's no EDQUOT errno value provided by the C library, but there is a "disk quota exceeded" error that a write operation can return.

A new "get statistics" routine, providing an Interface Statistics Block (ISB), should be provided

pcap_stats() has three problems:

- it cannot provide statistics other than the ones present in a struct pcap_stat;
- it cannot provide counts that won't fit in a member of that structure, so if those members are 32-bit integers, a count > 2^32-1 cannot be provided;
- it cannot indicate that some statistic isn't available, so tcpdump, for example, may print out meaningless count values on some platforms.

An API that takes a set of Interface Statistics Block option type values, and returns an Interface Statistics Block with a subset of those options, with statistics that aren't supported not being provided, would solve all of those problems, as well as making it easier to write that information to a pcapng file.
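One possible shape for the core of such a routine is sketched below: the caller lists the ISB option codes it wants, and only the ones the capture mechanism can supply come back, each as a 64-bit value. The option codes are the ones assigned in the pcapng draft; `struct isb_counter` and `isb_collect()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interface Statistics Block option codes, as assigned in the pcapng draft. */
enum {
    ISB_IFRECV = 4, ISB_IFDROP = 5, ISB_FILTERACCEPT = 6,
    ISB_OSDROP = 7, ISB_USRDELIV = 8
};

/* Hypothetical representation of one 64-bit ISB counter option. */
struct isb_counter {
    uint16_t option;
    uint64_t value;
};

/*
 * Hypothetical "get statistics" core: copy into out[] only those requested
 * options that the capture mechanism can supply. Returns the number of
 * entries written; out[] must have room for nwanted entries.
 */
static size_t isb_collect(const struct isb_counter *available, size_t navailable,
                          const uint16_t *wanted, size_t nwanted,
                          struct isb_counter *out)
{
    size_t nout = 0;

    assert(out != NULL);
    for (size_t i = 0; i < nwanted; i++) {
        for (size_t j = 0; j < navailable; j++) {
            if (available[j].option == wanted[i]) {
                out[nout++] = available[j];
                break;
            }
        }
    }
    return nout;
}
```

A caller such as tcpdump could then print only the counters that were actually returned, rather than printing a meaningless value for an unsupported one.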

kayoub5 commented 6 months ago

@guyharris take a look at https://github.com/Technica-Engineering/LightPcapNg for inspiration, pure C pcapng library.

sfd commented 5 months ago

This sounds like a good idea. Some questions:

The ticket title specifies pcapng files, but you also intend live capture to support the pcapng format, blocks etc? The API for modules will also need appropriate updates.

When configuring live capture, should libpcap offer IDBs to describe the available interfaces (pcap_findalldevs() equivalent)?

Should libpcap create a SHB when starting live capture, and emit it as the first block (followed by IDBs etc), or is that the application's responsibility? The application may or may not be intending to write a pcapng file.

guyharris commented 5 months ago

but you also intend live capture to support the pcapng format, blocks etc?

Yes. For example, capturing on either the Linux or macOS "any" device should provide IDBs for each interface, and EPBs for packets that indicate the interface on which the packet arrived, as well as providing various EPB options such as the flag word, so that packet direction, etc. can be provided.

The API for modules will also need appropriate updates.

Yes.

When configuring live capture, should libpcap offer IDBs to describe the available interfaces (pcap_findalldevs() equivalent)?

Yes, as per the above.

Should libpcap create a SHB when starting live capture, and emit it as the first block

Yes.

The application may or may not be intending to write a pcapng file.

The application can ignore blocks that contain no information of interest to it.

If the application is writing a pcap file:

The API for opening a pcap file for writing would take a link-layer type and snapshot length, which would be returned by calling pcap_datalink() and pcap_snaplen() on the pcap_t; the first of those would return either a DLT_ if there's only one link-layer type or an error if there isn't (when reading a pcapng file, it would read blocks until it saw one that was neither an SHB nor an IDB, and stop and return an error if it sees more than one link-layer type; when capturing, it would look at all the currently-known interfaces and do the same), and the latter would return the snapshot length if there's only one snapshot length or return an error if there's more than one (in the same fashion).
The API for writing blocks (which would not be the same as the current API, as that has some problems, such as the lack of any indication of write errors - see, for example, #1047) would:
- for an IDB, make sure it has the same link-layer type and snapshot length as the file, and return an error otherwise;
- for a packet block, just write out the time stamp, captured length, actual length, and packet data, discarding all the other information;
- for other blocks, do nothing;

and return 0 on success and a warning or error code on a warning condition or error.

sfd commented 5 months ago

Should libpcap create a SHB when starting live capture, and emit it as the first block

Yes.

With Section Length set to -1 presumably.

In support of lossless Section rotation (and potentially file rotation) in live capture it would be helpful for the application to be able to request a new SHB.

In particular it would be helpful to support requesting a new SHB before a specific timestamp is passed or size is exceeded. For example starting capture at 1:30pm and requesting a new SHB be emitted at 2:00pm. Libpcap would then continue supplying EPBs until a block with timestamp after 2:00pm is captured, at which point it would emit the new SHB first, followed by IDBs, and the EPB inside the new Section.

The application could then choose whether to additionally rotate the files at the SHB boundary (without packet drop).

Alternatively the application could cache the initial SHB and IDBs and do the Section rotation itself, including updating details which may have changed in the meantime.

guyharris commented 5 months ago

With Section Length set to -1 presumably.

Yes, given that it doesn't know how big the section will be - and, given that it might be writing to a pipe, it might not even be able to fix it later (I guess it could check whether it's writing to something other than a regular file, and fix it up if not when it closes the file, but Wireshark doesn't bother doing that, so it's not a high priority).

In support of lossless Section rotation (and potentially file rotation) in live capture it would be helpful for the application to be able to request a new SHB.

The only reason I've ever seen for multiple SHBs was to allow simple concatenation of pcapng files, which I think was the original reason for supporting it. Is there another use case for that?

sfd commented 5 months ago

Yes, given that it doesn't know how big the section will be - and, given that it might be writing to a pipe, it might not even be able to fix it later (I guess it could check whether it's writing to something other than a regular file, and fix it up if not when it closes the file, but Wireshark doesn't bother doing that, so it's not a high priority).

That is a shame as the fix up is relatively cheap, provided the file is seekable.

The only reason I've ever seen for multiple SHBs was to allow simple concatenation of pcapng files, which I think was the original reason for supporting it. Is there another use case for that?

If the Section Lengths are updated after the fact, then starting new Sections within a file periodically could make a large file more easily navigable/seekable. You could potentially direct Wireshark to dissect only a specific Section within a file to avoid memory exhaustion.

Alternatively the application could simply break the file at the Section boundary and continue capturing without losing packets.

guyharris commented 5 months ago

If the Section Lengths are updated after the fact, then starting new Sections within a file periodically could make a large file more easily navigable/seekable.

Presumably for "navigable" you mean "navigable by the end user", e.g. having a display of sections, with start and end dates.

Is the same the case for "seekable"? (Wireshark does all seeks in most file types, including pcapng, as random accesses to particular file offsets.)

sfd commented 5 months ago

Right, navigable meaning an index of Sections like book chapters in the UI. With Section Lengths present this could typically be generated in less than a second.

Seeking meaning selecting and loading one (or more) Section(s) from a large file without having to scan linearly through the file, or dissect all blocks into memory.
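To make the idea concrete, here is a minimal sketch of building such a Section index over an in-memory image of a file, hopping from one SHB to the next via the Section Length field (which counts the bytes following the SHB, not the SHB itself). It assumes every section is in host byte order, which a real reader could not assume, and `index_sections()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BT_SHB           0x0A0D0D0AU
#define BYTE_ORDER_MAGIC 0x1A2B3C4DU
#define SHB_MIN_LENGTH   28   /* SHB with no options */

/*
 * Record the offset of each SHB in buf[0..len). Stops at the first section
 * whose length is unknown (-1) or implausible, since the sections after it
 * can only be found by scanning block by block. Returns the number of
 * offsets stored (at most max).
 */
static size_t index_sections(const uint8_t *buf, size_t len,
                             size_t *offsets, size_t max)
{
    size_t n = 0, off = 0;

    assert(buf != NULL && offsets != NULL);
    while (n < max && len - off >= SHB_MIN_LENGTH) {
        uint32_t type, total, bom;
        int64_t seclen;

        memcpy(&type, buf + off, 4);
        memcpy(&total, buf + off + 4, 4);
        memcpy(&bom, buf + off + 8, 4);
        memcpy(&seclen, buf + off + 16, 8);
        if (type != BT_SHB || bom != BYTE_ORDER_MAGIC || total < SHB_MIN_LENGTH)
            break;
        offsets[n++] = off;
        if (seclen < 0)
            break;                          /* length was never filled in */
        if (total > len - off || (uint64_t)seclen > len - off - total)
            break;                          /* would run past the end */
        off += total + (size_t)seclen;      /* jump straight to the next SHB */
    }
    return n;
}
```

The cost is proportional to the number of sections rather than the number of blocks, but only as long as each Section Length has been filled in.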

guyharris commented 5 months ago

Right, navigable meaning an index of Sections like book chapters in the UI. With Section Lengths present this could typically be generated in less than a second.

But section boundaries are arbitrary; there's no guarantee that, for example, a request won't be in one section and the reply to it in another. Do they really correspond to book chapters, unless you have a book in which one character asks a question at the end of a chapter the reply to which appears early in the next chapter?

Seeking meaning selecting and loading one (or more) Section(s) from a large file without having to scan linearly through the file, or dissect all blocks into memory.

Wireshark makes only one linear scan through the file, building a table of frames, each entry in which has an offset in the file to the beginning of the block/record for that frame. It doesn't load all blocks into memory, nor does it save in memory the dissection of all frames.

And to find a given frame within a section requires scanning linearly through the section and building an index for the section.

sfd commented 5 months ago

But section boundaries are arbitrary; there's no guarantee that, for example, a request won't be in one section and the reply to it in another. Do they really correspond to book chapters, unless you have a book in which one character asks a question at the end of a chapter the reply to which appears early in the next chapter?

Sure, file rotation in live capture is always arbitrary, but still useful. Only a relatively small number of flows will be broken across the boundary. If you can navigate by Section within a single file you might choose to start filling your packet list a few thousand packets before the end of the previous Section, just as you might flip a few pages back before the start of a new chapter to remind yourself of context. Reverse seeking in pcapng makes this easy to implement and fast.

Wireshark makes only one linear scan through the file, building a table of frames, each entry in which has an offset in the file to the beginning of the block/record for that frame. It doesn't load all blocks into memory, nor does it save in memory the dissection of all frames.

Right, but even the 'table of frames' grows in memory linearly with the number of packets scanned? Isn't there also some stream indexing done at the first pass, e.g. for TCP?

And to find a given frame within a section requires scanning linearly through the section and building an index for the section.

True, but linear scanning a single 1GB Section in a 100GB file is still ~100x faster.

I apologise that this discussion is getting off topic; Wireshark features are likely best discussed there. I think that multiple Sections per capture or file can be useful, and would be worth supporting in libpcap. The capturing application can do most of the work.

guyharris commented 5 months ago

I think that multiple Sections per capture or file can be useful, and would be worth supporting in libpcap. The capturing application can do most of the work.

So a call that causes the next block provided to the callback to be an SHB, followed by a set of IDBs?

Providing a mechanism to patch the section size in the previous SHB can't be done in the capture/read code path, as there's no guarantee that the application is writing a capture file. That's probably best done in the "write a pcapng file" code when it sees a new SHB to be written; it would do that when writing to a regular file.

guyharris commented 5 months ago

...and, of course, only do that if the total number of bytes worth of blocks written to the section, after the SHB was written, is < 0x7FFFFFFF; otherwise, it won't fit in a signed 32-bit integer field.

sfd commented 5 months ago

I think that multiple Sections per capture or file can be useful, and would be worth supporting in libpcap. The capturing application can do most of the work.

So a call that causes the next block provided to the callback to be an SHB, followed by a set of IDBs?

Either that, or require the application to either cache them or explicitly re-request them for the new Section. Perhaps that is simpler.

Providing a mechanism to patch the section size in the previous SHB can't be done in the capture/read code path, as there's no guarantee that the application is writing a capture file. That's probably best done in the "write a pcapng file" code when it sees a new SHB to be written; it would do that when writing to a regular file.

Yes, it would be helpful if closing a Section in the writing code updated the SHB automatically.

sfd commented 5 months ago

...and, of course, only do that if the total number of bytes worth of blocks written to the section, after the SHB was written, is < 0x7FFFFFFF; otherwise, it won't fit in a signed 32-bit integer field.

SHB Section Length is 64-bits?

https://ietf-opsawg-wg.github.io/draft-ietf-opsawg-pcap/draft-ietf-opsawg-pcapng.html#name-section-header-block
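For reference, a sketch of the fix-up with the 64-bit field width from the draft linked above: Section Length is a signed 64-bit value at offset 16 of the SHB, so the limit is INT64_MAX rather than a 32-bit one. The function below patches an in-memory copy of the SHB in host byte order; a file writer would seek back and rewrite those 8 bytes instead. `shb_patch_section_length()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHB_MIN_LENGTH            28   /* SHB with no options */
#define SHB_SECTION_LENGTH_OFFSET 16   /* after type, length, magic, versions */

/*
 * Store the number of bytes written after the SHB into its Section Length
 * field. Returns false, leaving the field untouched, if the buffer is too
 * short to be an SHB or the count does not fit in a signed 64-bit integer.
 */
static bool shb_patch_section_length(uint8_t *shb, size_t shb_len,
                                     uint64_t section_bytes)
{
    int64_t value;

    assert(shb != NULL);
    if (shb_len < SHB_MIN_LENGTH || section_bytes > (uint64_t)INT64_MAX)
        return false;
    value = (int64_t)section_bytes;
    memcpy(shb + SHB_SECTION_LENGTH_OFFSET, &value, sizeof value);
    return true;
}
```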

guyharris commented 5 months ago

Also, we may, in the future, support modules for the lower-level I/O part of reading and writing capture files, to allow asynchronous I/O, direct reading and writing of compressed files, etc. See, for example, #442 and #982. Updating the section length will not be possible for compressed files.

guyharris commented 5 months ago

So a call that causes the next block provided to the callback to be an SHB, followed by a set of IDBs?

Either that, or require the application to either cache them or explicitly re-request them for the new Section. Perhaps that is simpler.

I.e., having an API to switch sections is simpler?

sfd commented 5 months ago

Also, we may, in the future, support modules for the lower-level I/O part of reading and writing capture files, to allow asynchronous I/O, direct reading and writing of compressed files, etc. See, for example, #442 and #982. Updating the section length will not be possible for compressed files.

If packet blocks and other blocks are compressed, but SHBs are not, then updating the Section Length would still be possible.

sfd commented 5 months ago

I.e., having an API to switch sections is simpler?

Yes, provided the Section can be switched after a block has been received for the new section.

I.e. dumpcap wants to rotate files on a 5 minute boundary. It receives an EPB with a timestamp in the new period from the capture API, so closes the existing Section/file with the write API, opens a new file/Section/IDBs, then writes the current EPB into the new file/Section.

No packets/EPBs should be dropped during the rotation process, provided capture buffers are not overrun.

guyharris commented 5 months ago

If packet blocks and other blocks are compressed, but SHBs are not, then updating the Section Length would still be possible.

The most common forms of compression are 1) running a compression program or 2) putting a compression library at a low level in the file-writing path; neither those programs nor those libraries have any clue about pcapng blocks, so individual blocks will not be compressed; instead, the entire file will be compressed.

The pcapng-extras document has a "Compression Block (experimental)", which says that the contents of the "Compressed Data" is, when decompressed, "made of other blocks", so it can presumably contain one or more blocks, presumably full blocks. As far as I know, there are no implementations of that whatsoever.

So there is currently no mechanism in existence to arrange that "packet blocks and other blocks are compressed, but SHBs are not". That might be something to consider for the future.

guyharris commented 5 months ago

Yes, provided the Section can be switched after a block has been received for the new section.

Or the new file; a "start a new section" call would also be useful when rotating files.

I.e. dumpcap wants to rotate files on a 5 minute boundary. It receives an EPB with a timestamp in the new period from the capture API, so closes the existing Section/file with the write API, opens a new file/Section/IDBs, then writes the current EPB into the new file/Section.

That's a bit more complicated, as "[the program writing a file, which isn't necessarily dumpcap] receives an EPB with a timestamp in the new period from the capture API" means that program's callback for a new block has already been called.

Perhaps having libpcap trigger a file/section break before each block would be the right answer here. I'd prefer not to have all rotation policies wired into libpcap itself, so perhaps having a separate "time to rotate/stop/whatever" callback, which can use any criterion it wants to rotate/stop, and hand the block to that callback before handing it to the "process this block" callback, would be the right answer.

sfd commented 5 months ago

a "Compression Block (experimental)". As far as I know, there are no implementations of that whatsoever.

Understood. It sounds like the experimental Compression Block would achieve what I suggested. Ideally someone would contribute an implementation. I think there may be more demand for this capability (compared to just compressing the entire file) if reading applications implemented multi-Section 'navigation' discussed previously.

sfd commented 5 months ago

Perhaps having libpcap trigger a file/section break before each block would be the right answer here. I'd prefer not to have all rotation policies wired into libpcap itself, so perhaps having a separate "time to rotate/stop/whatever" callback, which can use any criterion it wants to rotate/stop, and hand the block to that callback before handing it to the "process this block" callback, would be the right answer.

Agreed, implementing rotation policies in libpcap would not be ideal. Better to keep that logic in the application.

An additional 'rotation check' callback before every 'new block' callback seems redundant, provided the application could implement the rotation logic from the 'new block' callback?

the-tcpdump-group / libpcap

libpcap should fully support pcapng files #1321
