the-tcpdump-group / libpcap

The libpcap interface to various kernel packet capture mechanisms
https://www.tcpdump.org/

tcpdump can write DLT_NULL with wrong-endian header #74

Open guyharris opened 11 years ago

guyharris commented 11 years ago

Converted from SourceForge issue 1524397, submitted by adammclaurin

This bug report may be a little too broad, but it is certainly reproducible.

I have Pcap savefiles (containing only IPv4+TCP packets) captured on the loopback device (DLT_NULL) of an old big-endian Mac laptop. These savefiles are now on my little-endian Linux 2.6.9 workstation.

I needed to chop off the first 1000 packets, so I used tcpdump with -c, -r, and -w to do so.

The trouble is, capture filters absolutely don't work (i.e., everything gets filtered out) on the "chopped" Pcap file using tcpdump. Even a filter as unrestrictive as "ip" produces no results. Capture filters work fine on the original file.

Without any capture filter, 'tcpdump -vvv' shows exactly the same output on the first 1000 packets in the original file and the "chopped" file.

Strangely, capture filters do work on the chopped file using tethereal.

I have been troubleshooting this for a while, and one interesting thing I found is that if I convert the original Pcap file with 'tethereal' (again with -c, -r, -w), capture filters (using tcpdump) will also fail on the generated file, but filtering with tethereal works.

I even went so far as to write my own pcap-based application to "chop" a given number of packets and write them to a savefile; capture filters (using tcpdump) don't work on that either, but filters using tethereal again work fine.

Also, I wrote my own pcap application that applies a capture filter to a savefile and prints out some debug information about which packets actually pass the filter. The application processes the original savefile fine, but (like tcpdump) filters out all packets in the "chopped" file with a filter as simple as "ip".

Since chopping with either tcpdump, tethereal, or my own pcap application leads to capture filter problems, it seemed appropriate to file the bug against libpcap.

I'm not sure if the core of the problem lies in the difference in endianness between the "capture machine" and my Linux workstation, or if it's something to do with DLT_NULL.

It would probably be helpful to post the original Pcap file online, but it was captured on a colleague's home network so I'll need to get his permission to do so. I was hoping that the above description might be enough to get started.

Thanks, Adam

guyharris commented 11 years ago

Submitted by adammclaurin

Logged In: YES user_id=1554689

It's probably worth noting that I also downloaded the latest tcpdump/libpcap and experienced exactly the same problems.

Also, the Linux machine is running Fedora Core 3, if it matters.

guyharris commented 11 years ago

Submitted by adammclaurin

Logged In: YES user_id=1554689

Here's another piece to the puzzle. I hacked together a little tool to convert a DLT_NULL Pcap file to DLT_EN10MB with faked addresses in the Ethernet header.

I used this tool to process the "chopped" Pcap file that was giving me problems. Amazingly, once I wrote out the chopped Pcap file in DLT_EN10MB format, capture filters worked fine!

Thanks again, Adam

guyharris commented 11 years ago

Submitted by adammclaurin

Logged In: YES user_id=1554689

Mystery solved!

So here's what's happening. The original machine is a big-endian Mac capturing packets on the loopback interface (DLT_NULL). The link-layer header is a 4-byte field in host byte order (in this case, big-endian).

Now, I copy the captured file to my little-endian Linux box. I use 'tcpdump -c 1000 -r orig.pcap -w chop.pcap' to copy the first 1000 packets from the original Pcap file to the chopped Pcap file.

However, since this machine is little-endian, the Pcap headers (in particular the magic number) are written in little-endian order. Unfortunately, tcpdump is not 'smart enough' to re-order the 4-byte protocol family field in the BSD loopback header.

Later, when I run my capture filters on the chopped file, nothing passes because 'tcpdump' thinks the endianness of the capture machine is the same as that of the current machine (since the magic number is written in the expected order). Therefore, it reads the protocol family value as little-endian, which yields a value that is not PF_INET, and so the capture filter "ip" won't match.
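To make the mismatch concrete, here is a minimal sketch (not tcpdump or libpcap code) of what happens to the 4-byte protocol family when it is written big-endian and read little-endian. It assumes PF_INET has the value 2, as it does on BSD-derived systems and Linux; the variable names are illustrative only.

```python
import struct

PF_INET = 2  # assumed value of PF_INET/AF_INET for IPv4

# DLT_NULL header as the big-endian capture machine would have written it.
header = struct.pack(">I", PF_INET)  # bytes 00 00 00 02

# A reader that trusts a little-endian file header decodes the field this way.
as_little_endian = struct.unpack("<I", header)[0]

# A reader that knows the capture host was big-endian decodes it this way.
as_big_endian = struct.unpack(">I", header)[0]

print(hex(as_little_endian))  # 0x2000000, which is not PF_INET
print(as_big_endian)          # 2, which is PF_INET
```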

I tested my theory by manually editing the chopped Pcap file with a hex editor and re-ordering the protocol family of the first packet myself. Sure enough, that packet (and only that packet) matched the "ip" capture filter.

Oddly, the tcpdump "dissector" seems to be smart enough to display the packet as IP, but the filtering mechanism is not. However, 'tethereal' seems to have hacked in some code (which I haven't found yet) to detect this anomalous byte ordering when filtering.
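The thread does not say how tethereal or the tcpdump dissector copes with this. One plausible heuristic, shown here purely as an assumption, is that real protocol family values are small numbers, so whichever byte order decodes the field to the smaller value is taken to be the right one. The function name `guess_family` is hypothetical.

```python
import struct

def guess_family(header4):
    """Guess the protocol family from a 4-byte DLT_NULL header of unknown byte order.

    Assumption: real family values are small, so the decoding that yields
    the smaller number is taken to be the intended one.
    """
    little = struct.unpack("<I", header4)[0]
    big = struct.unpack(">I", header4)[0]
    return min(little, big)

# The same family value written by a big-endian and a little-endian host.
print(guess_family(struct.pack(">I", 2)))  # 2
print(guess_family(struct.pack("<I", 2)))  # 2
```

A dissector can afford this kind of guess per packet; a precompiled filter that compares the field against one fixed constant cannot, which would be consistent with the behaviour described above.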

My question now is whether it's within the scope of tcpdump's intended feature set to be smart enough to reorder the bytes of the protocol family when necessary.

Thanks again, Adam

guyharris commented 11 years ago

Submitted by guy_harris

Logged In: YES user_id=541179

The problem is that the libpcap filtering code assumes that the endianness of the file header matches the endianness of the packet type field in DLT_NULL captures. That's true if the file is written on the machine on which the capture was done, but if tcpdump reads a file of the opposite endianness from the native byte order and writes out a new file, the endianness of the file header will be that of the machine running tcpdump and the endianness of the packet type will be that of the machine that originally wrote the capture.

The BPF compiler assumes that, if it's generating filter code for DLT_NULL, the endianness of the header matches the endianness of the file header. If that's not the case, the generated code won't work.
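The assumption can be illustrated with a small simulation. This is a sketch in Python, not the actual BPF compiler logic: `file_byte_order` and `naive_ip_match` are hypothetical helpers that decode the DLT_NULL family in whatever byte order the file's magic number implies. It handles only the classic magic number 0xa1b2c3d4 and assumes PF_INET is 2.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap magic number
PF_INET = 2              # assumed value of PF_INET for IPv4

def file_byte_order(magic_bytes):
    """Return the struct prefix ('<' or '>') implied by the 4 magic bytes."""
    if struct.unpack("<I", magic_bytes)[0] == PCAP_MAGIC:
        return "<"
    if struct.unpack(">I", magic_bytes)[0] == PCAP_MAGIC:
        return ">"
    raise ValueError("not a classic pcap magic number")

def naive_ip_match(magic_bytes, packet):
    """Decode the DLT_NULL family in the file header's byte order and test for IPv4."""
    order = file_byte_order(magic_bytes)
    family = struct.unpack(order + "I", packet[:4])[0]
    return family == PF_INET

# A packet captured on the big-endian Mac: family field, then the first IPv4 byte.
packet_from_mac = struct.pack(">I", PF_INET) + b"\x45"

original_magic = struct.pack(">I", PCAP_MAGIC)  # file header written on the Mac
chopped_magic = struct.pack("<I", PCAP_MAGIC)   # file header rewritten on the Linux box

print(naive_ip_match(original_magic, packet_from_mac))  # True
print(naive_ip_match(chopped_magic, packet_from_mac))   # False
```

The packet bytes are identical in both cases; only the file header changed, and that alone is enough to make the "ip" test fail.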

The right fix might be to have libpcap force the link-layer type header to be in host byte order when reading a capture file, and have the BPF compiler always generate code for the host byte order, whatever that might be.
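A rough sketch of that idea, in Python rather than libpcap's C and not taken from any libpcap source: the reader rewrites the 4-byte DLT_NULL field into host byte order as each packet is read, and the filter then always decodes it in host byte order. The names `normalize_null_header` and `host_order_ip_match` are hypothetical, and `capture_order` stands for the byte order of the file header being read, which is only a valid guide when the file was written by the capture machine or by a reader that normalizes in this way.

```python
import struct

PF_INET = 2  # assumed value of PF_INET for IPv4

def normalize_null_header(packet, capture_order):
    """Return the packet with its 4-byte DLT_NULL family rewritten in host byte order.

    capture_order is '<' or '>' and is the byte order the field was written in.
    """
    family = struct.unpack(capture_order + "I", packet[:4])[0]
    return struct.pack("=I", family) + packet[4:]

def host_order_ip_match(packet):
    """Filter test that always decodes the family in host byte order."""
    return struct.unpack("=I", packet[:4])[0] == PF_INET

# Packet as written by a big-endian capture machine.
packet_from_mac = struct.pack(">I", PF_INET) + b"\x45"

normalized = normalize_null_header(packet_from_mac, ">")
print(host_order_ip_match(normalized))  # True on either kind of host
```

With this approach, a file written back out after reading would carry both a host-order file header and host-order DLT_NULL fields, so the two can no longer disagree.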