Read directly from network device

blipp commented 8 years ago

I did some initial development to enable pac-driver to read directly from a network device using libpcap. I would like to propose this as a pull request. In the following, I summarize the work I did, and at the end I list some questions regarding this implementation that should probably be discussed before considering a merge.

The reason I implemented this was because I need to analyse Ethernet packets different from TCP, UDP and ICMP, which are the only protocols that can be analyzed with the Bro plugin. On the long run, it would surely be better to adapt Bro, but for now this seemed too much to dig in.

I introduce a new parameter -n that expects the network device as argument. If it is set, then instead of the normal parseSingleInput call, libpcap is used with a callback function processPacket for every packet that arrives. The code of this function is inspired by the existing code of pac-driver.

I did not implement support for -e and -i because this had no immediate benefit for me, but I think it could be possible. I added this restriction to the help output.

Because grammars usually need to know the length of the captured content, this is done by prepending this information as uint32. Thus, this leads us to some kind of a header before the beginning of the actual Ethernet packet, like in pcap.pac2. I added an example grammar with networkinterface.pac2. After the header it immediately refers to the pcap grammar for the handling of Ethernet packets.

I added a test for this new functionality, which uses tcpreplay to replay dns.trace, which is already included in the source tree. The implementation of the test is kind of ugly, because the grammar does not compile with hilti-build because of #3. As soon as this gets fixed, there is no need for this ugly timer-based approach.

pac-driver seems to be able to process packets quite fast with this addition, although I have no statistics to show.

Questions:

At the end of processPacket, there is the commented line GC_DTOR(input, hlt_bytes, ctx);. This line is also present in parseSingleInput and that's why I copied it. The problem is that pac-driver segfaults at the processing of the second packet if this line is executed and that's why I commented it out. I suspect this of being a memory leak. What do you think?
Generally, what do you think of processPacket? Do you see possible optimizations?
If debugging output is enabled, all captured bytes are printed, which is quite a lot of output. Should this be changed to print just the first x bytes? Or introduce another command line parameter?

blipp commented 8 years ago

Although this could be derived from the test I wrote, I should provide the commands to test this feature:

pac-driver -n lo -g tests/binpac/parsers/networkinterface/test.pac2 libbinpac/parsers/networkinterface.pac2

tcpreplay --intf1=lo bro/tests/Traces/dns.trace

Execute the first one and wait a little bit until --- pac-driver: opened network device 'lo'. appears. Then execute the second one in another shell. While tcprelay is generating packets, 6 lines with IPV4 should appear one after another in pac-driver's shell.

rsmmr commented 8 years ago

Nice patch, thanks. Couple thoughts:

would be nice to have an offline version too that reads from a trace; that's better suited for testing than going through tcpreplay.
I see why you added the Packet type receiving the capture length, but I'm wondering if there's a better way to do that that doesn't require prepending a header to the data. Actually I'm wondering why EthernetPacket requires the capture length in the first place; that's a question for @0xxon. :)

Regarding your questions:

Yes, I would expect the GC_DTOR to have to run to release the input object; that DTOR would be the release corresponding to the hlt_bytes_new call. But it depends on what happens in between; something else might take ownership of the object (I haven't look closely enough yet). If it's a memory leak, the test case should tell you, if you hadn't disabled it by setting HILTI_DEBUG. :-)
for debugging output, printing just a few bytes per packets sounds good.
haven't looked more closely at ProcessPacket() yet

0xxon commented 8 years ago

Just a bit of context about EtherPacket and why it requires a length. Basically -- the Ethernet frame itself does not contain any length information about its payload. And the payload of the protocols it contains (like, e.g. IP) does not have to be equivalent to the actual payload of the packet (e.g. when someone adds frame-checksums or timestamp information like Arista can do to it). In those case, you would not read the entire packet if you don't have the outside information that tells you the packet length :)

Note - currently I think that length is not really used to make sure that really all data has been read - but it can and should be used for that in the future.

blipp commented 8 years ago

Thanks for your feedback!

I added the possibility to read directly from a pcap file with a new command line option -f. Thanks for this idea, I think this is really useful. Now I wonder why I did not think of it... It uses the same code than for listening on a network interface, just the specification of the source is different. Example usage:
- pac-driver -f bro/tests/Traces/dns.trace -g tests/binpac/parsers/libpcap/pcapfile.pac2 libbinpac/parsers/libpcap.pac2 (note that I changed some file names!)
- If - is given as file name, libpcap reads from stdin: cat bro/tests/Traces/dns.trace | pac-driver -f - -g tests/binpac/parsers/libpcap/pcapfile.pac2 libbinpac/parsers/libpcap.pac2.
With @0xxon's comment I think it's clear that the caplen should stay. Thanks for the information!
I changed the debug output to only print the first 14 bytes. This way the output should fit into one line.
I renamed networkinterface.pac2 to libpcap.pac2 because this grammar is now used by both reading from pcap files and listening on network interfaces. This name is very similar to pcap.pac2 which could lead to confusion. I am not sure though how to resolve this. Both files have their reason, one for parsing pcap files as they are (pcap.pac2), and the other for things read by libpcap (libpcap.pac2). Any ideas? Maybe rename pcap.pac2 to pcapfile.pac2?
Regarding memory leaks, I did not disable the test for leaks when starting to write the test and it showed memory leaks. But this was at the time when I tried to get the test running at the first place, so I disabled it. Still have to look at it. If you have some time I would be glad if you could look at it, because I am not very experienced with Hilti's memory management yet.

blipp commented 8 years ago

I enhanced the patch so now the packet capture time and the actual length of the packet is given to the grammar as well. Tests and grammars still have to be adapted.

rsmmr / hilti

Read directly from network device #13