Need help with "-w -" option to understand

tejaskumark commented 4 years ago

Hello All,

I am writing one utility for one of my project, where currently I create pcap file by dumping information capturing through raw socket. But now I am planning to do it same as "tcpdump -i eth0 -w - | wireshark -k -i -" does, simply write to stdout and then I can pipe it to remote machine wireshark so that I can have live capture over there.

Currently I am doing like this, to capture packets to file, and it is working fine.

write(fileno(fp), &pcapfh, 24);
write(fileno(fp), &pcaphdr, sizeof(pcaphdr));
write(fileno(fp), pkt_ptr, bytes_to_write);

But when I start dump to stdout, as below, it does not work.

write(fileno(stdout), &pcapfh, 24);
write(fileno(stdout), &pcaphdr, sizeof(pcaphdr));
write(fileno(stdout), pkt_ptr, bytes_to_write);

If anybody can help me to understand format requirement to dump to stdout so that wireshark can understand live capture, that would be great help.

mcr commented 4 years ago

Perhaps you want to look at rpcapd?

guyharris commented 4 years ago

On what operating system is your utility running? If it's running on Windows, it must set stdout to binary mode:

_setmode(_fileno(stdout), _O_BINARY)

tejaskumark commented 4 years ago

Thanks for prompt replies.

@mcr My remote system does not have rpcapd, and also not possible to get one due to space constraint on it. Only possibility is to get it done with my utility.

@guyharris My utility is running on linux system.

write(fileno(stdout), &pcaphdr, sizeof(pcaphdr)); do you think this is right way for linux?

I am wondering how tcpdump is handling same thing.

guyharris commented 4 years ago

@guyharris My utility is running on linux system.

Linux is a UN*X, and on UN*Xes, there's no difference between "text" and "binary" mode for I/O.

There's a difference on Windows, because the C I/O routines were oriented towards UNIX, where lines end with \n, but lines end with \r\n on Windows, so "text" I/O has to handle reading from files with \r\n line endings and write to files with \r\n line endings even if the software doing the I/O is expecting just \n. Doing so makes text file I/O portable, but damages reads from and writes to binary files, where if you read a byte with the value 0x0d followed by a byte with the value 0x0a, you expect to get a byte with the value 0x0d followed by a byte with the value 0x0a, and if you write to a file a byte with the value 0x0a, you expect only a byte with the value 0x0a to be written.

write(fileno(stdout), &pcaphdr, sizeof(pcaphdr)); do you think this is right way for linux?

If you're writing in binary mode, it's the right way on all operating systems.
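A minimal sketch of that point, assuming a hypothetical helper named set_stdout_binary (it is not part of tcpdump or libpcap): it calls _setmode() only when built for Windows and does nothing on UN*X, where there is no text/binary distinction.

```c
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

/* Put stdout into binary mode on systems where that distinction exists.
 * Returns 0 on success, -1 on failure. On UN*X this is a no-op. */
static int set_stdout_binary(void)
{
#ifdef _WIN32
    /* _setmode() returns -1 on failure, the previous mode otherwise. */
    return _setmode(_fileno(stdout), _O_BINARY) == -1 ? -1 : 0;
#else
    return 0;
#endif
}
```

Calling it once at startup, before the first write to stdout, should be enough.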

The code

write(fileno(fp), &pcapfh, 24);

works; the only difference between that and

write(fileno(stdout), &pcaphdr, sizeof(pcaphdr));

is that the first code has fp as a variable, while the second code has the constant value stdout. They should work the same.

There may be more code in the way in the second case, however.

You said

where currently I create pcap file by dumping information capturing through raw socket. But now I am planning to do it same as "tcpdump -i eth0 -w - | wireshark -k -i -" does, simply write to stdout and then I can pipe it to remote machine wireshark so that I can have live capture over there.

Does "I create pcap file by dumping information capturing through raw socket" mean that your program reads captured packets from a raw socket and writes those packets to a file, where the file is opened by something such as

FILE *fp = fopen(capture_file_path, "w");

and then fp is used in the

write(fileno(fp), &pcapfh, 24);
write(fileno(fp), &pcaphdr, sizeof(pcaphdr));
write(fileno(fp), pkt_ptr, bytes_to_write);

code?

Then you say

But now I am planning to do it same as "tcpdump -i eth0 -w - | wireshark -k -i -" does, simply write to stdout and then I can pipe it to remote machine wireshark so that I can have live capture over there.

Does that mean that your new program reads captured packets from a raw socket and writes those packets to the standard output, with the

write(fileno(stdout), &pcapfh, 24);
write(fileno(stdout), &pcaphdr, sizeof(pcaphdr));
write(fileno(stdout), pkt_ptr, bytes_to_write);

code, and you arrange that your code runs on the Linux machine on which you're doing the capture, and that program's standard output gets sent over a network connection to a remote machine on which Wireshark is running, and Wireshark reads its standard input?

If so, how are you setting up that network connection? Does it use, for example, ssh, so that you do

ssh {the Linux machine} {your program} | wireshark -k -i -

on the machine on which you're running Wireshark, or

{your program} | ssh {the machine on which to run Wireshark} wireshark -k -i -

on the Linux machine?

If so, what operating system is running on the machine on which Wireshark is being run?

I am wondering how tcpdump is handling same thing.

pcap_dump_open() and pcap_dump(), both of which are libpcap routines.

tejaskumark commented 4 years ago

@guyharris

Does "I create pcap file by dumping information capturing through raw socket" mean that your program reads captured packets from a raw socket and writes those packets to a file, where the file is opened by something such as

FILE *fp = fopen(capture_file_path, "w");

and then fp is used in the

write(fileno(fp), &pcapfh, 24);
write(fileno(fp), &pcaphdr, sizeof(pcaphdr));
write(fileno(fp), pkt_ptr, bytes_to_write);

code?

Yes exactly. Then I can open that file in wireshark without any issue.

Then you say

But now I am planning to do it same as "tcpdump -i eth0 -w - | wireshark -k -i -" does, simply write to stdout and then I can pipe it to remote machine wireshark so that I can have live capture over there.

Does that mean that your new program reads captured packets from a raw socket and writes those packets to the standard output, with the

write(fileno(stdout), &pcapfh, 24);
write(fileno(stdout), &pcaphdr, sizeof(pcaphdr));
write(fileno(stdout), pkt_ptr, bytes_to_write);

code, and you arrange that your code runs on the Linux machine on which you're doing the capture, and that program's standard output gets sent over a network connection to a remote machine on which Wireshark is running, and Wireshark reads its standard input?

Yes, it is getting sent over network connection to remote machine Ubuntu. So both remote and local machines are linux.

If so, how are you setting up that network connection? Does it use, for example, ssh, so that you do

ssh {the Linux machine} {your program} | wireshark -k -i -

on the machine on which you're running Wireshark, or

{your program} | ssh {the machine on which to run Wireshark} wireshark -k -i -

on the Linux machine?

Yes, you are right. For example, with tcpdump I can use below cli,

sshpass -p 'pass' ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@remote host '/sbin/tcpdump -i eth0 -w -' | wireshark -k -i -

For my application I do instead below,

sshpass -p 'pass' ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@remote host '/sbin/sniffer eth0 ' | wireshark -k -i -

If so, what operating system is running on the machine on which Wireshark is being run?

Both remote and local systems are linux.

pcap_dump_open() and pcap_dump(), both of which are libpcap routines.

I will check this out.

guyharris commented 4 years ago

For my application I do instead below, sshpass -p 'pass' ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@remote host '/sbin/sniffer eth0 ' | wireshark -k -i -

So what happens if you run

/sbin/sniffer eth0 >/tmp/test.pcap

and then try to read /tmp/test.pcap with Wireshark?

tejaskumark commented 4 years ago

@guyharris That also produces a corrupt file, with the error below in Wireshark:

The capture file appears to be damaged or corrupt.
(pcap: File has 1107296256-byte packet, bigger than maximum of 262144)

tejaskumark commented 4 years ago

@guyharris Looks like, it was my stupid mistake only at the end. What I was doing all along was only thing needed. My snaplen was getting set wrong, and that is why whole problem got created.

FYI, We also need to set setvbuf(f, NULL, _IONBF, 0);, so that stdout buffering is not used and streaming of packets remain constant. Otherwise after some packet captures stream will break due to packet size bigger than expected, as buffer will be more that expected to wireshark.

Thanks for your all help. I appreciate very much.

guyharris commented 4 years ago

The capture file appears to be damaged or corrupt. (pcap: File has 1107296256-byte packet, bigger than maximum of 262144)

That's a VERY large packet. Did you really capture a packet that large?

Or is the problem simply that the byte order of the magic number in pcapfh is not the byte order of the host that's running sniffer?

My current draft specification for pcap format says:

All fields in the File Header and in Packet Records will always be saved according to the characteristics (little endian / big endian) of the capturing machine. This refers to all the fields that are saved as numbers and that span over two or more octets.

and

Magic Number (32 bits): an unsigned magic number, whose value is either the hexadecimal number 0xA1B2C3D4 or the hexadecimal number 0xA1B23C4D. If the value is 0xA1B2C3D4, time stamps in Packet Records (see Figure 2) are in seconds and microseconds; if it is 0xA1B23C4D, time stamps in Packet Records are in seconds and nanoseconds. These numbers can be used to distinguish sections that have been saved on little-endian machines from the ones saved on big-endian machines, and to heuristically identify pcap files.

1107296256 is 0x42000000; if you byte-swap that, it's 0x00000042, which is 66.
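The arithmetic above, and the magic-number check a reader can use to decide whether to byte-swap, can be sketched as follows. The names swap32, magic_kind and classify_magic are hypothetical and chosen for illustration; this is not libpcap's or Wireshark's code.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

enum magic_kind { MAGIC_UNKNOWN, MAGIC_NATIVE, MAGIC_SWAPPED };

/* Classify the first four bytes of a pcap file after they have been
 * read into a uint32_t in the reader's own byte order. */
static enum magic_kind classify_magic(uint32_t magic)
{
    if (magic == 0xA1B2C3D4u || magic == 0xA1B23C4Du)
        return MAGIC_NATIVE;    /* same byte order as the reader */
    if (swap32(magic) == 0xA1B2C3D4u || swap32(magic) == 0xA1B23C4Du)
        return MAGIC_SWAPPED;   /* every multi-byte field needs swapping */
    return MAGIC_UNKNOWN;       /* not a pcap file as described above */
}
```

Here swap32(1107296256) evaluates to 66, which is a plausible length for a small Ethernet frame.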

Where is the code that initializes the pcapfh structure and the pcaphdr structure?

guyharris commented 4 years ago

FYI, We also need to set setvbuf(f, NULL, _IONBF, 0);, so that stdout buffering is not used and streaming of packets remain constant. Otherwise after some packet captures stream will break due to packet size bigger than expected, as buffer will be more that expected to wireshark.

1) Note that neither tcpdump nor libpcap turn buffering off for -w -, so tcpdump is buffering its output in tcpdump -i eth0 -w - | wireshark -k -i -.

2) The "buffer" in "stdout buffering" is not "more than expected to wireshark" - Wireshark has no expectations at all about how much data is read from the pipe at any given time. "File has 1107296256-byte packet, bigger than maximum of 262144" doesn't mean "I just read 1107296256 bytes from the pipe, and that's bigger than 262144, which is the maximum number of bytes I'm prepared to read from the pipe"; it means "the Captured Packet Length field in the per-packet header is larger than the absolute limit that Wireshark imposes on packet lengths". Note that "the absolute limit that Wireshark imposes on packet lengths" is NOT the SnapLen field in the file header, it's a wired-in limit.
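To make point 2 concrete, here is a rough sketch of the kind of check a reader performs on each per-packet header. It is not Wireshark's code; the names get32 and check_caplen are hypothetical, and the limit is simply the 262144 quoted in the error message.

```c
#include <stdint.h>

#define MAX_PACKET_SIZE 262144u  /* limit quoted in the error message */

/* Assemble a 32-bit value from four bytes in the given byte order. */
static uint32_t get32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Validate the Captured Packet Length of a 16-byte record header
 * (ts_sec, ts_usec, caplen, len; four 32-bit fields).
 * Returns 0 and stores the length on success, -1 if the field
 * exceeds the fixed limit, regardless of how many bytes were read
 * from the pipe so far. */
static int check_caplen(const unsigned char hdr[16], int big_endian,
                        uint32_t *caplen_out)
{
    uint32_t caplen = get32(hdr + 8, big_endian);

    if (caplen > MAX_PACKET_SIZE)
        return -1;
    *caplen_out = caplen;
    return 0;
}
```

A header whose length bytes are 00 00 00 42 passes when interpreted as big-endian (66) and fails when interpreted as little-endian (0x42000000).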

tejaskumark commented 4 years ago

That's a VERY large packet. Did you really capture a packet that large?

No. That is because I gave the wrong snaplen earlier, which messed up the headers, and that is why we see the large packet size. So we can ignore this error, as it was at my end.

Where is the code that initializes the pcapfh structure and the pcaphdr structure?

pcapfh.magic = htonl(0xA1B2C3D4);
pcapfh.version_major = htons(0x0002);
pcapfh.version_minor = htons(0x0004);
pcapfh.thiszone = htonl(0);
pcapfh.sigfigs = htonl(0);
pcapfh.snaplen = htonl(0); // This value is 0.
pcapfh.linktype = htonl(1);

gettimeofday(&tv, NULL);
pcaphdr.ts.tv_sec = htonl(tv.tv_sec);
pcaphdr.ts.tv_usec = htonl(tv.tv_usec);
bytes_to_write = rlen;
pcaphdr.caplen = pcaphdr.len = bytes_to_write;

To debug it further, I just switched off packets output and just logged all sizes bytes_to_write and rlen without any additional changes to code, and it never crosses 1518 eth frame size. So I think we are good on that front. So we can rule out possibility of file header or pcap header getting set to bigger values.

fwrite experiment

If I am not mistaken, libpcap's dump code uses fwrite, while I was using write. So I replaced my write with fwrite, without the buffer-disabling setvbuf(f, NULL, _IONBF, 0); call, and streaming works as expected. So I guess our answer lies with write vs fwrite.

fwrite and write (with setvbuf(f, NULL, _IONBF, 0)) experiment

Example: sniffer eth0 > /tmp/test.pcap

The file opened with Wireshark will have packets, but it will also give a warning similar to the one above (pcap: File has 1667982708-byte packet, bigger than maximum of 262144). Here again I captured all the pcap header lengths to another file, and not a single size exceeded 1500.
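On the write vs fwrite question: write() bypasses the stdio buffer that fwrite() fills, so mixing the two on the same stream can reorder bytes unless the stream is flushed in between. A minimal sketch of that effect, assuming a fully buffered stream such as a regular file or a pipe. The helper name mixed_writes is hypothetical, and nothing here is taken from the sniffer being discussed.

```c
#include <stdio.h>
#include <unistd.h>

/* Write "A" through stdio and "B" through write() on the same stream,
 * optionally flushing in between. Returns 0 on success, -1 on error. */
static int mixed_writes(FILE *fp, int flush_between)
{
    /* With full buffering this byte stays in the stdio buffer. */
    if (fwrite("A", 1, 1, fp) != 1)
        return -1;
    if (flush_between && fflush(fp) != 0)
        return -1;
    /* This byte goes straight to the file descriptor. */
    if (write(fileno(fp), "B", 1) != 1)
        return -1;
    /* Push out whatever is still buffered. */
    return fflush(fp) == 0 ? 0 : -1;
}
```

With flush_between set to 0 the output ends up as "BA"; with 1 it is "AB". Using only write(), or only fwrite(), for every byte of the capture stream avoids the issue.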

guyharris commented 4 years ago

Where is the code that initializes the pcapfh structure and the pcaphdr structure?



pcapfh.magic = htonl(0xA1B2C3D4);

pcapfh.version_major = htons(0x0002);

pcapfh.version_minor = htons(0x0004);

pcapfh.thiszone = htonl(0);

pcapfh.sigfigs = htonl(0);

pcapfh.snaplen = htonl(0); // This value is 0.

pcapfh.linktype = htonl(1);

That is incorrect.

The byte order of fields in a pcap file header is the byte order of the host that's writing the file; unless your program is running on a big-endian machine (big-endian PowerPC/Power ISA, System/390, z/Architecture, SPARC, most MIPS systems, etc.), putting them in big-endian (network) byte order is incorrect, so if, for example, you're running on a little-endian machine (x86, ARM, little-endian PowerPC/Power ISA, etc.), that code will produce an invalid file.

I.e., the htonl() calls will work only on systems where htonl() doesn't do anything; if you ever expect to run the code on, for example, an "IBM-compatible PC" or any other x86-based machine, htonl() will change the byte order, so the code won't work. And if you only ever expect to run on a big-endian machine, the htonl() won't do anything, so you can leave it out - and, if the code ever ends up running on a little-endian machine, it will work correctly if you remove the htonl(), so you should do it.

So remove all the htonl() calls.
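A minimal sketch of a file header filled in host byte order, with no htonl()/htons() calls. The struct and function names (pcap_file_hdr, init_file_hdr) are hypothetical; the field layout follows the 24-byte header written by the code quoted above.

```c
#include <stdint.h>

/* On-disk pcap file header: 24 bytes, no padding with these types. */
struct pcap_file_hdr {
    uint32_t magic;
    uint16_t version_major;
    uint16_t version_minor;
    int32_t  thiszone;
    uint32_t sigfigs;
    uint32_t snaplen;
    uint32_t linktype;
};

/* Fill the header using plain assignments, so every field is stored
 * in the byte order of the machine running this code. */
static void init_file_hdr(struct pcap_file_hdr *h,
                          uint32_t snaplen, uint32_t linktype)
{
    h->magic = 0xA1B2C3D4u;   /* seconds/microseconds time stamps */
    h->version_major = 2;
    h->version_minor = 4;
    h->thiszone = 0;
    h->sigfigs = 0;
    h->snaplen = snaplen;
    h->linktype = linktype;   /* 1 is Ethernet */
}
```

A reader on either kind of machine can then use the magic number to work out whether the remaining fields need byte-swapping.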

gettimeofday(&tv, NULL);

pcaphdr.ts.tv_sec = htonl(tv.tv_sec);

pcaphdr.ts.tv_usec = htonl(tv.tv_usec);

The same is true there - remove all the htonl() calls.

bytes_to_write = rlen;

pcaphdr.caplen = pcaphdr.len = bytes_to_write;

That, however, is correct, as it's not using htonl().

So remove all the htonl() calls. That will fix your problem.
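For the per-packet side, a sketch under the same assumption (host byte order throughout). The names pcap_rec_hdr and write_record are hypothetical. The record header is declared with four 32-bit fields rather than a struct timeval, because on platforms where struct timeval is wider than 8 bytes the header would otherwise come out larger than the 16 bytes the format uses.

```c
#include <stdint.h>
#include <stdio.h>

/* On-disk per-packet header: four 32-bit fields, 16 bytes. */
struct pcap_rec_hdr {
    uint32_t ts_sec;
    uint32_t ts_usec;
    uint32_t caplen;   /* bytes actually stored in the file */
    uint32_t len;      /* original length of the packet */
};

/* Write one record (header plus packet bytes) in host byte order.
 * Returns 0 on success, -1 on a write error. */
static int write_record(FILE *out, uint32_t sec, uint32_t usec,
                        const void *pkt, uint32_t len, uint32_t snaplen)
{
    struct pcap_rec_hdr h;
    uint32_t caplen = len < snaplen ? len : snaplen;

    h.ts_sec = sec;
    h.ts_usec = usec;
    h.caplen = caplen;
    h.len = len;

    if (fwrite(&h, sizeof h, 1, out) != 1)
        return -1;
    if (caplen > 0 && fwrite(pkt, caplen, 1, out) != 1)
        return -1;
    return 0;
}
```

The caplen value written in the header always equals the number of packet bytes that follow it, which is what a reader relies on to find the next record.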

tejaskumark commented 4 years ago

Actually my system is MIPS only.

guyharris commented 4 years ago

Actually my system is MIPS only.

So "your system" means "all of the computers you have", and both the machine running sniffer and the machine running Wireshark are big-endian MIPS machines? (Yes, there have been little-endian MIPS machines, such as DECstations.)

If the machine running Wireshark is, for example, a PC (meaning an x86-based PC), then that pipe will not work, for the reasons I described.

So, again, remove the htonl() calls; they are incorrect.

tejaskumark commented 4 years ago

So I removed all htonl calls from the header definitions as you requested, and logged all lengths to another file for analysis.

Streaming still breaks after a very short time, and from the length logs I can say that all packet lengths were within range.
Then I tried to take a pcap by just redirecting stdout to a local file and opening that file with Wireshark; it also gives the error pcap: File has 1952803683-byte packet, bigger than maximum of 262144. But as per my counter logs, I can say for sure that all packets were present in the file, yet Wireshark still gives an error when the pcap is opened.
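One way to investigate bogus lengths like the ones reported in this thread is to look at their bytes as text. A minimal sketch, with a hypothetical helper name (length_as_ascii):

```c
#include <stdint.h>

/* Render the four bytes of a 32-bit value, most significant first,
 * replacing non-printable bytes with '.'. 'out' must hold 5 chars. */
static void length_as_ascii(uint32_t v, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(v >> (24 - 8 * i));
        out[i] = (c >= 0x20 && c < 0x7F) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

For 1667982708 (0x636B6574) this gives "cket", and for 1952803683 (0x74656B63) it gives "tekc": the same four ASCII letters in opposite byte orders. That might point at text, for instance a log line containing a word such as "packet", reaching stdout in the middle of the capture stream, although nothing in this thread confirms that this was the cause.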

guyharris commented 4 years ago

So what is the complete source code to your sniffer program?

tejaskumark commented 4 years ago

Sorry for late reply, got stuck with other matters. It was my code that was terminating session on file size check. So you are right we do not need buffer management at all. I really appreciate your help and thank you for all your help.

the-tcpdump-group / tcpdump

Need help with "-w -" option to understand #865