the-tcpdump-group / libpcap

the LIBpcap interface to various kernel packet capture mechanism
https://www.tcpdump.org/

Linux cooked capture (“tcpdump -i any”) mode shows protocol field as ‘Ethernet (0x0003)’ instead of ‘IPv4 (0x0800)’ in DHCP OFFER and ACK packets #739

Closed yaswanth-y closed 2 years ago

yaswanth-y commented 6 years ago

When doing tcpdump with interface any (“tcpdump -i any”), DHCP OFFER and ACK packets are not captured properly.

I am running a DHCP server and a DHCP client in the same network. DHCP itself works fine and the client obtained an IP address through the DORA process. I captured the DORA packets on the DHCP server with tcpdump on the "any" interface ("tcpdump -i any -w traffic_manual.cap"). When I open traffic_manual.cap in Wireshark, I see that the DHCP OFFER and ACK packets are not captured properly.

In the ‘Linux cooked capture’ header of the DHCP OFFER and ACK packets, the ‘protocol’ field shows “Ethernet (0x0003)”, but it should be “IPv4 (0x0800)”. The data in the layers after the cooked header is correct, i.e. IP, UDP, DHCP OFFER. Because the protocol field says ‘Ethernet’, Wireshark dissects the next layer as an Ethernet header, proceeds wrongly from there, and shows the rest as unknown data. As a result I cannot analyse the DHCP OFFER and ACK packets.

Please find the attached dhcp offer packet capture screen shot.

Reported version: tcpdump version 4.9.0 libpcap version 1.7.2

[screenshot: dhcp_offer_packet]
mcr commented 6 years ago

"tcpdump -i any" does not open the network interfaces in promiscuous mode. If you need that, you need to open the specific devices. I believe that this is documented in the tcpdump(1) man page under -i:

          On  Linux  systems with 2.2 or later kernels, an interface argument of ``any'' can be
          used to capture packets from all interfaces.   Note  that  captures  on  the  ``any''
          device will not be done in promiscuous mode.
infrastation commented 6 years ago

From the description of LINKTYPE_LINUX_SLL I could not tell quickly whether the reported difference is a bug. Whatever is the resolution of this issue, it would make sense to add a couple sentences to that specification to clarify the meaning of the "protocol type" field.

guyharris commented 6 years ago

The DHCP OFFER packet is being sent by the machine running the capture. If it's not being sent using the regular networking stack (perhaps because the target machine doesn't yet have an IP address, so the regular networking stack can't send to it), but is instead being sent by writing to a PF_PACKET socket, it might be that looping back packets sent on a PF_PACKET socket to the PF_PACKET socket on which the capture is being done isn't resulting in the right protocol being provided to the program running libpcap.

guyharris commented 6 years ago

So what kernel version are you using, and what DHCP server program is being used?

There may be a kernel bug, which we may or may not be able to work around.

guyharris commented 6 years ago

And what's worse is that there are also packets, captured on the "any" interface, that have a protocol type of ETH_P_ALL (0x0003) and that do have Ethernet headers! See, for example, Wireshark bug 5422.

So maybe different Linux kernels behave differently; the capture attached to the Wireshark bug was made with a 2.6.32.21 kernel.

guyharris commented 6 years ago

As for the "protocol" field, its value is "whatever the kernel provides", so if the protocol field is wrong, that's a kernel bug. The Linux packet(7) man page says:

sll_protocol is the standard ethernet protocol type in network byte order as defined in the <linux/if_ether.h> include file. It defaults to the socket's protocol.

I'm not sure under what circumstances the protocol type has to "default to" the socket's protocol, but if the packets being sent by the DHCP server are packets where the protocol type needs to "default to" the socket's protocol, that means they'd "default to" ETH_P_ALL, as that's the socket type used if you want to sniff traffic, so, yes, sll_protocol becomes 0x0003, and sll_protocol is what's used for the "protocol" field in the cooked header.
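
To make the field under discussion concrete, here is a minimal sketch of reading the "protocol type" field out of a cooked (LINKTYPE_LINUX_SLL) header. It assumes the 16-byte SLL layout (packet type, ARPHRD type, address length, 8 address bytes, protocol type); the function names are hypothetical and are not libpcap APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLL_HDR_LEN 16  /* fixed size of a LINKTYPE_LINUX_SLL header */

/* Read a 16-bit big-endian (network byte order) value. */
uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the "protocol type" field (bytes 14-15 of the cooked header),
 * or -1 if the buffer is too short to hold a full header.
 */
int sll_get_protocol(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);
    if (len < SLL_HDR_LEN)
        return -1;
    return be16(pkt + 14);
}

/* Hypothetical helper: name only the values discussed in this issue. */
const char *sll_protocol_name(int proto)
{
    switch (proto) {
    case 0x0800: return "IPv4";
    case 0x0003: return "ETH_P_ALL (socket protocol, not a real EtherType)";
    case 0x0004: return "802.2 LLC";
    default:     return "other";
    }
}
```

For the packets in the original report, sll_get_protocol() would return 0x0003 even though an IPv4 header follows the cooked header, which is why a dissector that trusts the field goes wrong.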

guyharris commented 6 years ago

I checked in a change to the website to indicate that the "protocol type" field is "the Ethernet protocol type for the packet".

infrastation commented 6 years ago

Is it OK to close this case?

guyharris commented 6 years ago

> Is it OK to close this case?

I'd still like to know why the 2.6.32.21 kernel in the Wireshark bug included the Ethernet header in the packet but the kernel in this case didn't, in case there's something that can be done to get packets (presumably) injected by the DHCP server into a form that can be dissected by programs such as tcpdump and Wireshark.

infrastation commented 6 years ago

@yaswanth-y, could you provide more detail about the setup, at least the OS name and version?

yaswanth-y commented 6 years ago

Thank you so much for the update.

Reported kernel version and OS: 2.6.32.26-175.fc12.x86_64

What would be a possible fix if it is a kernel issue?

freedge commented 4 years ago

I was hitting this issue as well for ICMP requests going out of one of my interfaces

Linux amn 4.4.0-171-generic #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
tcpdump version 4.9.2
libpcap version 1.7.4

however I tried again with more recent packages:

Linux ubuntu-eoan 5.3.0-26-generic #28-Ubuntu SMP Wed Dec 18 05:37:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
tcpdump version 4.9.3
libpcap version 1.9.1 (with TPACKET_V3)

and it works just fine for my ICMP request packet.

I do have some STP packets that show up as Unknown (0x0300) and I believe this is due to the fact that STP packets are recognized based on the destination MAC address (01:80:C2:00:00:00), which is missing from the SLL format.

guyharris commented 4 years ago

> I was hitting this issue as well for ICMP requests going out of one of my interfaces
>
> Linux amn 4.4.0-171-generic #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> tcpdump version 4.9.2
> libpcap version 1.7.4
>
> however I tried again with more recent packages:
>
> Linux ubuntu-eoan 5.3.0-26-generic #28-Ubuntu SMP Wed Dec 18 05:37:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> tcpdump version 4.9.3
> libpcap version 1.9.1 (with TPACKET_V3)
>
> and it works just fine for my ICMP request packet.

As noted above:

  1. no changes were made to libpcap for this, so there is no reason to expect that changing the version of libpcap will make any difference whatsoever;

  2. this appears to be a Linux kernel issue, and those two captures were done with different Linux kernel versions, so that might account for the change.

(This isn't a tcpdump issue, so there is no reason to expect that changing the version of tcpdump will make any difference.)

> I do have some STP packets that show up as Unknown (0x0300) and I believe this is due to the fact that STP packets are recognized based on the destination MAC address (01:80:C2:00:00:00), which is missing from the SLL format.

That's probably due to the Linux kernel doing something strange; tcpdump and Wireshark recognize STP packets based not on the destination MAC address but on:

  1. the type/length field for packets with an Ethernet header and the "protocol type" field for packets with an SLL header - that indicates whether the packet has an Ethernet type or an 802.2 LLC header;

  2. the 802.2 LLC header's DSAP, SSAP field - that indicates whether the packet is an STP packet or a SNAP packet;

  3. if it's a SNAP packet, the SNAP header's OUI and PID.
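
The three steps above can be sketched roughly as follows for a frame that still has its Ethernet header. This is only an illustration, not the actual tcpdump or Wireshark code; the enum and function names are hypothetical, and the SNAP OUI/PID check of step 3 is left as a comment. For a packet with an SLL header, the equivalent of step 1 is the "protocol type" field being 0x0004.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum l2_kind { L2_ETHERTYPE, L2_STP, L2_SNAP, L2_OTHER_LLC, L2_TRUNCATED };

/*
 * Classify an Ethernet frame. `frame` points at the destination MAC
 * address, i.e. at the start of the 14-byte Ethernet header.
 */
enum l2_kind classify_ethernet(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < 14)
        return L2_TRUNCATED;

    /* Step 1: type/length field; values >= 0x0600 are EtherTypes. */
    uint16_t type_len = (uint16_t)((frame[12] << 8) | frame[13]);
    if (type_len >= 0x0600)
        return L2_ETHERTYPE;

    /* Otherwise it is a length, and an 802.2 LLC header follows. */
    if (len < 16)
        return L2_TRUNCATED;
    uint8_t dsap = frame[14];
    uint8_t ssap = frame[15];

    /* Step 2: SAP 0x42 is spanning tree; SAP 0xAA is SNAP. */
    if (dsap == 0x42 && ssap == 0x42)
        return L2_STP;
    if (dsap == 0xAA && ssap == 0xAA)
        return L2_SNAP;  /* Step 3 (OUI and PID) would be checked here. */
    return L2_OTHER_LLC;
}
```

Note that the destination MAC address (bytes 0-5) is never inspected.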

freedge commented 4 years ago

thanks for your quick reply!

for reference this is what I see: image

and indeed if I change the protocol from 0x0300 into 0x0004 ("frame begins with an 802.2 LLC header") it works just fine.

guyharris commented 4 years ago

So, for your capture, somewhere something in the kernel has decided to set either the protocol field of a struct sk_buff or the sll_protocol field of a struct sockaddr_ll to 0x0300 rather than to ETH_P_802_2 for an Ethernet frame with a length in the type/length field and with an 802.2 header, and provide the Ethernet payload, starting with the 802.2 header, to a PF_PACKET/SOCK_DGRAM socket.

For the initial capture for this bug, somewhere something in the kernel has decided to set either the protocol field of a struct sk_buff or the sll_protocol field of a struct sockaddr_ll to 0x0003 rather than to the type field of the Ethernet packet for an Ethernet frame with a type in the type/length field, and provide the Ethernet payload, starting with the IPv4 header, to a PF_PACKET/SOCK_DGRAM socket.

And for the capture in Wireshark bug 5422, somewhere something in the kernel has decided to set either the protocol field of a struct sk_buff or the sll_protocol field of a struct sockaddr_ll to 0x0003 (ETH_P_ALL), and provide a raw Ethernet frame, complete with Ethernet header, to a PF_PACKET/SOCK_DGRAM socket (rather than stripping off the Ethernet header and providing only the payload, as is normally done for packets delivered to a socket of that type).

So we have three different botches on the part of the kernel.

freedge commented 4 years ago

original problem should have gone after https://github.com/torvalds/linux/commit/18bed89107a400af0d672ec85a270f1545db2569#diff-e936270d5d8fefaf66cb5ded38a638fc

I will try to check for the case of STP

guyharris commented 4 years ago

So https://github.com/torvalds/linux/commit/75c65772c3d18447d62d3aca5f91b06c16cc25e4, which appears to have been checked in for a 5.x kernel release, changed the code path for sending packets via PF_PACKET/SOCK_RAW sockets. This appears to have been part of a number of changes submitted by Maxim Mikityanskiy of Mellanox to fix problems caused by sending packets to adapters supported by the mlx5 driver via such a socket. The discussion started with this message. The fix was proposed in this message, and the patch set begins with this message; the problematic change was in this message.

The fix you mention was submitted in this message. A followup 1) notes that the change mentioned in the previous paragraph was first in v5.1-rc1 (so that particular change wasn't in any pre-5.1 kernel) and 2) says that "Calling sendmsg on a packet socket bound to ETH_P_ALL goes back a long way. It is a user, not kernel, bug to do so."; a subsequent followup says that "AFAIK it can/could be done correctly via specifying the protocol in the sll_protocol field of the struct sockaddr_ll passed to sendmsg as the target address.", so there may have been different misbehavior in pre-5.1 kernels when packets were being injected via a PF_PACKET/SOCK_RAW socket.
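
As a rough illustration of the second followup's suggestion, a program injecting packets could fill in sll_protocol in the destination address it passes to sendto() or sendmsg(), rather than leaving it to default to the socket's protocol. The sketch below is Linux-only; the helper name fill_packet_dest and its parameters are hypothetical, and it only builds the address without sending anything.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* ETH_P_IP, ETH_ALEN */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <sys/socket.h>       /* AF_PACKET */

/*
 * Hypothetical helper: build the target address for a packet socket so
 * that sll_protocol carries the frame's real EtherType instead of
 * defaulting to the socket's protocol (e.g. ETH_P_ALL).
 */
void fill_packet_dest(struct sockaddr_ll *dst, int ifindex,
                      unsigned short ethertype,
                      const unsigned char mac[ETH_ALEN])
{
    assert(dst != NULL && mac != NULL);
    memset(dst, 0, sizeof *dst);
    dst->sll_family   = AF_PACKET;
    dst->sll_protocol = htons(ethertype);  /* network byte order */
    dst->sll_ifindex  = ifindex;
    dst->sll_halen    = ETH_ALEN;
    memcpy(dst->sll_addr, mac, ETH_ALEN);
}
```

A DHCP server sending an IPv4 frame would pass ETH_P_IP as the ethertype and then hand the resulting structure to sendto() or sendmsg() as the destination address.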

So it might be that, as of the version of the kernel with Yoshiki Komachi's fix, none of these problems exist. It might also be possible for libpcap to, based on the kernel version, attempt to work around the kernel and userland-packet-injection issues.

freedge commented 4 years ago

After Yoshiki Komachi's fix, packets sent using sendmsg with proto=ETH_P_ALL (0x0003) will be analysed so that the protocol found within the ethernet protocol field is used instead.

Actually my issue is a bit different, this is what the program sends:

sendmsg(19, {msg_name(20)={sa_family=AF_UNSPEC, sa_data="\x03\x00\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"}, msg_iov(1)=[{"\x01\x80\xc2\x00\x00\x00\x80\x72\xbe\x61\xf5\xd3\x00\x69\x42\x42\x03\x00\x00\x03\x02\x7c\x80\x00\x40\x11\x0d\x6b\x94\x37\x00\x00"..., 119}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 119

The protocol is taken from the first 2 bytes of sa_data, so it is 0x0300 instead of 0x0004 (ETH_P_802_2), and incidentally it does not trigger the packet_parse_headers logic (which applies only to 0x0003).

packet_snd parses this data as a sockaddr_ll and deduces a protocol of 0x0300. Note that the packet sent on the network is perfectly correct and received properly as an ETH_P_802_2 frame.

One possible fix (on the kernel side) could be to add something like this at the beginning of packet_parse_headers:

        /* Destination MAC is link-local (01:80:C2:00:00:0X): treat the frame as 802.2 LLC. */
        if (unlikely(is_link_local_ether_addr(skb->data))) {
                skb_reset_mac_header(skb);
                skb->protocol = htons(ETH_P_802_2);
        }

that would discard whatever protocol the user provided.

In any case, I believe the original issue with the user client calling sendmsg with proto=ETH_P_ALL is worked around in newer kernels, so you could close this ticket.

For my own issue, I don't see how a workaround could be done in libpcap as it would need the destination mac address to reliably determine the protocol, and we don't have it at this point.

Thanks for your comments!

infrastation commented 3 years ago

So, a part of the problem was a Linux kernel bug. Is there anything else remaining to be done in libpcap to make this issue reasonably addressed (adding a paragraph of text to the documentation and/or adding a workaround to the code)?

infrastation commented 2 years ago

Alright, closing as a Linux kernel bug then.