Open pstavirs opened 1 year ago
vlan does not work as you would expect, alas. There are multiple open issues about this, but so far, no code to solve the problem.
The latest snapshots of tcpdump and libpcap on FreeBSD/AArch64 produce the following bytecode:
$ ./tcpdump -y EN10MB -d '(ether[len - 4:4] == 0x1d10c0da) and not (icmp or (vlan and icmp))'
(000) ld #pktlen
(001) sub #4
(002) tax
(003) ld [x + 0]
(004) jeq #0x1d10c0da jt 5 jf 17
(005) ldh [12]
(006) jeq #0x800 jt 7 jf 9
(007) ldb [23]
(008) jeq #0x1 jt 17 jf 16
(009) jeq #0x8100 jt 12 jf 10
(010) jeq #0x88a8 jt 12 jf 11
(011) jeq #0x9100 jt 12 jf 16
(012) ldh [16]
(013) jeq #0x800 jt 14 jf 16
(014) ldb [27]
(015) jeq #0x1 jt 17 jf 16
(016) ret #262144
(017) ret #0
$ ./tcpdump -O -y EN10MB -d '(ether[len - 4:4] == 0x1d10c0da) and not (icmp or (vlan and icmp))'
(000) ld #pktlen
(001) st M[0]
(002) ld #0x4
(003) st M[1]
(004) ldx M[1]
(005) ld M[0]
(006) sub x
(007) st M[1]
(008) ldx M[1]
(009) ld [x + 0]
(010) st M[2]
(011) ld #0x1d10c0da
(012) st M[3]
(013) ldx M[3]
(014) ld M[2]
(015) sub x
(016) jeq #0x0 jt 17 jf 32
(017) ldh [12]
(018) jeq #0x800 jt 19 jf 21
(019) ldb [23]
(020) jeq #0x1 jt 32 jf 21
(021) ldh [12]
(022) jeq #0x8100 jt 27 jf 23
(023) ldh [12]
(024) jeq #0x88a8 jt 27 jf 25
(025) ldh [12]
(026) jeq #0x9100 jt 27 jf 31
(027) ldh [16]
(028) jeq #0x800 jt 29 jf 31
(029) ldb [27]
(030) jeq #0x1 jt 32 jf 31
(031) ret #262144
(032) ret #0
The simplest way to tell whether the problem is Linux-specific and/or optimizer-specific would be to use a packet that is definitely expected to match the filter. Please provide a small reproducer .pcap file if you can. Also please clarify whether the problem reproduces only on a live capture, only when reading from a file, or both.
BPF bugs tend to take a long time to resolve, so it would be better to spell out the steps to reproduce the problem so that they can be used much later if required.
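To make "a packet that is definitely expected to match" concrete, here is a minimal Python sketch that builds one such packet and runs it through the 18-instruction optimized program shown above (the `-O` output on FreeBSD). The packet layout (zeroed MAC addresses, VLAN ID 101, IPv4 protocol 0, no payload) is an assumption chosen for illustration and is not the reporter's sample; the tiny interpreter is hypothetical and supports only the opcodes this program uses, with jump targets written as the absolute indices that `tcpdump -d` prints.

```python
import struct

TRAILER = 0x1D10C0DA

def build_vlan_packet(ip_proto=0, vid=101, trailer=TRAILER):
    """Ethernet + 802.1Q tag + bare IPv4 header + 4-byte trailer (illustrative layout)."""
    eth = bytes(12) + struct.pack("!H", 0x8100)   # dst MAC, src MAC, TPID
    tag = struct.pack("!HH", vid, 0x0800)         # TCI, inner EtherType (offset 16)
    ip = bytearray(20)
    ip[0] = 0x45                                  # version 4, IHL 5
    ip[9] = ip_proto                              # protocol field lands at offset 27
    return eth + tag + bytes(ip) + struct.pack("!I", trailer)

# The optimized program from the dump above, one tuple per instruction.
PROGRAM = [
    ("ld_len",),                    # (000) ld #pktlen
    ("sub", 4),                     # (001) sub #4
    ("tax",),                       # (002) tax
    ("ld_ind", 4, 0),               # (003) ld [x + 0]
    ("jeq", 0x1D10C0DA, 5, 17),     # (004)
    ("ld_abs", 2, 12),              # (005) ldh [12]
    ("jeq", 0x0800, 7, 9),          # (006)
    ("ld_abs", 1, 23),              # (007) ldb [23]
    ("jeq", 0x1, 17, 16),           # (008)
    ("jeq", 0x8100, 12, 10),        # (009)
    ("jeq", 0x88A8, 12, 11),        # (010)
    ("jeq", 0x9100, 12, 16),        # (011)
    ("ld_abs", 2, 16),              # (012) ldh [16]
    ("jeq", 0x0800, 14, 16),        # (013)
    ("ld_abs", 1, 27),              # (014) ldb [27]
    ("jeq", 0x1, 17, 16),           # (015)
    ("ret", 262144),                # (016)
    ("ret", 0),                     # (017)
]

def run(program, pkt):
    """Interpret the tuples above; an out-of-range packet load rejects (returns 0)."""
    a = x = pc = 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "ld_len":
            a = len(pkt)
        elif op == "sub":
            a = (a - args[0]) & 0xFFFFFFFF
        elif op == "tax":
            x = a
        elif op in ("ld_abs", "ld_ind"):
            size, off = args
            if op == "ld_ind":
                off += x
            if off + size > len(pkt):
                return 0
            a = int.from_bytes(pkt[off:off + size], "big")
        elif op == "jeq":
            k, jt, jf = args
            pc = jt if a == k else jf
        elif op == "ret":
            return args[0]

if __name__ == "__main__":
    print(run(PROGRAM, build_vlan_packet(ip_proto=0)))  # non-ICMP inside a VLAN tag
    print(run(PROGRAM, build_vlan_packet(ip_proto=1)))  # ICMP inside a VLAN tag
```

A non-zero return value means the packet is accepted. This only models the savefile case, where the VLAN tag is still part of the packet bytes.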
@infrastation
There are two paths through tcpdump, depending on whether an interface is selected.
If you add a -i option, does it produce wonky code like below?
root@ubuntu2204:/usr/bin# tcpdump -y EN10MB -d '(ether[len - 4:4] == 0x1d10c0da) and not (icmp or (vlan and icmp))' -i 1
tcpdump: data link type EN10MB
(000) ld #0x0
(001) st M[4]
(002) ld #pktlen
(003) sub #4
(004) tax
(005) ld [x + 0]
(006) st M[2]
(007) ld #0x1d10c0da
(008) st M[3]
(009) ld M[2]
(010) jeq #0x1d10c0da jt 11 jf 32
(011) ldh [12]
(012) jeq #0x800 jt 13 jf 15
(013) ldb [23]
(014) jeq #0x1 jt 32 jf 15
(015) ldb [vlanp]
(016) jeq #0x1 jt 25 jf 17
(017) ld #0x1d10c0de
(018) st M[3]
(019) ld #0x4
(020) st M[4]
(021) ldh [12]
(022) jeq #0x8100 jt 25 jf 23
(023) jeq #0x88a8 jt 25 jf 24
(024) jeq #0x9100 jt 25 jf 31
(025) ldx M[4]
(026) ldh [x + 12]
(027) jeq #0x800 jt 28 jf 31
(028) ldx M[3]
(029) ldb [x + 23]
(030) jeq #0x1 jt 32 jf 31
(031) ret #262144
(032) ret #0
@infrastation - I have updated the issue with a reproducer pcap file and instructions.
Please note that the unoptimized BPF instructions are also incorrect. So the problem is in code generation (very likely Linux-specific) and not in the optimizer.
srivatsp@albus:~$ sudo tcpdump -d -i enp0s9 --no-optimize "(ether[len - 4:4] == 0x1d10c0da) and not (icmp or (vlan and icmp))"
(000) ld #0x0
(001) st M[3]
(002) st M[4]
(003) ld #pktlen
(004) st M[0]
(005) ld #0x4
(006) st M[1]
(007) ldx M[1]
(008) ld M[0]
(009) sub x
(010) st M[1]
(011) ldx M[1]
(012) ld [x + 0]
(013) st M[2]
(014) ld #0x1d10c0da
(015) st M[3]
(016) ldx M[3]
(017) ld M[2]
(018) sub x
(019) jeq #0x0 jt 20 jf 45
(020) ldh [12]
(021) jeq #0x800 jt 22 jf 24
(022) ldb [23]
(023) jeq #0x1 jt 45 jf 24
(024) ldb [vlanp]
(025) jeq #0x1 jt 38 jf 26
(026) ld M[3]
(027) add #4
(028) st M[3]
(029) ld M[4]
(030) add #4
(031) st M[4]
(032) ldh [12]
(033) jeq #0x8100 jt 38 jf 34
(034) ldh [12]
(035) jeq #0x88a8 jt 38 jf 36
(036) ldh [12]
(037) jeq #0x9100 jt 38 jf 44
(038) ldx M[4]
(039) ldh [x + 12]
(040) jeq #0x800 jt 41 jf 44
(041) ldx M[3]
(042) ldb [x + 23]
(043) jeq #0x1 jt 45 jf 44
(044) ret #262144
(045) ret #0
See (026) - I think it presumes M[3] is 0 (set during init) when issuing the load, not realizing that (015) stores a different value to M[3]. Is the compiler incorrectly allocating M[3] for use in gen_vlan_vloffset_add()?
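The suspected slot collision can be followed by replaying only the scratch-memory stores of the unoptimized dump above. This is a hedged Python sketch, not libpcap code: the function name, the packet length of 42 and the assumption that the trailer comparison at (019) succeeded are all illustrative. Instruction numbers in the comments refer to the `--no-optimize` listing.

```python
BPF_MEMWORDS = 16  # number of scratch slots M[] in classic BPF

def scratch_state(vlan_tag_in_metadata, pktlen=42, trailer=0x1D10C0DA):
    """Replay the stores to M[] and return the offsets used by the final loads."""
    M = [0] * BPF_MEMWORDS
    M[3] = 0                       # (000)-(001) init of a variable-offset slot
    M[4] = 0                       # (002)       init of a variable-offset slot
    M[0] = pktlen                  # (003)-(004)
    M[1] = 4                       # (005)-(006)
    M[1] = M[0] - M[1]             # (007)-(010) len - 4
    M[2] = trailer                 # (011)-(013) word loaded from the packet
    M[3] = 0x1D10C0DA              # (014)-(015) constant reuses slot 3
    if not vlan_tag_in_metadata:   # (024)-(025) ldb [vlanp]; jeq #0x1
        M[3] += 4                  # (026)-(028)
        M[4] += 4                  # (029)-(031)
    ethertype_offset = M[4] + 12   # (038)-(039) ldx M[4]; ldh [x + 12]
    proto_offset = M[3] + 23       # (041)-(042) ldx M[3]; ldb [x + 23]
    return M[3], M[4], ethertype_offset, proto_offset

if __name__ == "__main__":
    for in_metadata in (True, False):
        m3, m4, eth_off, proto_off = scratch_state(in_metadata)
        print(in_metadata, hex(m3), m4, eth_off, hex(proto_off))
```

On the branch where the offsets are adjusted, M[3] ends up as 0x1d10c0da + 4 = 0x1d10c0de, which is the constant the optimized `-i` dump stores at (017)-(018). On either branch the protocol byte is then loaded from an offset far beyond any packet, and as far as I know an out-of-range load in classic BPF ends the filter with a return value of 0, which would be consistent with the packets being dropped.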
This does not seem to be the usual "offsets after the vlan keyword are wrong" case, but #916 indeed looks suspiciously similar.
The working optimized bytecode dumpcap displays on Windows is the same optimized bytecode tcpdump displays on Linux and FreeBSD when reading from a file. In the latter two cases the filter in question matches 5 packets in the attached sample file, which is the expected result.
When capturing on a live interface on Linux, it is expected that the bytecode will be different because it has to accommodate Linux specifics of VLAN handling as explained in more detail on this page. In other words, the fact that it is different does not in itself mean it is wrong. But if it is wrong, a good next step would be to tell exactly what is wrong. It might be helpful to minimize the filter expression.
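As a reference point for "exactly what is wrong", the result the filter is expected to compute can be written out as a plain predicate. The following Python sketch is an assumption-laden model rather than libpcap's logic: it assumes that on a Linux live capture the tag is reported out of band (the `vlanp` test in the dumps) so the header offsets do not shift, and that otherwise a tag found in the packet shifts them by 4. The names `expected_match` and `field` are hypothetical.

```python
TRAILER = 0x1D10C0DA
VLAN_TPIDS = (0x8100, 0x88A8, 0x9100)  # EtherTypes tested in the dumps above

def field(pkt, off, size):
    """Big-endian field at off, or None when the packet is too short."""
    if off < 0 or off + size > len(pkt):
        return None
    return int.from_bytes(pkt[off:off + size], "big")

def expected_match(pkt, tag_in_metadata=False):
    """(ether[len - 4:4] == 0x1d10c0da) and not (icmp or (vlan and icmp))"""
    if field(pkt, len(pkt) - 4, 4) != TRAILER:
        return False
    # Plain "icmp": EtherType at 12, IPv4 protocol at 23.
    plain_icmp = field(pkt, 12, 2) == 0x0800 and field(pkt, 23, 1) == 1
    if tag_in_metadata:
        is_vlan, shift = True, 0       # tag not present in the packet bytes
    else:
        is_vlan, shift = field(pkt, 12, 2) in VLAN_TPIDS, 4
    # "vlan and icmp": the same two checks, shifted past the tag if it is in band.
    vlan_icmp = (is_vlan
                 and field(pkt, 12 + shift, 2) == 0x0800
                 and field(pkt, 23 + shift, 1) == 1)
    return not (plain_icmp or vlan_icmp)
```

Comparing this predicate's answer with the verdict of the generated bytecode for the same packet, once with the tag in band and once with it stripped, shows which branch of the program diverges.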
Here's a sample PCAP file with the packets that should match the filter.
tshark states these packets are invalid:
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 101
Internet Protocol Version 4, Src: 198.18.4.21, Dst: 198.18.4.101
IPv6 Hop-by-Hop Option
[Malformed Packet: IPv6 Hop-by-Hop]
Bogus protocol field in the IP header?
Protocol: IPv6 Hop-by-Hop Option (0)
The vlan header and trailer byte pattern look ok:
Type: 802.1Q Virtual LAN (0x8100)
05e0 00 00 03 63 f9 61 1d 10 c0 da ...c.a....
The packets are UDP with VLAN and have the pattern 0x1d10c0da at the end [...]
I don't see that in the pcap sample.
@bubbasnmp @fxlb These are crafted packets to reproduce the problem. Irrespective of them being invalid packets (in the higher layers), the capture filter should still match the packets.
If valid packets are really required, I can provide a sample pcap tomorrow.
The pcap should, at least, match the issue. If UDP is significant, there should be UDP in some packets.
The original description is indeed incorrect, but this makes no difference in the way of reproducing the issue because the packets are VLAN and are not ICMP and have the trailer, which should match the filter if everything works well. When tcpdump is filtering a savefile as shown above, the packets appear in the output. When tcpdump is capturing on an Ethernet interface in Linux without a filter and tcpreplay is replaying the savefile on the same interface, the packets appear in tcpdump output. When in addition to that the filter is set to the above expression, the packets do not appear:
0 packets captured
0 packets received by filter
0 packets dropped by kernel
So with the provided file the problem reproduces as described.
tcpdump is not able to capture packets with the below filter -
The packets are UDP with VLAN and have the pattern 0x1d10c0da at the end, which should match the above capture filter, but they don't.
To investigate, I used tcpdump -d with the above filter.
All seems OK until after the vlanp (vlan present) check. If I'm reading the instructions correctly, I think the problem is (017), (018), which store 0x1d10c0de into M[3], which is accessed by (028), (029). Instruction (028) seems incorrect to me, as (029) expects x to be 4, similar to (026).
Trying --no-optimize has a similar error in the unoptimized code.
The problem, I think, is Linux-specific, since Wireshark's dumpcap on Windows, which also uses pcap_compile(), seems to generate the correct BPF instructions -
Version info
Is this another weird side effect of the kludgy vlan matching implementation?
Reversing the expression with the vlan check at the beginning works fine.
Originally reported on ask.wireshark.org
Here's a sample PCAP file with the packets that should match the filter. However, note that applying the filter while reading from the file is successful, unlike live capture. So you will need to use the sample pcap with tcpreplay on a live interface to reproduce the problem.
sample.zip