the-tcpdump-group / libpcap

the LIBpcap interface to various kernel packet capture mechanism
https://www.tcpdump.org/

Can't filter vlan and geneve packet #1279

Open gujun4990 opened 8 months ago

gujun4990 commented 8 months ago

We want to capture packets that carry both VLAN 1264 and Geneve, for example:

16:08:46.074231 16:c5:84:65:a4:41 > ba:b3:1d:11:c8:43, ethertype 802.1Q (0x8100), length 160: vlan 1264, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 37785, offset 0, flags [DF], proto UDP (17), length 142)
    46.168.20.3.19312 > 46.168.20.5.6081: [udp sum ok] Geneve, Flags [C], vni 0x13, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8 data 00030002]
    fa:16:3e:4a:1c:dc > 3e:a5:be:e6:2a:4c, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 39488, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.101.92 > 172.46.0.1: ICMP echo request, id 1531, seq 18463, length 64
16:08:46.074470 ba:b3:1d:11:c8:43 > 16:c5:84:65:a4:41, ethertype 802.1Q (0x8100), length 160: vlan 1264, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 26501, offset 0, flags [DF], proto UDP (17), length 142)
    46.168.20.5.30750 > 46.168.20.3.6081: [udp sum ok] Geneve, Flags [C], vni 0x12, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8 data 00020003]
    fa:16:3e:23:13:44 > fa:16:3e:06:b1:cd, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 28164, offset 0, flags [none], proto ICMP (1), length 84)
    172.46.0.1 > 192.168.101.92: ICMP echo reply, id 1531, seq 18463, length 64

We filter the packets using ovs-tcpdump -i Bond1 -nnevv "(vlan 1264) and geneve", but it fails. I checked the related documentation, and from its description I think this command isn't supported. Is there a filter expression that supports this situation?

infrastation commented 8 months ago
   ovs-tcpdump creates switch mirror ports in the ovs-vswitchd
   daemon and executes tcpdump to listen against those ports. When
   the tcpdump instance exits, it then cleans up the mirror port it
   created.

Does ovs-tcpdump fail in the sense it prints an error message and fails to run, or in the sense it runs, but fails to capture the expected packets? What does tcpdump --version print? Is Bond1 the correct OVS interface?

gujun4990 commented 8 months ago

The error is below:

[root@node-2 ~]# ovs-tcpdump -i Bond1 -nnevv "(vlan 1264) and geneve"
dropped privs to tcpdump
Warning: Kernel filter failed: Invalid argument
tcpdump: listening on miBond1, link-type EN10MB (Ethernet), capture size 262144 bytes

It seems that the filter isn't supported by BPF. The tcpdump version is below:

[root@node-2 ~]# tcpdump --version
tcpdump version 4.9.3
libpcap version 1.9.1 (with TPACKET_V3)
OpenSSL 1.1.1g FIPS  21 Apr 2020

In addition, I ran the ovs-tcpdump -i Bond1 -Od "(vlan 1264) and geneve" command:

[root@node-2 ~]# ovs-tcpdump -i Bond1 -Od "(vlan 1264) and geneve"
(000) ld       #0x0
(001) st       M[4]
(002) st       M[2]
(003) ldb      [-4048]
(004) jeq      #0x1             jt 17   jf 5
(005) ld       M[0]
(006) add      #4
(007) st       M[0]
(008) ld       M[1]
(009) add      #4
(010) st       M[1]
(011) ldh      [12]
(012) jeq      #0x8100          jt 17   jf 13
(013) ldh      [12]
(014) jeq      #0x88a8          jt 17   jf 15
(015) ldh      [12]
(016) jeq      #0x9100          jt 17   jf 98
(017) ldb      [-4048]
(018) jeq      #0x1             jt 19   jf 21
(019) ldb      [-4052]
(020) ja       22
(021) ldh      [14]
(022) and      #0xfff
(023) jeq      #0x4f0           jt 24   jf 98
(024) ldx      M[1]
(025) ldh      [x + 12]
(026) jeq      #0x800           jt 27   jf 58
(027) ldx      M[0]
(028) ldb      [x + 23]
(029) jeq      #0x11            jt 30   jf 58
(030) ldx      M[0]
(031) ldh      [x + 20]
(032) jset     #0x1fff          jt 58   jf 33
(033) ldx      M[0]
(034) ldb      [x + 14]
(035) and      #0xf
(036) lsh      #2
(037) add      x
(038) tax      
(039) ldh      [x + 16]
(040) jeq      #0x17c1          jt 41   jf 58
(041) ldx      M[0]
(042) ldb      [x + 14]
(043) and      #0xf
(044) lsh      #2
(045) add      x
(046) tax      
(047) ldb      [x + 22]
(048) and      #0xc0
(049) jeq      #0x0             jt 50   jf 58
(050) ldx      M[0]
(051) ldb      [x + 14]
(052) and      #0xf
(053) lsh      #2
(054) add      x
(055) tax      
(056) txa      
(057) jeq      x                jt 76   jf 58
(058) ldx      M[1]
(059) ldh      [x + 12]
(060) jeq      #0x86dd          jt 61   jf 98
(061) ldx      M[0]
(062) ldb      [x + 20]
(063) jeq      #0x11            jt 64   jf 98
(064) ldx      M[0]
(065) ldh      [x + 56]
(066) jeq      #0x17c1          jt 67   jf 98
(067) ldx      M[0]
(068) ldb      [x + 62]
(069) and      #0xc0
(070) jeq      #0x0             jt 71   jf 98
(071) ldx      M[0]
(072) ld       #0x28
(073) add      x
(074) tax      
(075) jeq      x                jt 76   jf 98
(076) add      #22
(077) tax      
(078) add      #2
(079) st       M[2]
(080) ldb      [x + 0]
(081) and      #0x3f
(082) mul      #4
(083) add      #8
(084) add      x
(085) st       M[3]
(086) ldh      [x + 2]
(087) ldx      M[3]
(088) jeq      #0x6558          jt 89   jf 94
(089) txa      
(090) add      #12
(091) st       M[2]
(092) add      #2
(093) tax      
(094) stx      M[4]
(095) ld       #0x0
(096) jeq      #0x0             jt 97   jf 98
(097) ret      #262144
(098) ret      #0

Are M[0] and M[1] not initialized to 0?

infrastation commented 8 months ago

I understand the problem is that the filter does not match any packets. If you run the same command without the filter, does it capture any packets? Does it capture the packets you are looking for? Does the problem reproduce with the latest stable versions of tcpdump and libpcap?

gujun4990 commented 8 months ago

Running ovs-tcpdump -i Bond1 -nnevv without a filter captures all packets, and ovs-tcpdump -i Bond1 -nnevv 'vlan 1264' also works correctly. I want to filter the packets below:

[image: WeCom screenshot 20240220174926]

I checked another tcpdump version and it has the same problem:

root@work:~# tcpdump -i ens3 -nnevv "(vlan 100) and geneve"
Warning: Kernel filter failed: Invalid argument
tcpdump: listening on ens3, link-type EN10MB (Ethernet), snapshot length 262144 bytes

^C
0 packets captured
17 packets received by filter
0 packets dropped by kernel
root@work:~# 
root@work:~# tcpdump --version
tcpdump version 4.99.1
libpcap version 1.10.1 (with TPACKET_V3)
OpenSSL 3.0.2 15 Mar 2022
root@work:~# uname -a
Linux work 5.15.0-92-generic #102-Ubuntu SMP Wed Jan 10 09:33:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

infrastation commented 8 months ago

If you use udp port 6081 instead of geneve in the filter, does it match the packets you expect?

gujun4990 commented 8 months ago

ovs-tcpdump -i Bond1 -nnevv "(vlan 1264) and (udp port 6081)" works correctly. But I also want to filter on the inner packet carried by Geneve, for example ovs-tcpdump -i Bond1 -nnevv "(vlan 1264) and geneve and icmp" to match the inner ICMP packet.

infrastation commented 8 months ago

Thank you for confirming. One more question: if you run the following, do you see the packet?

ovs-tcpdump -i Bond1 -w udp6081.pcap -c 1 "(vlan 1264) and (udp port 6081)"
tcpdump -nnevv -r udp6081.pcap "(vlan 1264) and geneve"

gujun4990 commented 8 months ago

ovs-tcpdump -i Bond1 -w udp6081.pcap -c 1 "(vlan 1264) and (udp port 6081)"

I can get the packet using the above command.

[root@node-2 ~]# ovs-tcpdump -i Bond1 -w udp6081.pcap -c 1 "(vlan 1264) and (udp port 6081)"
dropped privs to tcpdump
tcpdump: listening on miBond1, link-type EN10MB (Ethernet), capture size 262144 bytes
1 packet captured
16 packets received by filter
0 packets dropped by kernel
[root@node-2 ~]# vi udp6081.pcap 
[root@node-2 ~]# 
[root@node-2 ~]# tcpdump -nnevv -r udp6081.pcap "(vlan 1264) and geneve"
reading from file udp6081.pcap, link-type EN10MB (Ethernet)
dropped privs to tcpdump
18:37:46.141812 ba:b3:1d:11:c8:43 > 16:c5:84:65:a4:41, ethertype 802.1Q (0x8100), length 120: vlan 1264, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 22245, offset 0, flags [DF], proto UDP (17), length 102)
    46.168.20.5.30581 > 46.168.20.3.6081: [udp sum ok] Geneve, Flags [none], vni 0x0, proto TEB (0x6558)
    2e:ad:5e:3a:d0:20 > 00:23:20:00:00:01, ethertype IPv4 (0x0800), length 66: (tos 0xc0, ttl 255, id 0, offset 0, flags [none], proto UDP (17), length 52)
    169.254.1.1.49262 > 169.254.1.0.3784: [no cksum] BFDv1, length: 24
    Control, State Up, Flags: [none], Diagnostic: No Diagnostic (0x00)
    Detection Timer Multiplier: 3 (300 ms Detection time), BFD Length: 24
    My Discriminator: 0x8e8820d8, Your Discriminator: 0xd70dd484
      Desired min Tx Interval:     100 ms
      Required min Rx Interval:   1000 ms
      Required min Echo Interval:    0 ms

infrastation commented 8 months ago

Thank you. The problem reproduces during live capture only, which is why the offline GENEVE tests do not fail. This also involves Linux VLANs, but matching a UDP port after the VLAN header works correctly, so the root cause looks related to the combination of VLAN and GENEVE. Can you retry the last test using libpcap 1.10.4 and the master branch?

gujun4990 commented 8 months ago

I compiled tcpdump using the master branches of libpcap and tcpdump. The problem is the same as before.

root@work:tcpdump# ./tcpdump --version
tcpdump version 5.0.0-PRE-GIT
libpcap version 1.11.0-PRE-GIT (with TPACKET_V3)
root@work:tcpdump# 
root@work:tcpdump# ./tcpdump -i ens3 -nnevv  "(vlan 1264) and geneve"
Warning: Kernel filter failed: Invalid argument
tcpdump: listening on ens3, link-type EN10MB (Ethernet), snapshot length 262144 bytes

^C
0 packets captured
9 packets received by filter
0 packets dropped by kernel

guyharris commented 8 months ago

There are, I suspect, two problems here.

Warning: Kernel filter failed: Invalid argument

That's the first one - the compiler is producing code that the kernel rejects, returning an EINVAL error.
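A possible explanation for the EINVAL, offered as an assumption rather than something confirmed in this thread: Linux's classic BPF checker is understood to reject any program that can load a scratch memory slot M[k] before storing to it. In the dump above, instruction (005) loads M[0] after only M[4] and M[2] have been stored. The Python sketch below is a simplified model of that kind of check, not the kernel's code; the instruction tuples and the function name are invented for illustration.

```python
ALL = 0xFFFF  # one bit per scratch slot; classic BPF has 16 slots

# (mnemonic, k, jt, jf) for instructions (000)-(007) of the dump above.
# Jump targets are absolute indices, as tcpdump -d prints them.
PREFIX = [
    ("ld_imm", 0, None, None),       # (000) ld  #0x0
    ("st", 4, None, None),           # (001) st  M[4]
    ("st", 2, None, None),           # (002) st  M[2]
    ("ldb_abs", -4048, None, None),  # (003) ldb [-4048]
    ("jeq", 1, 17, 5),               # (004) jeq #0x1  jt 17  jf 5
    ("ld_mem", 0, None, None),       # (005) ld  M[0]
    ("add", 4, None, None),          # (006) add #4
    ("st", 0, None, None),           # (007) st  M[0]
]


def first_uninitialised_load(prog):
    """Return (pc, slot) of the first scratch load that some path reaches
    without an earlier store to that slot, or None if there is none."""
    masks = {}    # per-target mask of slots known to be written on entry
    memvalid = 0  # slots written on every path reaching the current pc
    for pc, (op, k, jt, jf) in enumerate(prog):
        memvalid &= masks.get(pc, ALL)
        if op in ("st", "stx"):
            memvalid |= 1 << k
        elif op in ("ld_mem", "ldx_mem"):
            if not memvalid & (1 << k):
                return pc, k
        elif jt is not None:
            # Both branch targets inherit what is valid at the jump.
            for target in (jt, jf):
                masks[target] = masks.get(target, ALL) & memvalid
            memvalid = ALL
    return None


print(first_uninitialised_load(PREFIX))
```

Under this model the first offending instruction is (005), which matches the observation earlier in the thread that M[0] and M[1] are read without having been initialized.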

The second problem may be that libpcap then just uses that filter in userland, but that won't work, because, thanks to Linux's filtering code receiving packets with VLAN tags removed and put into metadata, different filtering code needs to be used in the kernel than in userland. The resulting filter probably won't work in userland unless it's done before libpcap inserts the VLAN tag back into the packet data, with the userland (classic) BPF interpreter supporting the special Linux BPF loads that fetch metadata.
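To make the kernel/userland difference concrete, here is a small Python sketch. The constants are the Linux ancillary-data offsets that appear in the dump above as [-4048] and [-4052]; the `strip_vlan` helper and the test frame are hypothetical and only model the idea that the kernel-side filter sees the frame without the tag while a savefile or userland filter sees the tag in the packet data.

```python
import struct

SKF_AD_OFF = -0x1000           # base of Linux's ancillary-data pseudo-offsets
SKF_AD_VLAN_TAG = 44           # load the VLAN TCI from packet metadata
SKF_AD_VLAN_TAG_PRESENT = 48   # load 1 if the metadata holds a VLAN tag


def strip_vlan(frame: bytes):
    """Model the kernel view: move an outer 802.1Q tag out of the frame.

    Returns (frame_without_tag, tci), or (frame, None) if untagged.
    """
    (tpid,) = struct.unpack_from("!H", frame, 12)
    if tpid != 0x8100:
        return frame, None
    (tci,) = struct.unpack_from("!H", frame, 14)
    return frame[:12] + frame[16:], tci


# Frame as a savefile shows it: two MACs, 802.1Q tag for VLAN 1264,
# IPv4 EtherType, then 20 placeholder bytes standing in for the IPv4 header.
tagged = bytes(12) + struct.pack("!HHH", 0x8100, 1264, 0x0800) + bytes(20)
untagged, tci = strip_vlan(tagged)

# With the tag in the data, the IPv4 EtherType sits at offset 16.
print(hex(struct.unpack_from("!H", tagged, 16)[0]))
# With the tag in metadata, it sits at offset 12 and the VLAN ID is out of band.
print(hex(struct.unpack_from("!H", untagged, 12)[0]), tci & 0x0FFF)
```

This corresponds to instructions (017) to (023) in the dump: if the "tag present" metadata byte is 1, the VLAN ID is taken from metadata, otherwise it is read from offset 14 in the packet, and in both cases it is masked with 0xfff and compared with 0x4f0 (1264).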

guyharris commented 8 months ago

Yeah, the code generator is broken for this case.

gujun4990 commented 8 months ago

Thank you for your analysis. Are there any workarounds to avoid the issue?

guyharris commented 8 months ago

If there's a way to disable Linux's "pull the VLAN tag out" stuff, that might work, but I'm not sure there's a way to do that.

This is the result of the BPF compiler code to handle VLANs in the Linux live capture case using a mechanism that's also used by the BPF compiler code to handle Geneve, and their uses of that mechanism step on top of each other. This will take some detangling....

gujun4990 commented 8 months ago

Thanks. For now, we can only parse Geneve packets layer by layer, without using the geneve keyword.
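As an illustration of what parsing layer by layer involves, here is a minimal Python sketch that walks the headers by hand on raw frame bytes (for example, bytes read from a savefile, where the VLAN tag is in the packet data). It assumes a single 802.1Q tag at most, an IPv4 outer header, and Geneve on UDP port 6081; the function name and the test frame are hypothetical, and this does not address the live-capture kernel filter problem discussed above.

```python
import struct


def walk_geneve(frame: bytes):
    """Walk Ethernet / 802.1Q / IPv4 / UDP / Geneve by hand.

    Returns a dict of the fields found, or None if a layer does not match.
    """
    off = 12
    (ethertype,) = struct.unpack_from("!H", frame, off)
    vlan = None
    if ethertype == 0x8100:                      # 802.1Q tag in the data
        tci, ethertype = struct.unpack_from("!HH", frame, off + 2)
        vlan = tci & 0x0FFF
        off += 4
    off += 2                                     # now at the outer IP header
    if ethertype != 0x0800 or frame[off + 9] != 17:
        return None                              # only IPv4 carrying UDP
    off += (frame[off] & 0x0F) * 4               # skip IPv4 header (IHL words)
    (dport,) = struct.unpack_from("!H", frame, off + 2)
    if dport != 6081:
        return None
    off += 8                                     # skip UDP header
    opt_len = (frame[off] & 0x3F) * 4            # Geneve option length in bytes
    (proto,) = struct.unpack_from("!H", frame, off + 2)
    vni = int.from_bytes(frame[off + 4:off + 7], "big")
    off += 8 + opt_len                           # now at the inner payload
    inner_ip_proto = None
    if proto == 0x6558:                          # TEB: inner Ethernet frame
        (inner_type,) = struct.unpack_from("!H", frame, off + 12)
        if inner_type == 0x0800:
            inner_ip_proto = frame[off + 14 + 9]
    return {"vlan": vlan, "vni": vni, "proto": proto,
            "inner_ip_proto": inner_ip_proto}


def ipv4(proto: int) -> bytes:
    """Placeholder 20-byte IPv4 header with only IHL and protocol set."""
    hdr = bytearray(20)
    hdr[0], hdr[9] = 0x45, proto
    return bytes(hdr)


# Hypothetical frame shaped like the first packet quoted in this issue.
frame = (bytes(12) + struct.pack("!HHH", 0x8100, 1264, 0x0800)
         + ipv4(17)
         + struct.pack("!HHHH", 19312, 6081, 0, 0)
         + struct.pack("!BBH", 0x02, 0x40, 0x6558)
         + (0x13).to_bytes(3, "big") + b"\x00"
         + bytes(8)                               # 8 bytes of Geneve options
         + bytes(12) + struct.pack("!H", 0x0800)  # inner Ethernet header
         + ipv4(1))                               # inner IPv4 carrying ICMP

print(walk_geneve(frame))
```

The same offsets (option length from the low 6 bits of the first Geneve byte, times 4, plus the 8-byte fixed header) show up in instructions (080) to (084) of the dump.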

guyharris commented 8 months ago

A long time ago, somebody (Dave Mills?) came up with the term "Christmas tree packet":

https://en.wikipedia.org/wiki/Christmas_tree_packet

referring to "a packet with every single option set for whatever protocol is in use."

I'd like to see somebody construct a "skyscraper packet" or a "Jenga packet":

https://en.wikipedia.org/wiki/Jenga

with every possible type of tunneling/encapsulation in it - VLAN, MPLS, Geneve, VXLAN, GRE, etc., etc., etc.

infrastation commented 8 months ago

Even if a Jenga packet always uses every type of header exactly once, quite a few protocols can encapsulate each other many different ways around. Then, given a set of allowed protocols and a matrix of how they may combine, calculating the number of different possible valid Jenga packet varieties would be a matter of a practicable combinatoric exercise. If this number does not yet have a name, let's call it a Harris number. The Harris number for various parts of the Internet, or for the same parts in different years, would be different. However measured, most of the time the value is a non-decreasing function, and each time it increases it almost certainly attracts comments from network protocol analyser developers.
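The counting exercise can be sketched by brute force in Python. The protocol list and the "may carry" pairs below are invented purely to show the computation and make no claim about which encapsulations are actually valid; the function name is hypothetical too.

```python
from itertools import permutations


def count_jenga_orderings(protocols, may_carry):
    """Count orderings that use every protocol exactly once, where each
    header is allowed to carry the header that follows it."""
    return sum(
        all((outer, inner) in may_carry
            for outer, inner in zip(order, order[1:]))
        for order in permutations(protocols)
    )


# Invented example matrix, given as a set of (outer, inner) pairs.
protocols = ["vlan", "mpls", "gre"]
may_carry = {("vlan", "mpls"), ("mpls", "gre"),
             ("vlan", "gre"), ("gre", "mpls")}

print(count_jenga_orderings(protocols, may_carry))
```

Brute force is factorial in the number of protocols, so it only suits small sets; a real count would also need to decide whether a header may appear more than once.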

On the subject matter, if the same problem stands for vlan and pppoes or vlan and mpls, it may be a good idea to untangle those cases in the same go. If it does not, maybe the solution could be copied from there.