p4lang / p4c

P4_16 reference compiler
https://p4.org/
Apache License 2.0
666 stars 441 forks source link

PSA backend, xdp helper program is reaplied in case of normal path to kernel #4126

Open roody225 opened 1 year ago

roody225 commented 1 year ago

In case of TC based design the xdp-ingress helper program that replaces original ethertype of a packet with 0x0800 is reaplied before sending to the network stack. It results in malformed packets (with wrong ethertype) received on network interface.

To reproduce the bug set up vms using attached Vagrantfile and p4 program in the test.zip archive

$ vagrant up

log in to host1

$ vagrant ssh host1

ping another host to trigger arp requests

host1$ ping 192.168.200.12

in another terminal log in to router

$ vagrant ssh router

tcpdump the interface corresponding to host1 link

router$ sudo tcpdump -exxXXi eth1

malformed arp request can be seen on the dump

15:25:50.641293 08:00:27:44:5c:76 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 60: IP0 (invalid)
    0x0000:  ffff ffff ffff 0800 2744 5c76 0800 0001  ........'D\v....
    0x0010:  0800 0604 0001 0800 2744 5c76 c0a8 640b  ........'D\v..d.
    0x0020:  0000 0000 0000 c0a8 6401 0000 0000 0000  ........d.......
    0x0030:  0000 0000 0000 0000 0000 0000            ............

for me a workaround is to check the meta field and skip the substitution of ethertype. Check the patch:

diff --git a/backends/ebpf/psa/xdpHelpProgram.h b/backends/ebpf/psa/xdpHelpProgram.h
index e55a4b8dd..55e49317b 100644
--- a/backends/ebpf/psa/xdpHelpProgram.h
+++ b/backends/ebpf/psa/xdpHelpProgram.h
@@ -45,6 +45,8 @@ class XDPHelpProgram : public EBPFProgram {
         "    if ((void *)((struct ethhdr *) eth + 1) > data_end) {\n"
         "        return XDP_ABORTED;\n"
         "    }\n"
+        "    if (meta->pkt_ether_type != 0)"
+        "        return XDP_PASS;"
         "    meta->pkt_ether_type = eth->h_proto;\n"
         "    eth->h_proto = bpf_htons(0x0800);\n"
         "\n"

However I don't know if it is relevant solution and if the problem is related to my setup rather than the compiler bug. Have anyone checked this behavior in some other environment? test.zip

fruffy commented 1 year ago

Which binary is used here? https://github.com/p4lang/p4c/tree/main/backends/ebpf/psa?

@osinstom @tatry

tatry commented 1 year ago

I wonder why XDP layer is re-applied to the packet? There should be no way to send back packet to XDP from TC directly...

This might not be helpful but some time ago I tried to port PTF tests from Ubuntu 20.04 to 22.04. One of the problems that I couldn't resolve (due to lack of time) was test with path to kernel. It failed on Ubuntu 22.04 even using the same kernel version as for Ubuntu 20.04 where it is passing.

@roody225 Could you setup your environment using Ubuntu 20.04 instead of 22.04 for NIKSS switch just for verify behavior? (At least I think it is Ubuntu 22.04 based on vagrant file)

osinstom commented 1 year ago

If I understand your setup and use case properly, it's expected behavior.

In case of TC-based design and non-IPv4 packets, the XDP helper program will always replace original EtherType. Since tcpdump is applied before the eBPF TC hook, you will always see malformed packets in tcpdump. However, the eBPF program attached to TC will fix the EtherType field.

Also, in case of sending packets to the Linux kernel (normal_path_to_kernel), the packet will go through the additional round of recirculation, so the ARP packet in your case will traverse the following path:

XDP helper program -> ingress TC (skb->protocol=0x0806, EtherType is replaced here) -> egress TC (skb->protocol=0x0806) -> ingress TC (skb->protocol=0x0800) -> Linux network stack (TC_ACT_OK)

Just to confirm: 1) can you share your full PCAP dump? 2) Can you compile the P4 program with --trace flag and paste logs (bpftool prog tracelog) here?

roody225 commented 1 year ago

@fruffy I use the binary installed from apt repository check install script

. /etc/os-release
echo "deb https://download.opensuse.org/repositories/home:/p4lang/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/home:p4lang.list
curl -L "https://download.opensuse.org/repositories/home:/p4lang/xUbuntu_${VERSION_ID}/Release.key" | sudo apt-key add -
sudo apt-get -y update
sudo apt-get -y install p4lang-p4c

to be honest the patch that I posted is not used by me, I just compile p4 to C code, then change the xdp-helper and then compile the C code to ebpf

@tatry Yes I am using Ubuntu 22.04, I've checked the behavior is the same on Ubuntu 20.04

@osinstom each packet that is sent to the linux kernel (normal_path_to_kernel) can be seen twice on the dump. With my fix the first occurence is malformed and the second one has correct eth_type. Without my fix both of them are malformed. I experimented a bit and I am sure that the xdp-helper program replaces eth_type and saves the original one in the meta field, then tc-ingress hook retrieves the original eth_type and my p4 program (the more complex one that I did not post here) works well using correct eth_type. However when the normal_path_to_kernel is used the xdp-helper program is reaplied before sending the packet to the network stack. What is strange for me the packet does not reenter the tc-ingress, so there must be some mechanism that recoginzes it. Additionally, as You can see my workaround uses the meta field, and the original eth_type is still there during the second xdp-helper call, that's why it works, so the field lives between the two calls of the xdp-helper

sender tcpdump output:

vagrant@host1:~$ sudo tcpdump -exxXXi enp0s8
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp0s8, link-type EN10MB (Ethernet), capture size 262144 bytes
14:31:12.085268 08:00:27:c0:c4:72 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.1 tell 192.168.100.11, length 28
    0x0000:  ffff ffff ffff 0800 27c0 c472 0806 0001  ........'..r....
    0x0010:  0800 0604 0001 0800 27c0 c472 c0a8 640b  ........'..r..d.
    0x0020:  0000 0000 0000 c0a8 6401                 ........d.
14:31:13.085178 08:00:27:c0:c4:72 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.1 tell 192.168.100.11, length 28
    0x0000:  ffff ffff ffff 0800 27c0 c472 0806 0001  ........'..r....
    0x0010:  0800 0604 0001 0800 27c0 c472 c0a8 640b  ........'..r..d.
    0x0020:  0000 0000 0000 c0a8 6401                 ........d.
14:31:14.084732 08:00:27:c0:c4:72 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.1 tell 192.168.100.11, length 28
    0x0000:  ffff ffff ffff 0800 27c0 c472 0806 0001  ........'..r....
    0x0010:  0800 0604 0001 0800 27c0 c472 c0a8 640b  ........'..r..d.
    0x0020:  0000 0000 0000 c0a8 6401                 ........d.

receiver tcpdump output:

vagrant@router:~$ sudo tcpdump -exxXXi eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:31:11.535441 08:00:27:c0:c4:72 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 60: IP0 
    0x0000:  ffff ffff ffff 0800 27c0 c472 0800 0001  ........'..r....
    0x0010:  0800 0604 0001 0800 27c0 c472 c0a8 640b  ........'..r..d.
    0x0020:  0000 0000 0000 c0a8 6401 0000 0000 0000  ........d.......
    0x0030:  0000 0000 0000 0000 0000 0000            ............
14:31:11.535441 08:00:27:c0:c4:72 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 60: IP0 
    0x0000:  ffff ffff ffff 0800 27c0 c472 0800 0001  ........'..r....
    0x0010:  0800 0604 0001 0800 27c0 c472 c0a8 640b  ........'..r..d.
    0x0020:  0000 0000 0000 c0a8 6401 0000 0000 0000  ........d.......
    0x0030:  0000 0000 0000 0000 0000 0000            ............
14:31:12.535801 08:00:27:c0:c4:72 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 60: IP0 
    0x0000:  ffff ffff ffff 0800 27c0 c472 0800 0001  ........'..r....
    0x0010:  0800 0604 0001 0800 27c0 c472 c0a8 640b  ........'..r..d.
    0x0020:  0000 0000 0000 c0a8 6401 0000 0000 0000  ........d.......
    0x0030:  0000 0000 0000 0000 0000 0000            ............
14:31:12.535801 08:00:27:c0:c4:72 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 60: IP0 
    0x0000:  ffff ffff ffff 0800 27c0 c472 0800 0001  ........'..r....
    0x0010:  0800 0604 0001 0800 27c0 c472 c0a8 640b  ........'..r..d.
    0x0020:  0000 0000 0000 c0a8 6401 0000 0000 0000  ........d.......
    0x0030:  0000 0000 0000 0000 0000 0000            ............
14:31:13.535628 08:00:27:c0:c4:72 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 60: IP0 
    0x0000:  ffff ffff ffff 0800 27c0 c472 0800 0001  ........'..r....
    0x0010:  0800 0604 0001 0800 27c0 c472 c0a8 640b  ........'..r..d.
    0x0020:  0000 0000 0000 c0a8 6401 0000 0000 0000  ........d.......
    0x0030:  0000 0000 0000 0000 0000 0000            ............
14:31:13.535628 08:00:27:c0:c4:72 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 60: IP0 
    0x0000:  ffff ffff ffff 0800 27c0 c472 0800 0001  ........'..r....
    0x0010:  0800 0604 0001 0800 27c0 c472 c0a8 640b  ........'..r..d.
    0x0020:  0000 0000 0000 c0a8 6401 0000 0000 0000  ........d.......
    0x0030:  0000 0000 0000 0000 0000 0000            ............

bpftool logs:

vagrant@router:~$ sudo !!
sudo bpftool prog tracelog
          <idle>-0       [000] ..s.  1469.206574: 0: classifier/tc-ingress parser: parsing new packet, input_port=3, path=0, pkt_len=60
          <idle>-0       [000] .Ns.  1469.206602: 0: Parser: state start (curr_offset=0)
          <idle>-0       [000] .Ns.  1469.206604: 0: Parser: check pkt_len=60 >= last_read_byte=14
          <idle>-0       [000] .Ns.  1469.206604: 0: Parser: extracting header parsed_hdr.ethernet
          <idle>-0       [000] .Ns.  1469.206605: 0: Parser: extracting field dst
          <idle>-0       [000] .Ns.  1469.206606: 0: Parser: extracted dst=0xffffffffffff (48 bits)
          <idle>-0       [000] .Ns.  1469.206606: 0: Parser: extracting field src
          <idle>-0       [000] .Ns.  1469.206607: 0: Parser: extracted src=0x80027c0c472 (48 bits)
          <idle>-0       [000] .Ns.  1469.206608: 0: Parser: extracting field etherType
          <idle>-0       [000] .Ns.  1469.206609: 0: Parser: extracted etherType=0x806 (16 bits)
          <idle>-0       [000] .Ns.  1469.206609: 0: Parser: extracted parsed_hdr.ethernet
          <idle>-0       [000] .Ns.  1469.206610: 0: classifier/tc-ingress control: packet processing started
          <idle>-0       [000] .Ns.  1469.206610: 0: Control: explicit calling action normal_path_to_kernel()
          <idle>-0       [000] .Ns.  1469.206611: 0: classifier/tc-ingress control: packet processing finished
          <idle>-0       [000] .Ns.  1469.206612: 0: classifier/tc-ingress deparser: packet deparsing started
          <idle>-0       [000] .Ns.  1469.206612: 0: Deparser: emitting header hdr.ethernet
          <idle>-0       [000] .Ns.  1469.206613: 0: Deparser: emitting field dst=0xffffffffffff (48 bits)
          <idle>-0       [000] .Ns.  1469.206614: 0: Deparser: emitting field src=0x80027c0c472 (48 bits)
          <idle>-0       [000] .Ns.  1469.206614: 0: Deparser: emitting field etherType=0x806 (16 bits)
          <idle>-0       [000] .Ns.  1469.206615: 0: classifier/tc-ingress deparser: packet deparsing finished
          <idle>-0       [000] .Ns.  1469.206616: 0: IngressTM: Sending packet up to the kernel stack
          <idle>-0       [000] ..s.  1470.206421: 0: classifier/tc-ingress parser: parsing new packet, input_port=3, path=0, pkt_len=60
          <idle>-0       [000] .Ns.  1470.206440: 0: Parser: state start (curr_offset=0)
          <idle>-0       [000] .Ns.  1470.206442: 0: Parser: check pkt_len=60 >= last_read_byte=14
          <idle>-0       [000] .Ns.  1470.206442: 0: Parser: extracting header parsed_hdr.ethernet
          <idle>-0       [000] .Ns.  1470.206443: 0: Parser: extracting field dst
          <idle>-0       [000] .Ns.  1470.206444: 0: Parser: extracted dst=0xffffffffffff (48 bits)
          <idle>-0       [000] .Ns.  1470.206445: 0: Parser: extracting field src
          <idle>-0       [000] .Ns.  1470.206446: 0: Parser: extracted src=0x80027c0c472 (48 bits)
          <idle>-0       [000] .Ns.  1470.206447: 0: Parser: extracting field etherType
          <idle>-0       [000] .Ns.  1470.206448: 0: Parser: extracted etherType=0x806 (16 bits)
          <idle>-0       [000] .Ns.  1470.206448: 0: Parser: extracted parsed_hdr.ethernet
          <idle>-0       [000] .Ns.  1470.206449: 0: classifier/tc-ingress control: packet processing started
          <idle>-0       [000] .Ns.  1470.206450: 0: Control: explicit calling action normal_path_to_kernel()
          <idle>-0       [000] .Ns.  1470.206450: 0: classifier/tc-ingress control: packet processing finished
          <idle>-0       [000] .Ns.  1470.206451: 0: classifier/tc-ingress deparser: packet deparsing started
          <idle>-0       [000] .Ns.  1470.206452: 0: Deparser: emitting header hdr.ethernet
          <idle>-0       [000] .Ns.  1470.206453: 0: Deparser: emitting field dst=0xffffffffffff (48 bits)
          <idle>-0       [000] .Ns.  1470.206454: 0: Deparser: emitting field src=0x80027c0c472 (48 bits)
          <idle>-0       [000] .Ns.  1470.206455: 0: Deparser: emitting field etherType=0x806 (16 bits)
          <idle>-0       [000] .Ns.  1470.206455: 0: classifier/tc-ingress deparser: packet deparsing finished
          <idle>-0       [000] .Ns.  1470.206456: 0: IngressTM: Sending packet up to the kernel stack
          <idle>-0       [000] ..s.  1471.205751: 0: classifier/tc-ingress parser: parsing new packet, input_port=3, path=0, pkt_len=60
          <idle>-0       [000] .Ns.  1471.205769: 0: Parser: state start (curr_offset=0)
          <idle>-0       [000] .Ns.  1471.205770: 0: Parser: check pkt_len=60 >= last_read_byte=14
          <idle>-0       [000] .Ns.  1471.205771: 0: Parser: extracting header parsed_hdr.ethernet
          <idle>-0       [000] .Ns.  1471.205771: 0: Parser: extracting field dst
          <idle>-0       [000] .Ns.  1471.205772: 0: Parser: extracted dst=0xffffffffffff (48 bits)
          <idle>-0       [000] .Ns.  1471.205773: 0: Parser: extracting field src
          <idle>-0       [000] .Ns.  1471.205774: 0: Parser: extracted src=0x80027c0c472 (48 bits)
          <idle>-0       [000] .Ns.  1471.205774: 0: Parser: extracting field etherType
          <idle>-0       [000] .Ns.  1471.205775: 0: Parser: extracted etherType=0x806 (16 bits)
          <idle>-0       [000] .Ns.  1471.205776: 0: Parser: extracted parsed_hdr.ethernet
          <idle>-0       [000] .Ns.  1471.205776: 0: classifier/tc-ingress control: packet processing started
          <idle>-0       [000] .Ns.  1471.205777: 0: Control: explicit calling action normal_path_to_kernel()
          <idle>-0       [000] .Ns.  1471.205777: 0: classifier/tc-ingress control: packet processing finished
          <idle>-0       [000] .Ns.  1471.205778: 0: classifier/tc-ingress deparser: packet deparsing started
          <idle>-0       [000] .Ns.  1471.205779: 0: Deparser: emitting header hdr.ethernet
          <idle>-0       [000] .Ns.  1471.205780: 0: Deparser: emitting field dst=0xffffffffffff (48 bits)
          <idle>-0       [000] .Ns.  1471.205781: 0: Deparser: emitting field src=0x80027c0c472 (48 bits)
          <idle>-0       [000] .Ns.  1471.205781: 0: Deparser: emitting field etherType=0x806 (16 bits)
          <idle>-0       [000] .Ns.  1471.205782: 0: classifier/tc-ingress deparser: packet deparsing finished
          <idle>-0       [000] .Ns.  1471.205783: 0: IngressTM: Sending packet up to the kernel stack