loxilb-io / loxilb

eBPF based cloud-native load-balancer for Kubernetes|Edge|Telco|IoT|XaaS.
https://www.loxilb.io
Apache License 2.0

BPFireOS: Prog section 'tc_packet_hook0' rejected: Permission denied (13)! R1 type=scalar expected=map_ptr #666

Closed vincentmli closed 6 months ago

vincentmli commented 6 months ago

Describe the bug

This is a follow-up to https://github.com/loxilb-io/loxilb/issues/661, where we addressed the "Argument list too long" error during cpu_map creation.

I then followed the suggested steps below:

loxilb-ebpf/utils/mkllb_bpffs.sh
sudo ip link add tap0 type veth peer name tap1
sudo ifconfig tap0 up
sudo ifconfig tap1 up
sudo mkdir -p /opt/fs/bpf
sudo bpftool -d prog  load  /opt/loxilb/llb_xdp_main.o /opt/fs/bpf/xdp_packet_hook type xdp pinmaps /opt/loxilb/dp/bpf
sudo ntc filter add dev tap0 egress bpf da obj /opt/loxilb/llb_ebpf_main.o sec tc_packet_hook0

The last ntc command (run here with verb for verbose output) produced the following:

[root@bpfire-6 ~]# ntc filter add dev tap0 egress bpf da obj /opt/loxilb/llb_ebpf_main.o sec tc_packet_hook0 verb

BTF debug data section '.BTF' rejected: Invalid argument (22)!
 - Length:       73740
Verifier analysis:

magic: 0xeb9f
version: 1
flags: 0x0
hdr_len: 24
type_off: 0
type_len: 12812
str_off: 12812
str_len: 60904
btf_total_size: 73740
[1] PTR (anon) type_id=3
[2] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] ARRAY (anon) type_id=2 index_type_id=4 nr_elems=6
[4] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
[5] PTR (anon) type_id=2
[6] PTR (anon) type_id=7
[7] STRUCT xfi size=360 vlen=10
    fm type_id=8 bits_offset=0
    l2m type_id=13 bits_offset=128
    l34m type_id=20 bits_offset=320
    il2m type_id=13 bits_offset=704
    il34m type_id=20 bits_offset=896
    tm type_id=22 bits_offset=1280
    nm type_id=24 bits_offset=1472
    km type_id=25 bits_offset=2048
    qm type_id=27 bits_offset=2176
    pm type_id=28 bits_offset=2240
[8] STRUCT dp_fr_mdi size=16 vlen=3
    dat type_id=9 bits_offset=0
    dat_end type_id=9 bits_offset=32
    tstamp type_id=11 bits_offset=64
...cut...

Prog section 'tc_packet_hook0' rejected: Permission denied (13)!
 - Type:         3
 - Instructions: 3399 (0 over limit)
 - License:      Dual BSD/GPL

Verifier analysis:

0: R1=ctx(off=0,imm=0) R10=fp0
0: (bf) r6 = r1                       ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
1: (b7) r0 = 0                        ; R0_w=0
2: (61) r1 = *(u32 *)(r6 +60)         ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=ctx(off=0,imm=0)
3: (18) r2 = 0xab1ef01d               ; R2_w=2870931485
...CUT...
212: (b7) r1 = 0                      ; R1_w=0
213: (6b) *(u16 *)(r10 -50) = r1      ; R1_w=0 R10=fp0 fp-56=00?00000
214: (73) *(u8 *)(r10 -51) = r1       ; R1_w=0 R10=fp0 fp-56=00000000
215: (63) *(u32 *)(r10 -68) = r1      ; R1_w=0 R10=fp0 fp-72=00000000
216: (bf) r2 = r10                    ; R2_w=fp0 R10=fp0
217: (07) r2 += -64                   ; R2_w=fp-64
218: (18) r1 = 0x0                    ; R1_w=0
220: (85) call bpf_map_lookup_elem#1
R1 type=scalar expected=map_ptr
processed 181 insns (limit 1000000) max_states_per_insn 0 total_states 4 peak_states 4 mark_read 4

Error fetching program/map!
Unable to load program
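
The header fields printed in the BTF dump above can be cross-checked by hand. In the BTF layout the type and string sections follow the header, and str_off is counted from the end of the header, so hdr_len + str_off + str_len should equal the reported total size. A minimal sketch with the values copied from the log; btf_expected_size is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: compute where a BTF blob should end, given
# the header length, the string section offset and its length.
btf_expected_size() {
    local hdr_len=$1 str_off=$2 str_len=$3
    # str_off is relative to the end of the header, and the string
    # section is the last section in the blob.
    echo $(( hdr_len + str_off + str_len ))
}

# Values copied from the log: hdr_len=24, str_off=12812, str_len=60904
expected=$(btf_expected_size 24 12812 60904)
echo "expected=$expected reported=73740"
```

The two numbers agree (24 + 12812 + 60904 = 73740), so the Invalid argument (22) on the .BTF section is probably not caused by a truncated or mis-sized blob. The log also shows that the loader continued to the program section after the BTF rejection.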

I used llvm-objdump to get more information about instruction 220: (85) call bpf_map_lookup_elem#1 and to find out which map is being looked up:

llvm-objdump -S /opt/loxilb/llb_ebpf_main.o --section="tc_packet_hook0"

;   xf->pm.table_id = LL_DP_FCV4_MAP;
     211:       73 1a 87 ff 00 00 00 00 *(u8 *)(r10 - 0x79) = r1
     212:       b7 01 00 00 00 00 00 00 r1 = 0x0
;   key->in_port    = 0;
     213:       6b 1a ce ff 00 00 00 00 *(u16 *)(r10 - 0x32) = r1
;   key->pad        = 0;
     214:       73 1a cd ff 00 00 00 00 *(u8 *)(r10 - 0x33) = r1
;   int z = 0;
     215:       63 1a bc ff 00 00 00 00 *(u32 *)(r10 - 0x44) = r1
     216:       bf a2 00 00 00 00 00 00 r2 = r10
     217:       07 02 00 00 c0 ff ff ff r2 += -0x40
;   acts = bpf_map_lookup_elem(&fc_v4_map, &key);
     218:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
     220:       85 00 00 00 01 00 00 00 call 0x1
     221:       bf 07 00 00 00 00 00 00 r7 = r0

The verifier error R1 type=scalar expected=map_ptr seems to mean that R1 is expected to hold a map pointer but is 0x0 instead, right? The fc_v4_map is created but empty, according to bpftool map dump name fc_v4_map. Any clue on how to fix this issue?
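
One way to read instruction 218 from the objdump output: opcode 0x18 is the 16-byte 64-bit immediate load, and the second byte packs the destination register in its low nibble and the source register in its high nibble. For this opcode a source register value of 1 (BPF_PSEUDO_MAP_FD) tells the kernel that the immediate is a map file descriptor; the loader is expected to patch that in when it resolves the map relocation. A minimal sketch that decodes those two bytes, assuming a little-endian object file; decode_ld_imm64 is a hypothetical helper name:

```shell
# Hypothetical helper: decode the opcode and register byte of a BPF
# instruction, both given as two-digit hex strings.
decode_ld_imm64() {
    local opcode=$((16#$1)) regs=$((16#$2))
    local dst=$(( regs & 0x0f ))   # low nibble: destination register
    local src=$(( regs >> 4 ))     # high nibble: source register / pseudo tag
    if [ "$opcode" -ne $((0x18)) ]; then
        echo "not a 64-bit immediate load"
        return 1
    fi
    case "$src" in
        0) echo "r$dst = plain 64-bit immediate" ;;
        1) echo "r$dst = map fd (BPF_PSEUDO_MAP_FD)" ;;
        *) echo "r$dst = other pseudo source (src_reg=$src)" ;;
    esac
}

# First two bytes of instruction 218 in the objdump output: 18 01
decode_ld_imm64 18 01
```

For the bytes in the object file this prints "r1 = plain 64-bit immediate", which is normal before relocation. The verifier dump shows the same r1 = 0x0 at load time, which suggests the instruction reached the kernel without the map reference patched in. The decode alone does not say why the loader skipped it.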

To Reproduce Steps to reproduce the behavior:

Build loxilb on Ubuntu 22.04, copy the loxilb binary and the loxilb-ebpf objects to BPFire OS, and run the following steps:

loxilb-ebpf/utils/mkllb_bpffs.sh
sudo ip link add tap0 type veth peer name tap1
sudo ifconfig tap0 up
sudo ifconfig tap1 up
sudo mkdir -p /opt/fs/bpf
sudo bpftool -d prog  load  /opt/loxilb/llb_xdp_main.o /opt/fs/bpf/xdp_packet_hook type xdp pinmaps /opt/loxilb/dp/bpf
sudo ntc filter add dev tap0 egress bpf da obj /opt/loxilb/llb_ebpf_main.o sec tc_packet_hook0

Additional context

BPFire OS has libbpf 0.8.3 installed, and its kernel CONFIG_NR_CPUS is fixed at 512 to address https://github.com/loxilb-io/loxilb/issues/661.

UltraInstinct14 commented 6 months ago

Hi @vincentmli, is it possible to download the BPFire OS version you are using, so that we can check it out as well? We are unable to reproduce this issue on Ubuntu or other OSes.

vincentmli commented 6 months ago

@UltraInstinct14 do you prefer an ISO image or qcow2? I can build one for you. BPFire is a clone of IPFire, so once you have the image, it is easy to set up.

vincentmli commented 6 months ago

Here is the raw image I used: https://drive.google.com/file/d/1516U6nlDsQgukzDEn1u8y7JZOWHmoA66/view?usp=drive_link. This image has the serial console enabled, so when you start it from libvirt/KVM and run virsh console <guest domainname>, you get a console screen and can set it up by following the menu.

Note that BPFire/IPFire requires at least two network interfaces, so you can set up two Linux bridge interfaces for the guest beforehand:

brctl addbr br0
brctl addbr br1
brctl addif br0 <interface>
brctl addif br1 <interface>

Then, for the guest:

xz -d ipfire-2.29-core184-x86_64-with-serial-console.img.xz
qemu-img convert -f raw -O qcow2 ipfire-2.29-core184-x86_64-with-serial-console.img ipfire-bpf.qcow2
virsh define <libvirt xml>
virsh start ipfiretest
virsh console ipfiretest

This is the libvirt XML I used for the BPFire OS guest:

<domain type='kvm'>
  <name>ipfiretest</name>
  <memory unit='KiB'>8109568</memory>
  <currentMemory unit='KiB'>8108864</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>qemu64</model>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='popcnt'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='lahf_lm'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='abm'/>
    <feature policy='disable' name='sse4a'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/ipfire-bpf.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <source bridge='br3'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='apparmor' relabel='yes'/>
  <seclabel type='dynamic' model='dac' relabel='yes'/>
</domain>
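
The guest is attached to whichever host bridges the source bridge elements in the XML above name, so those bridges need to exist on the host before virsh start. A minimal sketch that lists them, assuming the XML has been saved to a file and uses single-quoted attributes as above; list_domain_bridges is a hypothetical helper name:

```shell
# Hypothetical helper: print the bridge names referenced by a
# libvirt domain XML file, one per line.
list_domain_bridges() {
    sed -n "s/.*<source bridge='\([^']*\)'.*/\1/p" "$1"
}

# Only runs when a file name is passed as the first argument.
if [ -n "${1:-}" ]; then
    list_domain_bridges "$1"
fi
```

For the XML as posted this prints br0 and br3, while the brctl commands earlier create br0 and br1, so either br3 has to exist on the host or the second interface has to be pointed at br1.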

UltraInstinct14 commented 6 months ago

Thanks, will check and update.

vincentmli commented 6 months ago

By the way, I assume you used the default clang version from Ubuntu 22.04 to compile the loxilb eBPF object, right? I compiled and installed clang 18.0.0 on Ubuntu 22.04; clang versions can sometimes make a difference too.

UltraInstinct14 commented 6 months ago

Yes, we use the default clang versions: clang-10 for Ubuntu 20.04 and clang-14 for Ubuntu 22.04. I am not sure of the exact version, but we use the default for Ubuntu 24.04 as well.

vincentmli commented 6 months ago

It appears that different kernel versions can have different verifier behavior for the same eBPF program. I could try BPFire with the latest 6.8 kernel and see whether the verifier handles the program differently.

TrekkieCoder commented 6 months ago

Hi @vincentmli, I was able to verify loxilb working in the ipfire image. Please find the logs in loxilb-ipfire-working.log. The image used was the one you shared earlier.

The only change required was to comment out every bpf_printk in loxilb. I am not sure, but it seems to be disabled in the IPFire kernel. I tested by doing the following:

  1. Installed golang in ipfire
  2. Copied the loxilb ebpf/xdp binaries into /opt/loxilb from the loxilb build system to ipfire
  3. Copied the loxilb binary and helper scripts (mkllb_bpffs.sh) to ipfire
  4. Copied ntc and libbsd from the loxilb build system to ipfire

The loxilb build system had Ubuntu 20.04 installed, and the same machine was used to run the ipfire VM image. For the bpf_printk issue, I will raise a pull request soon and link it to this issue.

vincentmli commented 6 months ago

@UltraInstinct14 @TrekkieCoder ah, thanks for the finding. Now I remember that I disabled kernel tracing in BPFire and only kept the XDP/TC capability. I think bpf_printk requires kernel BPF tracing: https://github.com/ipfire/ipfire-2.x/commit/d7544e619290e0a37b5806689f33838a3d04da6c
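
Whether a kernel was built with BPF tracing support can usually be read from its config file. A minimal sketch follows. It assumes that CONFIG_BPF_EVENTS is the relevant switch for the helper behind bpf_printk (that option name is an assumption, not something stated in this thread) and that a plain-text config is available, for example /boot/config-$(uname -r) or an unpacked copy of /proc/config.gz. kconfig_state is a hypothetical helper name:

```shell
# Hypothetical helper: report whether a kernel config option is
# set to y or m in a plain-text config file.
kconfig_state() {
    local file=$1 opt=$2
    if grep -q "^${opt}=[ym]\$" "$file"; then
        echo enabled
    else
        echo disabled
    fi
}

# Check the running kernel if its config is installed under /boot.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    echo "CONFIG_BPF_EVENTS: $(kconfig_state "$cfg" CONFIG_BPF_EVENTS)"
fi
```

A "disabled" result on the target would be consistent with programs that call bpf_printk being rejected there while loading fine on a stock Ubuntu kernel.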

vincentmli commented 6 months ago

@TrekkieCoder could you upload your build binaries from https://github.com/loxilb-io/loxilb/issues/666#issuecomment-2097850413 so I can try them on a mini PC running BPFire? For some reason I still have the same issue after cloning and building your newest merge (with bpf_printk removed) on an Ubuntu 22.04 build machine; I could not find a 20.04 build system. Also, maybe you should change your commit message from ipfireos to bpfireos so it is not misleading. The ipfire maintainers have no interest in running eBPF-enabled kernel features at this time, so I forked ipfire and named the fork bpfire :)

vincentmli commented 6 months ago

I think I found my problem: I compiled ntc without libbsd, since bpfire does not have a libbsd addon. After I compiled ntc with libbsd on Ubuntu 22.04 and copied libbsd to bpfire, it works:

root@u2204-r730:~/loxilb# scp /lib/x86_64-linux-gnu/libbsd* 10.0.0.46:/usr/lib/
root@10.0.0.46's password: 
libbsd.a                                      100%  162KB   9.9MB/s   00:00    
libbsd-ctor.a                                 100% 1276   197.2KB/s   00:00    
libbsd.so                                     100%  170    99.2KB/s   00:00    
libbsd.so.0                                   100%   87KB  30.0MB/s   00:00    
libbsd.so.0.11.5                              100%   87KB  31.5MB/s   00:00    
root@u2204-r730:~/loxilb# ldd /lib/x86_64-linux-gnu/libmd.so.0
    linux-vdso.so.1 (0x00007ffd14353000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000797143800000)
    /lib64/ld-linux-x86-64.so.2 (0x0000797143b94000)
root@u2204-r730:~/loxilb# ls -l /lib/x86_64-linux-gnu/libmd.so.0
lrwxrwxrwx 1 root root 14 Mar 24  2022 /lib/x86_64-linux-gnu/libmd.so.0 -> libmd.so.0.0.5
root@u2204-r730:~/loxilb# scp /lib/x86_64-linux-gnu/libmd* 10.0.0.46:/usr/lib/
root@10.0.0.46's password: 
libmd.a                                       100%   62KB   4.6MB/s   00:00    
libmd.so                                      100%   46KB  17.7MB/s   00:00    
libmd.so.0                                    100%   46KB  27.4MB/s   00:00    
libmd.so.0.0.5                                100%   46KB  17.5MB/s   00:00
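
After copying a binary and its libraries to another system like this, the copied dependencies can be checked on the target by looking for unresolved entries in the ldd output. A minimal sketch, assuming the usual "name => not found" format that glibc's ldd prints for a missing library; missing_libs is a hypothetical helper name. It would not have detected the original problem (ntc built without libbsd); it only confirms that a rebuilt binary's libraries resolve on the target:

```shell
# Hypothetical helper: read ldd output on stdin and print the names
# of shared libraries that could not be resolved.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Check ntc if it is installed on this machine.
if command -v ntc >/dev/null 2>&1; then
    ldd "$(command -v ntc)" | missing_libs
fi
```

Empty output means every dependency was found; any name printed (for example libbsd.so.0 or libmd.so.0) still has to be copied to the target.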