iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more

Miscalculation of offset with consecutive bitfields #4890

shunghsiyu closed this issue 8 months ago

shunghsiyu commented 8 months ago

With the latest bcc v0.29.1 and LLVM 17.0.6, bcc produces a BPF program with an incorrect offset for struct my_struct.ptr in the following (incomplete) snippet:

struct my_struct
{
    long: 64;
    long: 64;
    long: 64;
    void* ptr;
};

int kprobe__netif_rx(struct pt_regs *ctx)
{
    struct event_t event = {};
    void* p1 = (void*)ctx->di;
    void* p2 = NULL;

    event.offset1 = ((long)&((struct my_struct*)p1)->ptr) - (long)p1;
    event.offset2 = ((long)&((struct my_struct*)p2)->ptr) - (long)p2;
    events.perf_submit(ctx, &event, sizeof(event));
    return 0;
}
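
(For completeness: the snippet assumes an event struct and a perf output table along these lines. This is a sketch on my part; the exact field types in the reproducer may differ.)

struct event_t {
    long offset1;
    long offset2;
};
BPF_PERF_OUTPUT(events);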

The BPF program produced from the snippet is:

Disassembly of function kprobe__netif_rx
; { // Line  34
   0:   b7 02 00 00 18 00 00 00 r2 = 24
; event.offset2 = (long)&((struct my_struct*)p2)->ptr - (long)p2; // Line  41
   1:   7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2
   2:   b7 02 00 00 20 00 00 00 r2 = 32
; event.offset1 = (long)&((struct my_struct*)p1)->ptr - (long)p1; // Line  40
   3:   7b 2a f0 ff 00 00 00 00 *(u64 *)(r10 - 16) = r2
; bpf_perf_event_output(ctx, (void *)bpf_pseudo_fd(1, -1), CUR_CPU_IDENTIFIER, &event, sizeof(event)); // Line  42
   ...

In the above we can see offset1 == 32 while offset2 == 24, even though both calculate the offset of ptr.

Bisecting landed me on iovisor/bcc@f35dae07 (merged in iovisor/bcc#2863), which was meant to address iovisor/bcc#2827 by introducing the i128 type to the BPF layout spec (supported since LLVM 11), but I don't know what the proper fix is here.

To the best of my knowledge this happens because the consecutive bitfields are merged into a single i192 type, which takes the alignment of the largest defined type (i64:64, i.e. 8 bytes, before iovisor/bcc@f35dae07 and i128:128, i.e. 16 bytes, after):

%struct.my_struct = type { i192, ptr }
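
As a sanity check, a plain userspace program (my own sketch, compiled natively on x86-64) confirms that 24 is the offset the C ABI assigns:

#include <stdio.h>
#include <stddef.h>

struct my_struct {
    long: 64;
    long: 64;
    long: 64;
    void *ptr;
};

int main(void) {
    /* the three 64-bit anonymous bitfields occupy 24 bytes,
       so ptr should land at offset 24 */
    printf("%zu\n", offsetof(struct my_struct, ptr));
    return 0;
}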

Note: LLVM release 18.1.0-rc1 includes llvm/llvm-project@a21abc782a8e1cb718a10c471a3b634f3102fc1c, which made the issue go away for me. However, I cannot judge whether it is the proper fix in this case.

(CC @yonghong-song)

shunghsiyu commented 8 months ago

The full reproducer can be found here, and below is the LLVM IR of the program for completeness. As far as I can tell the IR looks correct: offset2 has already been folded to the constant 24 (the store i64 24 below), while offset1 is still computed at runtime from the getelementptr.

define dso_local i32 @kprobe__netif_rx(ptr noundef %0) local_unnamed_addr #0 section ".bpf.fn.kprobe__netif_rx" !dbg !67 {
  %2 = alloca %struct.event_t, align 8, !DIAssignID !104
  %3 = getelementptr inbounds %struct.pt_regs, ptr %0, i64 0, i32 14, !dbg !108
  %4 = load i64, ptr %3, align 8, !dbg !108, !tbaa !109
  %5 = inttoptr i64 %4 to ptr, !dbg !114
  %6 = getelementptr inbounds %struct.my_struct, ptr %5, i64 0, i32 1, !dbg !115
  %7 = ptrtoint ptr %6 to i64, !dbg !116
  %8 = sub nsw i64 %7, %4, !dbg !117
  store i64 %8, ptr %2, align 8, !dbg !118, !tbaa !119, !DIAssignID !121
  %9 = getelementptr inbounds %struct.event_t, ptr %2, i64 0, i32 1, !dbg !122
  store i64 24, ptr %9, align 8, !dbg !123, !tbaa !124, !DIAssignID !125
  ...
}
yonghong-song commented 8 months ago

@shunghsiyu thanks for reporting. This is not a compiler bug, but rather an unfortunate interaction between x86 IR and the BPF backend. Currently the original BPF program is compiled first for the x86 arch, because we need x86 headers to compile BPF programs. Once the IR is generated, the x86 IR is fed into the BPF backend. In certain situations this can cause a problem. The following is a simple example to demonstrate:

$ cat t.c
struct my_struct {
        long: 64;
        long: 64;
        long: 64;
        void *ptr;
};

struct event_t {
        long offset0;
        long offset1;
        long offset2;
};
void bar(void *);

int foo(void *p1) {
        struct event_t event = {};
        void *p2 = 0;

        event.offset1 = ((long)&((struct my_struct*)p1)->ptr) - (long)p1;
        event.offset2 = ((long)&((struct my_struct*)p2)->ptr) - (long)p2;
        bar(&event);
        return 0;
}

The compilation steps:

  clang -O2 -S -emit-llvm t.c -Xclang -disable-llvm-passes -o t.ll
  llc -march=bpf -O2 t.ll

With LLVM 17, the IR generated for the x86 target has %struct.my_struct = type { i192, ptr }. With LLVM 18, we have %struct.my_struct = type { [24 x i8], ptr }.

Since the BPF data layout does not define i192, the backend rounds its alignment up to 16, and this causes the problem.
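
Concretely, a worked sketch of the field-offset arithmetic under the two layouts (assuming LLVM's usual rule of rounding a field's allocation size up to its ABI alignment):

  x86 (largest integer entry i64:64):   i192 alignment = 8,  alloc size = roundup(24, 8)  = 24, so ptr lands at offset 24
  BPF (largest integer entry i128:128): i192 alignment = 16, alloc size = roundup(24, 16) = 32, so ptr lands at offset 32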

The fix is rather simple: do not use bitfields, and the alignments will be the same for both x86 and BPF.
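
For instance, the padding can be spelled as a plain byte array (a sketch of that suggestion; the 24 matches the three 64-bit bitfields):

struct my_struct {
    char pad[24];  /* replaces the three long: 64 bitfields */
    void *ptr;     /* offset 24 under both the x86 and BPF layouts */
};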

shunghsiyu commented 8 months ago

Thanks for the explanation. So if I understand correctly, the reason that offset1 and offset2 differ is that they came from different sources?

offset2 == 24 came from the LLVM x86 IR directly, because (long)&((struct my_struct *)0)->ptr can easily be recognized as an offset calculation at the x86 IR stage and folded into a constant; the x86 alignment is used there, where i192 is aligned to 8.

On the other hand, offset1 == 32 came from the LLVM BPF backend: the x86 IR stage cannot infer that ((long)&((struct my_struct*)p1)->ptr) - (long)p1 is an offset calculation (perhaps because of -disable-llvm-passes?), so it compiles it literally into a sequence of instructions involving sub. The BPF backend was then able to fold that sequence into a constant, but there it uses the BPF alignment, where i192 is aligned to 16 (since iovisor/bcc@f35dae07).

Is the above correct?

yonghong-song commented 8 months ago

Your interpretation is largely correct. I didn't dig into LLVM to find out why the BPF backend gives the i192 type an alignment of 16, though. It is probably related to how i<N> types are mapped to alignments in the arch-specific data layout string.

shunghsiyu commented 8 months ago

I didn't dig into LLVM to find out why the BPF backend gives the i192 type an alignment of 16, though

I think I read in LLVM's Phabricator archive that "the alignment of the largest defined type (i128:128) will be chosen as the alignment" for integer types that are not explicitly defined in the data layout.

shunghsiyu commented 8 months ago

I'm still trying to figure out from the original bug report why consecutive bitfields are used in the first place, and whether it is strictly necessary.

Let's close this issue for now, and I'll reopen if needed. Thanks!

shunghsiyu commented 6 months ago

I'm still trying to figure out from the original bug report why consecutive bitfields are used in the first place, and whether it is strictly necessary.

The consecutive bitfields came from a vmlinux.h generated by bpftool, which had more long: 64 padding than usual because the original BTF had been processed by gen min_core_btf first.
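
For reference, the generation looked roughly like this (a sketch; the paths are placeholders):

  bpftool gen min_core_btf /sys/kernel/btf/vmlinux min_core.btf prog.bpf.o
  bpftool btf dump file min_core.btf format c > vmlinux.h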

From the existence of libbpf-tools, I would guess that bcc is not meant to be used with vmlinux.h; rather, it should stick to the kernel headers that are produced as part of the kernel build process.

But so far I haven't found a reference that confirms or denies the above guess. @yonghong-song, do you know if this is the case? It'd be nice to get an explicit statement on this subject. Thanks!

yonghong-song commented 6 months ago

Yes, bcc won't use vmlinux.h, while libbpf-tools does use vmlinux.h.

shunghsiyu commented 5 months ago

FWIW, we ended up working around this by passing the -ffine-grained-bitfield-accesses flag to Clang.
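
Applied to the standalone t.c reproducer above, that would be (in bcc the flag can go through its cflags mechanism):

  clang -O2 -ffine-grained-bitfield-accesses -S -emit-llvm t.c -Xclang -disable-llvm-passes -o t.ll

With that flag, Clang uses separate legal-width accesses for the consecutive bitfield run instead of merging it into one wide integer, which should keep the problematic i192 out of the IR.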