Function Signature Change Without Obvious Indication

yonghong-song commented 2 months ago

BPF is a linux kernel technology which allows to run some custom codes in kernel. Specially, BPF allows to tracing kernel functions given a function signature. For example in [1], in include/linux/hrtimer.h, we have

  extern void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
                                     u64 range_ns, const enum hrtimer_mode mode);

The following bpf program can be written to trace kernel function hrtimer_start_range_ns():

  SEC("fentry/hrtimer_start_range_ns")
  int BPF_PROG(handle__fentry, struct hrtimer *timer, ktime_t tim, u64 delta_ns,
               const enum hrtimer_mode mode)
  {
        if (tim == MY_TV_NSEC)
                fentry_called = true;
        return 0;
  }

The function 'hrtimer_start_range_ns' needs to be in kernel /proc/kallsyms so proper kernel address can be found for the to-be-traced function.

  $ cat /proc/kallsyms | grep hrtimer_start_range_ns
  ffffffff8134fd90 T __pfx_hrtimer_start_range_ns
  ffffffff8134fda0 T hrtimer_start_range_ns
  ffffffff84dec2c8 r __ksymtab_hrtimer_start_range_ns

But in clang, it seems possible that function signature might be changed during optimization without changing function names. I tried kernel/bpf/syscall.c with additional option '-mllvm -debug-only=deadargelim' which intends to print out some information for DeadArgumentElimination pass. Eventually, I found three function signature changes:

  DeadArgumentEliminationPass - Removing argument 1 (uattr.coerce0) from map_update_elem
  DeadArgumentEliminationPass - Removing argument 1 (uattr.coerce0) from map_delete_elem
  DeadArgumentEliminationPass - Removing argument 3 (count) from strncpy_from_bpfptr

The following are IR before and after DeadArgumentEliminationPass:

Before:

  define internal fastcc i32 @map_update_elem(ptr noundef %attr, ptr nocapture readnone %uattr.coerce0, i8 %uattr.coerce1)
  define internal fastcc i32 @map_delete_elem(ptr noundef %attr, ptr nocapture readnone %uattr.coerce0, i8 %uattr.coerce1)
  define internal fastcc i64 @strncpy_from_bpfptr(ptr noundef %dst, ptr %src.coerce0, i8 %src.coerce1, i64 noundef %count)

After:

  define internal fastcc i32 @map_update_elem(ptr noundef %attr, i8 %uattr.coerce1)
  define internal fastcc i32 @map_delete_elem(ptr noundef %attr, i8 %uattr.coerce1)
  define internal fastcc i64 @strncpy_from_bpfptr(ptr noundef %dst, ptr %src.coerce0, i8 %src.coerce1)

  $ nm kernel/bpf/syscall.o | grep map_update_elem
  ...
  000000000000b6b0 t map_update_elem
  $ nm kernel/bpf/syscall.o | grep map_delete_elem
  000000000000bd00 t map_delete_elem
  $ nm kernel/bpf/syscall.o | grep strncpy_from_bpfptr
  0000000000015d90 t strncpy_from_bpfptr
  $

You can see the above function name remains the same in symbol table but actually the number of arguments have changed.

ArgumentPromotion could also change function signature without change function names. This pass is enabled at -O3 or with LTO. Let us say thinLTO is enabled to build the kernel and "-mllvm -debug-only=argpromotion" is added to dump certain transformation info. I hit a case with file tools/lib/subcmd/exec-cmd.c. Before:

  define internal fastcc void @report(ptr nocapture noundef readnone %prefix, ptr nocapture noundef readonly %err, ptr noundef %params)

After:

  define internal fastcc void @report(ptr nocapture noundef readonly %err, ptr noundef %params)

  $ nm exec-cmd.o | grep 'report'
  0000000000000790 t report

gcc has various suffix like .constprop., .part., .isra. to indicate func signature having changed. For clang, looks like PartialInlining is not enabled by default. The above ArgumentPromotion is quite similar to .constprop.. I am not sure whether clang does .isra.-like transformation or not.

If clang optimization changes a function signature, ideally case we could like to have changed signature somewhere (e.g. dwarf) for users easily to do bpf func tracing. But at least we could have a way to encode such signature change information somewhere (dwarf, adding suffix, etc.)?

[1] https://github.com/torvalds/linux [2] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_vmlinux.c#L81-L88

yonghong-song commented 2 months ago

cc @4ast @anakryiko @eddyz87 @jemarch

yonghong-song commented 2 months ago

cc @nikic @david-xl Maybe you can give some suggestions?

nikic commented 2 months ago

If I'm understanding correctly, these are functions that get internalized during LTO? In principle, the correct fix would be to prevent internalization of these symbols.

Is it possible to trace all functions in this way, or are they some subset marked in some way?

david-xl commented 2 months ago

I agree that traced symbols should probably not be optimized with signature change. If signature change happens, it is better to change the function name to avoid confusion. Optimization remarks should also report the changes (if not done).

yonghong-song commented 2 months ago

@nikic, those functions are actually optimized by regular optimizations and some of them are optimized during LTO but by regular optimization passes. The symbol promotion (from static to global) during LTO won't change function signature so it is okay.

yonghong-song commented 2 months ago

@david-xl Right. This is what I suggested. We at least need to change function name (e.g., with a suffix like gcc) to signal that function signature may have changed. Otherwise, users may get incorrect result and is very confused about that.

It would be even better to add such func signature change information to dwarf since dwarf is in kernel binary. Currently kernel llvm build does not have remarks. If dwarf is not possible, let us ensure remarks do have such signature change information.

nikic commented 2 months ago

@nikic, those functions are actually optimized by regular optimizations and some of them are optimized during LTO but by regular optimization passes. The symbol promotion (from static to global) during LTO won't change function signature so it is okay.

The optimization happens in "regular optimizations", but it is only possible because the function is internal, which is what allows LLVM to change the signature. This would not happen if the function were exported. The internalization happens as part of LTO, which is why I'm saying that would be the proper place to fix this.

yonghong-song commented 2 months ago

@nikic, yes, all those functions (changing signatures) are static (internal) functions. Based on above discussion with you and @david-xl, looks like we could modify function name (e.g. with a suffix) to differentiate from original functions. I will look into this.

yonghong-song commented 2 months ago

@david-xl I checked one of kernel file (kernel/bpf/syscall.c), I got the following output with -mllvm -debug-only=deadargelim

DeadArgumentEliminationPass - Removing argument 1 (uattr.coerce0) from map_update_elem
DeadArgumentEliminationPass - Removing argument 1 (uattr.coerce0) from map_delete_elem
DeadArgumentEliminationPass - Removing argument 3 (count) from strncpy_from_bpfptr

I added optiont -foptimization-record-file=t to save remarks. The following is what I got

$ cat t | grep Name | sort | uniq
Name:            AlwaysInline
Name:            DontUnroll
Name:            FullyUnrolled
Name:            Hoisted
Name:            Inlined
Name:            InstructionCount
Name:            InstructionMix
Name:            LoadClobbered
Name:            LoadElim
Name:            LoadWithLoopInvariantAddressCondExecuted
Name:            LoadWithLoopInvariantAddressInvalidated
Name:            LoopSpillReloadCopies
Name:            NeverInline
Name:            NoDefinition
Name:            PartialUnrolled
Name:            PromoteLoopAccessesToScalar
Name:            SpillReloadCopies
Name:            StackSize
Name:            TooCostly

and pass

$ cat t | grep Pass | sort | uniq
Pass:            asm-printer
--- !Passed
Pass:            gvn
Pass:            inline
Pass:            licm
Pass:            loop-unroll
Pass:            prologepilog
Pass:            regalloc
Pass:            TTI

I specifically checked remark file for map_update_elem, map_delete_elem and strncpy_from_bpfptr and there is no deadargelim optimization for these three functions.

Therefore, I think there are no optimization remarks for deadargelim pass.

david-xl commented 2 months ago

An optimization remark can be added here https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp#L768

llvm / llvm-project

Function Signature Change Without Obvious Indication #104678