open-telemetry / opentelemetry-ebpf-profiler

The production-scale datacenter profiler (C/C++, Go, Rust, Python, Java, NodeJS, .NET, PHP, Ruby, Perl, ...)
Apache License 2.0

Failed to load unwind_dotnet when built with make debug #141

Closed tsint closed 2 weeks ago

tsint commented 2 weeks ago

My application runtime environment is Mint 21.3 (equivalent to Ubuntu 22.04). Below is the output when running opentelemetry-ebpf-profiler after compiling the BPF code using make debug.

dc@mint213:~/opensource/github/otel-profiling-agent/support/ebpf$ make debug
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c dotnet_tracer.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o dotnet_tracer.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c hotspot_tracer.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o hotspot_tracer.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c integration_test.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o integration_test.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c interpreter_dispatcher.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o interpreter_dispatcher.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c native_stack_trace.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o native_stack_trace.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c perl_tracer.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o perl_tracer.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c php_tracer.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o php_tracer.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c python_tracer.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o python_tracer.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c ruby_tracer.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o ruby_tracer.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c sched_monitor.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o sched_monitor.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c system_config.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o system_config.ebpf.amd64.o
clang-16 -target x86_64-linux-gnu -DOPTI_DEBUG -g -fno-jump-tables -nostdlib -nostdinc -ffreestanding -O2 -emit-llvm -c v8_tracer.ebpf.c -Wall -Wextra -Werror -Wno-address-of-packed-member -Wno-unused-label -Wno-unused-parameter -Wno-sign-compare -fno-stack-protector -o v8_tracer.ebpf.amd64.o
llvm-link-16 dotnet_tracer.ebpf.amd64.o hotspot_tracer.ebpf.amd64.o integration_test.ebpf.amd64.o interpreter_dispatcher.ebpf.amd64.o native_stack_trace.ebpf.amd64.o perl_tracer.ebpf.amd64.o php_tracer.ebpf.amd64.o python_tracer.ebpf.amd64.o ruby_tracer.ebpf.amd64.o sched_monitor.ebpf.amd64.o system_config.ebpf.amd64.o v8_tracer.ebpf.amd64.o -o - | llc-16 -march=bpf -mcpu=v2 -filetype=obj -o tracer.ebpf.amd64
/usr/lib/llvm/bin/llvm-objdump

Instruction counts for tracer.ebpf.amd64:

.text has 0 instructions
perf_event/unwind_dotnet has 7126 instructions
perf_event/unwind_hotspot has 6503 instructions
tracepoint/sched/sched_switch has 1160 instructions
tracepoint/syscalls/sys_enter_read has 22 instructions
perf_event/unwind_stop has 1078 instructions
perf_event/native_tracer_entry has 1044 instructions
perf_event/unwind_native has 6982 instructions
perf_event/unwind_perl has 6646 instructions
perf_event/unwind_php has 6557 instructions
perf_event/unwind_python has 5567 instructions
perf_event/unwind_ruby has 5004 instructions
tracepoint/sched/sched_process_exit has 253 instructions
tracepoint/syscalls/sys_enter_bpf has 41 instructions
raw_tracepoint/sys_enter has 52 instructions
perf_event/unwind_v8 has 7110 instructions

Total instructions: 55145

dc@mint213:~/opensource/github/otel-profiling-agent/support/ebpf$ cd ../../
dc@mint213:~/opensource/github/otel-profiling-agent$ make
go generate ./...
make -j4 -C support/ebpf
make[1]: Entering directory '/home/dc/opensource/github/otel-profiling-agent/support/ebpf'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/dc/opensource/github/otel-profiling-agent/support/ebpf'
go build -buildvcs=false -ldflags="-X github.com//open-telemetry/opentelemetry-ebpf-profiler/vc.version=v0.0.0 -X github.com/open-telemetry/opentelemetry-ebpf-profiler/vc.revision=main-dd0c2070 -X github.com/open-telemetry/opentelemetry-ebpf-profiler/vc.buildTimestamp=1724931470 -extldflags=-static" -tags osusergo,netgo
dc@mint213:~/opensource/github/otel-profiling-agent$ sudo ./opentelemetry-ebpf-profiler  -collection-agent=127.0.0.1:11000 -disable-tls
INFO[0000] Starting OTEL profiling agent  (revision main-dd0c2070, build timestamp 1724931164)
INFO[0000] Interpreter tracers: perl,php,python,hotspot,ruby,v8,dotnet
ERRO[0000] load program: permission denied: 6082: (85) call bpf_probe_read_user#112: R1 unbounded memory access, make sure to bounds check any such access (truncated, 17 line(s) omitted)
ERRO[0000] Failed to load eBPF tracer: failed to load eBPF code: failed to load eBPF programs: failed to load unwind_dotnet

After disassembling tracer.ebpf.amd64 with llvm-objdump, the rejected instruction (6082, the call to bpf_probe_read_user) maps to line 74 of dotnet_tracer.ebpf.c, which is where the code fails the verifier check.

; /home/dc/opensource/github/otel-profiling-agent/support/ebpf/dotnet_tracer.ebpf.c:69
    6073:       b7 02 00 00 7f 00 00 00 r2 = 0x7f
    6074:       1f 12 00 00 00 00 00 00 r2 -= r1
    6075:       67 02 00 00 20 00 00 00 r2 <<= 0x20
    6076:       77 02 00 00 20 00 00 00 r2 >>= 0x20

000000000000bde8 <LBB0_559>:
; LBB0_559():
; /home/dc/opensource/github/otel-profiling-agent/support/ebpf/dotnet_tracer.ebpf.c:74
;   if (bpf_probe_read_user(&scratch->map[offs], sizeof(scratch->map), (void*) map_start)) {
    6077:       67 02 00 00 02 00 00 00 r2 <<= 0x2
; LBB6_1005():
; /home/dc/opensource/github/otel-profiling-agent/support/ebpf/dotnet_tracer.ebpf.c:74
    6078:       79 a1 60 ff 00 00 00 00 r1 = *(u64 *)(r10 - 0xa0)
    6079:       0f 21 00 00 00 00 00 00 r1 += r2
; LBB14_683():
; /home/dc/opensource/github/otel-profiling-agent/support/ebpf/dotnet_tracer.ebpf.c:74
    6080:       b7 02 00 00 00 02 00 00 r2 = 0x200
; LBB7_550():
; /home/dc/opensource/github/otel-profiling-agent/support/ebpf/dotnet_tracer.ebpf.c:74
    6081:       bf 63 00 00 00 00 00 00 r3 = r6
    6082:       85 00 00 00 70 00 00 00 call 0x70
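
One plausible reading of the listing above: instructions 6073-6077 compute (0x7f - r1) * 4 and add it to the destination pointer. If the verifier has no upper bound for r1 at that point, the subtraction can wrap and the pointer in R1 becomes unbounded. The verifier log is truncated, so this is an assumption rather than a confirmed diagnosis. The user-space sketch below only illustrates the arithmetic and a common way to make such a bound explicit (masking with a power-of-two size). The names raw_offset and bounded_offset are hypothetical, MAP_ELEMENTS = 128 is inferred from the 0x7f and 0x200 constants, and none of this is claimed to be what the project itself does.

```c
#include <stdint.h>

/* Assumed: 128 entries, inferred from 0x7f (127) and 0x200 (512 bytes) above. */
#define MAP_ELEMENTS 128u

/* Index as computed by instructions 6073-6076: 0x7f - r1, truncated to 32 bits.
 * If entries_back can exceed 127, the subtraction wraps to a huge value. */
static uint32_t raw_offset(uint32_t entries_back) {
  return (MAP_ELEMENTS - 1u) - entries_back;
}

/* Same index, forced into [0, MAP_ELEMENTS - 1] by a mask. In-range values
 * are unchanged; out-of-range values are folded back into the buffer. */
static uint32_t bounded_offset(uint32_t entries_back) {
  return raw_offset(entries_back) & (MAP_ELEMENTS - 1u);
}
```

For an in-range input such as 1, both functions return 126. For an out-of-range input such as 200, raw_offset wraps past 127 while bounded_offset stays below MAP_ELEMENTS.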

I'm a bit confused about this part of the code (https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/main/support/ebpf/dotnet_tracer.ebpf.c#L65-L76). If pc_delta = DOTNET_CODE_BYTES_PER_ENTRY, then offs = map_elements - 2. Wouldn't this cause a buffer overflow in bpf_probe_read_user(&scratch->map[offs], sizeof(scratch->map), (void*) map_start)?

  // Read the nibble map data
  int offs = 0;
  if (pc_delta < (map_elements-2)*DOTNET_CODE_BYTES_PER_ENTRY) {
    // Read from map_start so that end of scratch->map corresponds to pc_delta
    offs = map_elements - pc_delta/DOTNET_CODE_BYTES_PER_ENTRY - 1;
  } else {
    // We can read full scratch buffer, adjust map_start so that last entry read corresponds pc_delta
    map_start += pc_delta/DOTNET_CODE_BYTES_PER_ENTRY*sizeof(u32) - sizeof(scratch->map) + sizeof(u32);
  }
  if (bpf_probe_read_user(&scratch->map[offs], sizeof(scratch->map), (void*) map_start)) {
    goto bad_code_header;
  }

Thanks for any answers.

fabled commented 2 weeks ago

> I'm a bit confused about this part of the code (https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/main/support/ebpf/dotnet_tracer.ebpf.c#L65-L76). If pc_delta = DOTNET_CODE_BYTES_PER_ENTRY, then offs = map_elements - 2. Wouldn't this cause a buffer overflow in bpf_probe_read_user(&scratch->map[offs], sizeof(scratch->map), (void*) map_start)?

This is explained in the comment on the map definition at: https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/main/support/ebpf/types.h#L615-L620
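
To make the question concrete, here is a small user-space sketch of the offset arithmetic from the quoted snippet. It assumes map_elements is 128 (inferred from the 0x7f and 0x200 constants in the disassembly) and uses 256 as a placeholder for DOTNET_CODE_BYTES_PER_ENTRY. Both values, and the helper names read_offset and read_end, are assumptions for illustration and are not taken from the project headers.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_ELEMENTS 128u     /* assumed: inferred from 0x7f and 0x200 */
#define BYTES_PER_ENTRY 256u  /* assumed placeholder for DOTNET_CODE_BYTES_PER_ENTRY */
#define MAP_BYTES (MAP_ELEMENTS * sizeof(uint32_t))

/* Mirrors the offs selection in the quoted snippet: a nonzero offset is
 * used only when pc_delta is too small to fill the whole buffer. */
static uint32_t read_offset(uint64_t pc_delta) {
  if (pc_delta < (uint64_t)(MAP_ELEMENTS - 2u) * BYTES_PER_ENTRY) {
    return MAP_ELEMENTS - (uint32_t)(pc_delta / BYTES_PER_ENTRY) - 1u;
  }
  return 0u;
}

/* Byte position, relative to the start of map, one past the last byte
 * written by a fixed-size read of MAP_BYTES starting at map[offs]. */
static size_t read_end(uint64_t pc_delta) {
  return (size_t)read_offset(pc_delta) * sizeof(uint32_t) + MAP_BYTES;
}
```

With these assumed constants, pc_delta = 0 gives offs = 127 and a read that ends 1020 bytes after the start of map, while the array itself is only 512 bytes. A fixed-size read at a dynamic offset therefore needs close to twice sizeof(scratch->map) of writable space behind the pointer. The comment linked above is where the project documents how that is handled.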

fabled commented 2 weeks ago

Does commit 8254100 fix the issue at hand?

tsint commented 2 weeks ago

@fabled Thank you for your answer. I just tested this patch, and it successfully loaded the BPF code.