libbpf / libbpf-bootstrap

Scaffolding for BPF application development with libbpf and BPF CO-RE
BSD 3-Clause "New" or "Revised" License
1.09k stars 294 forks source link

blazesym not giving function address for stacktrace #278

Closed deepakpjose closed 3 months ago

deepakpjose commented 4 months ago

Hi Team,

I've modified profile.c in libbpf-bootstrap to listen to execve and print backtrace. While running it, I can see that kernel stack trace is getting dumped. But blazesym is unable to dump userstack trace. Can some one help me the next steps to know, why symbols are not dumped for user space addresses? Is it some issue with blazesym?

Code for profile.c and profile.bpf.c I compiled can be found in following location. https://github.com/deepakpjose/linux_commands/blob/main/profile.c

Following is the output I'm getting when I dump 'ls' from my docker container.

COMM: bash (pid=5292) @ CPU 10
Kernel:
ffffffffba406e45: __x64_sys_execve @ 0xffffffffba406e40+0x5
ffffffffbb1f2fc0: do_syscall_64 @ 0xffffffffbb1f2f60+0x60
ffffffffbb40012a: entry_SYSCALL_64_after_hwframe @ 0xffffffffbb4000bc+0x6e
Userspace:
00007f4ff744ff3b: <no-symbol>
00005634b4fd8dc2: <no-symbol>
00005634b4fda1fa: <no-symbol>
00005634b4fcce08: <no-symbol>
00005634b4fcfc6e: <no-symbol>
00005634b4fc210c: <no-symbol>
00005634b4fb5efb: <no-symbol>
00007f4ff738b1ca: <no-symbol>
00007f4ff738b28b: <no-symbol>
00005634b4fb6385: <no-symbol>

Environment(inside docker): root@2f51403077e9:/src/libbpf-bootstrap/examples/c# uname -a Linux 2f51403077e9 6.6.22-linuxkit #1 SMP PREEMPT_DYNAMIC Fri Mar 29 12:23:08 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Thanks, Deepak

danielocfb commented 4 months ago

In general blaze_err_str(blaze_err_last()) may help for understanding symbolization failures or you may want to look at the reason if individual addresses failed to symbolize. In this case, I am having trouble understanding how the code you referenced can match what you pasted output for, as there is no <no-symbol> or some kind of @ construct after the user space addresses, which strikes me as impossible (though it's possible I missed something).

deepakpjose commented 4 months ago

I've modified it. It's there, but somehow missed out.

deepakpjose commented 4 months ago

Hi Daniel,

I've added blaze_err_str(blaze_err_last()) and the error was 'entity not found'. Looks like userspace address passed is not correct. I've to add any other settings to get proper address? Searches in different portals suggested to set 'sudo setcap -r /usr/local/bin/your_bpf_program'. But that didn't help.

COMM: bash (pid=3290) @ CPU 5
Kernel:
ffffffffa8406e45: __x64_sys_execve @ 0xffffffffa8406e40+0x5
ffffffffa91f2fc0: do_syscall_64 @ 0xffffffffa91f2f60+0x60
ffffffffa940012a: entry_SYSCALL_64_after_hwframe @ 0xffffffffa94000bc+0x6e
Userspace:
00007ff5f6e7ef3b: <no-symbol> blaze_error: entity not found
000055f4fa7cadc2: <no-symbol> blaze_error: entity not found
000055f4fa7cc1fa: <no-symbol> blaze_error: entity not found
000055f4fa7bee08: <no-symbol> blaze_error: entity not found
000055f4fa7c1c6e: <no-symbol> blaze_error: entity not found
000055f4fa7b410c: <no-symbol> blaze_error: entity not found
000055f4fa7a7efb: <no-symbol> blaze_error: entity not found
00007ff5f6dba1ca: <no-symbol> blaze_error: entity not found
00007ff5f6dba28b: <no-symbol> blaze_error: entity not found
000055f4fa7a8385: <no-symbol> blaze_error: entity not found

Thanks, Deepak

danielocfb commented 4 months ago

You can take the PID and addresses and feed them to blazecli. It should be able to tell you more where the error originates and you can enable traces via -vvv.

danielocfb commented 4 months ago

Binaries are available here for example.

deepakpjose commented 3 months ago

Daniel, Thanks for the input in debugging this. I felt blazecli is unable to resolve symbols for addresses properly. ebpf is generating addressses properly because when I look the detailed memory map for the user process, it is giving an address in the range for text region.

Follwoing is the stack trace for user and kernel I got for one example.

root@975ef1935d58:/src/libbpf-bootstrap/build# ./profile
COMM: hello_world (pid=9865) @ CPU 2
Kernel:
ffffffffa8171a8e: queue_work_on @ 0xffffffffa8171a70+0x1e
ffffffffa8b97eaf: tty_insert_flip_string_and_push_buffer @ 0xffffffffa8b97e20+0x8f
ffffffffa8b9acbe: pty_write @ 0xffffffffa8b9ac90+0x2e
ffffffffa8b92372: n_tty_write @ 0xffffffffa8b92210+0x162
ffffffffa8b8daf5: file_tty_write.isra.0 @ 0xffffffffa8b8d970+0x185
ffffffffa83fc8fd: vfs_write @ 0xffffffffa83fc6d0+0x22d
ffffffffa83fceaf: ksys_write @ 0xffffffffa83fce40+0x6f
ffffffffa91f2fc0: do_syscall_64 @ 0xffffffffa91f2f60+0x60
ffffffffa940012a: entry_SYSCALL_64_after_hwframe @ 0xffffffffa94000bc+0x6e
Userspace:
00007ffff7ec2574: <no-symbol>
00007ffff7e38561: <no-symbol>
00007ffff7e39a9f: <no-symbol>
00007ffff7e06cb9: <no-symbol>
00007ffff7e11733: <no-symbol>
00007ffff7e061a3: <no-symbol>
000055555555516a: <no-symbol>
00005555555551b4: <no-symbol>
00007ffff7dd01ca: <no-symbol>
00007ffff7dd028b: <no-symbol>
0000555555555075: <no-symbol>

Output of blazecli is as follows with traces using -vvv. For all addresses belongs to hello_world program, blazecli always dumps __TMC_END__ as the symbol.

root@975ef1935d58:/src/libbpf-bootstrap/blazesym# ./target/debug/blazecli symbolize elf --path /src/libbpf-bootstrap/C/hello_world 00005555555551b4
 -vvv
2024-07-06T09:25:43.140286Z  INFO symbolize: new src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset([0x5555555551b4])
2024-07-06T09:25:43.140451Z  INFO symbolize: enter src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset([0x5555555551b4])
2024-07-06T09:25:43.141082Z  INFO symbolize:symbolize_with_resolver: new src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset([0x5555555
551b4]) addr=0x5555555551b4 resolver=Cached(DWARF /src/libbpf-bootstrap/C/hello_world)
2024-07-06T09:25:43.141315Z  INFO symbolize:symbolize_with_resolver: enter src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset([0x55555
55551b4]) addr=0x5555555551b4 resolver=Cached(DWARF /src/libbpf-bootstrap/C/hello_world)
2024-07-06T09:25:43.141474Z  INFO symbolize:symbolize_with_resolver:find_sym: new src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset([
0x5555555551b4]) addr=0x5555555551b4 resolver=Cached(DWARF /src/libbpf-bootstrap/C/hello_world) self=DWARF /src/libbpf-bootstrap/C/hello_world opts
=CodeInfoAndInlined addr=0x5555555551b4
2024-07-06T09:25:43.141593Z  INFO symbolize:symbolize_with_resolver:find_sym: enter src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset
([0x5555555551b4]) addr=0x5555555551b4 resolver=Cached(DWARF /src/libbpf-bootstrap/C/hello_world) self=DWARF /src/libbpf-bootstrap/C/hello_world op
ts=CodeInfoAndInlined addr=0x5555555551b4
2024-07-06T09:25:43.141833Z  INFO symbolize:symbolize_with_resolver:find_sym: exit src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset(
[0x5555555551b4]) addr=0x5555555551b4 resolver=Cached(DWARF /src/libbpf-bootstrap/C/hello_world) self=DWARF /src/libbpf-bootstrap/C/hello_world opt
s=CodeInfoAndInlined addr=0x5555555551b4
2024-07-06T09:25:43.141938Z  INFO symbolize:symbolize_with_resolver:find_sym: close time.busy=240µs time.idle=224µs src=Elf("/src/libbpf-bootstrap/
C/hello_world") addrs=VirtOffset([0x5555555551b4]) addr=0x5555555551b4 resolver=Cached(DWARF /src/libbpf-bootstrap/C/hello_world) self=DWARF /src/l
ibbpf-bootstrap/C/hello_world opts=CodeInfoAndInlined addr=0x5555555551b4
2024-07-06T09:25:43.142076Z  INFO symbolize:symbolize_with_resolver: exit src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset([0x555555
5551b4]) addr=0x5555555551b4 resolver=Cached(DWARF /src/libbpf-bootstrap/C/hello_world)
2024-07-06T09:25:43.142166Z  INFO symbolize:symbolize_with_resolver: close time.busy=763µs time.idle=323µs src=Elf("/src/libbpf-bootstrap/C/hello_w
orld") addrs=VirtOffset([0x5555555551b4]) addr=0x5555555551b4 resolver=Cached(DWARF /src/libbpf-bootstrap/C/hello_world)
2024-07-06T09:25:43.142285Z  INFO symbolize: exit src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset([0x5555555551b4])
2024-07-06T09:25:43.142423Z  INFO symbolize: close time.busy=1.83ms time.idle=307µs src=Elf("/src/libbpf-bootstrap/C/hello_world") addrs=VirtOffset
([0x5555555551b4])
0x005555555551b4: __TMC_END__ @ 0x4018+0x55555555119c

Symbols output for functions in readelf programs are as follows.

root@975ef1935d58:/src/libbpf-bootstrap/blazesym# readelf -s /src/libbpf-bootstrap/C/hello_world | grep FUNC
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _[...]@GLIBC_2.34 (2)
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBC_2.2.5 (3)
     6: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND [...]@GLIBC_2.2.5 (3)
     4: 0000000000001080     0 FUNC    LOCAL  DEFAULT   15 deregister_tm_clones
     5: 00000000000010b0     0 FUNC    LOCAL  DEFAULT   15 register_tm_clones
     6: 00000000000010f0     0 FUNC    LOCAL  DEFAULT   15 __do_global_dtors_aux
     9: 0000000000001130     0 FUNC    LOCAL  DEFAULT   15 frame_dummy
    18: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_mai[...]
    22: 00000000000011bc     0 FUNC    GLOBAL HIDDEN    16 _fini
    23: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5
    29: 0000000000001050    38 FUNC    GLOBAL DEFAULT   15 _start
    30: 0000000000001140    84 FUNC    GLOBAL DEFAULT   15 print_hello
    32: 00000000000011a0    28 FUNC    GLOBAL DEFAULT   15 main
    35: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@G[...]
    36: 0000000000001000     0 FUNC    GLOBAL HIDDEN    12 _init

Detailed map of memory regions are as follows for my process that is profiled using ebpf.

root@975ef1935d58:/src/libbpf-bootstrap/blazesym# pidof hello_world
4139
root@975ef1935d58:/src/libbpf-bootstrap/blazesym# cat /proc/4139/maps
555555554000-555555555000 r--p 00000000 fe:01 2284602                    /src/libbpf-bootstrap/C/hello_world
555555555000-555555556000 r-xp 00001000 fe:01 2284602                    /src/libbpf-bootstrap/C/hello_world
555555556000-555555557000 r--p 00002000 fe:01 2284602                    /src/libbpf-bootstrap/C/hello_world
555555557000-555555558000 r--p 00002000 fe:01 2284602                    /src/libbpf-bootstrap/C/hello_world
555555558000-555555559000 rw-p 00003000 fe:01 2284602                    /src/libbpf-bootstrap/C/hello_world
555555559000-55555557a000 rw-p 00000000 00:00 0                          [heap]
7ffff7da3000-7ffff7da6000 rw-p 00000000 00:00 0
7ffff7da6000-7ffff7dce000 r--p 00000000 fe:01 2011576                    /usr/lib/x86_64-linux-gnu/libc.so.6
7ffff7dce000-7ffff7f56000 r-xp 00028000 fe:01 2011576                    /usr/lib/x86_64-linux-gnu/libc.so.6
7ffff7f56000-7ffff7fa5000 r--p 001b0000 fe:01 2011576                    /usr/lib/x86_64-linux-gnu/libc.so.6
7ffff7fa5000-7ffff7fa9000 r--p 001fe000 fe:01 2011576                    /usr/lib/x86_64-linux-gnu/libc.so.6
7ffff7fa9000-7ffff7fab000 rw-p 00202000 fe:01 2011576                    /usr/lib/x86_64-linux-gnu/libc.so.6
7ffff7fab000-7ffff7fb8000 rw-p 00000000 00:00 0
7ffff7fbd000-7ffff7fbf000 rw-p 00000000 00:00 0
7ffff7fbf000-7ffff7fc3000 r--p 00000000 00:00 0                          [vvar]
7ffff7fc3000-7ffff7fc5000 r-xp 00000000 00:00 0                          [vdso]
7ffff7fc5000-7ffff7fc6000 r--p 00000000 fe:01 2011556                    /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffff7fc6000-7ffff7ff1000 r-xp 00001000 fe:01 2011556                    /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffff7ff1000-7ffff7ffb000 r--p 0002c000 fe:01 2011556                    /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffff7ffb000-7ffff7ffd000 r--p 00036000 fe:01 2011556                    /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffff7ffd000-7ffff7fff000 rw-p 00038000 fe:01 2011556                    /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Thanks, Deepak

danielocfb commented 3 months ago

You will have to use process symbolization:

$ ./target/debug/blazecli symbolize process --pid 4139 00005555555551b4 -vvv

(or something like that)

Can you please attach the hello_world binary?

deepakpjose commented 3 months ago

Here is my binary file for hello_world. https://github.com/deepakpjose/linux_commands/blob/main/hello_world.bin

process symbolization didn't work for me.

root@90a68567a85f:/src/libbpf-bootstrap/blazesym# ./target/debug/blazecli symbolize process --pid 51 00005555555551b4 -vvv
2024-07-08T17:11:00.881622Z  INFO symbolize: new src=Process(51) addrs=AbsAddr([0x5555555551b4])
2024-07-08T17:11:00.881669Z  INFO symbolize: enter src=Process(51) addrs=AbsAddr([0x5555555551b4])
2024-07-08T17:11:00.882945Z  INFO symbolize:handle_unknown_addr: new src=Process(51) addrs=AbsAddr([0x5555555551b4]) addr=0x5555555551b4
2024-07-08T17:11:00.883088Z  INFO symbolize:handle_unknown_addr: enter src=Process(51) addrs=AbsAddr([0x5555555551b4]) addr=0x5555555551b4
2024-07-08T17:11:00.883174Z  INFO symbolize:handle_unknown_addr: exit src=Process(51) addrs=AbsAddr([0x5555555551b4]) addr=0x5555555551b4
2024-07-08T17:11:00.883210Z  INFO symbolize:handle_unknown_addr: close time.busy=86.4µs time.idle=182µs src=Process(51) addrs=AbsAddr([0x5555555551b4]) addr=0
x5555555551b4
2024-07-08T17:11:00.883299Z  INFO symbolize: exit src=Process(51) addrs=AbsAddr([0x5555555551b4])
2024-07-08T17:11:00.883348Z  INFO symbolize: close time.busy=1.63ms time.idle=111µs src=Process(51) addrs=AbsAddr([0x5555555551b4])
0x005555555551b4: <no-symbol>

If I give absolute address, blazecli is dumping address properly. For example, in the read_elf from previous comment, hello_world address is 0000000000001140. So following cli works.

root@90a68567a85f:/src/libbpf-bootstrap/blazesym# ./target/debug/blazecli symbolize elf --path /src/libbpf-bootstrap/C/hello_world 0000000000001140
0x00000000001140: print_hello @ 0x1140+0x0 hello_world.c:4

My hello_world program looks like as follows.

  1 #include <stdio.h>
  2
  3 void print_hello(void)
  4 {
  5     for (int i=0; i<100; i++) {
  6             printf("Hello, World! = %d\n", i);
  7             if (i == 99) {
  8                 i = 0;
  9             }
 10     }
 11 }
 12 int main() {
 13     print_hello();
 14     return 0;
 15 }

Thanks, Deepak

danielocfb commented 3 months ago

process symbolization didn't work for me.

root@90a68567a85f:/src/libbpf-bootstrap/blazesym# ./target/debug/blazecli symbolize process --pid 51 00005555555551b4 -vvv
2024-07-08T17:11:00.881622Z  INFO symbolize: new src=Process(51) addrs=AbsAddr([0x5555555551b4])
2024-07-08T17:11:00.881669Z  INFO symbolize: enter src=Process(51) addrs=AbsAddr([0x5555555551b4])
2024-07-08T17:11:00.882945Z  INFO symbolize:handle_unknown_addr: new src=Process(51) addrs=AbsAddr([0x5555555551b4]) addr=0x5555555551b4
2024-07-08T17:11:00.883088Z  INFO symbolize:handle_unknown_addr: enter src=Process(51) addrs=AbsAddr([0x5555555551b4]) addr=0x5555555551b4
2024-07-08T17:11:00.883174Z  INFO symbolize:handle_unknown_addr: exit src=Process(51) addrs=AbsAddr([0x5555555551b4]) addr=0x5555555551b4
2024-07-08T17:11:00.883210Z  INFO symbolize:handle_unknown_addr: close time.busy=86.4µs time.idle=182µs src=Process(51) addrs=AbsAddr([0x5555555551b4]) addr=0
x5555555551b4
2024-07-08T17:11:00.883299Z  INFO symbolize: exit src=Process(51) addrs=AbsAddr([0x5555555551b4])
2024-07-08T17:11:00.883348Z  INFO symbolize: close time.busy=1.63ms time.idle=111µs src=Process(51) addrs=AbsAddr([0x5555555551b4])
0x005555555551b4: <no-symbol>

Okay, but did you check whether the address actually exists for this process? You have address space randomization enabled after all.

If I symbolize your binary I seem to get the expected result:

$ cargo run -p blazecli -- symbolize process --pid 87803 0x55a4864a01b4
> 0x0055a4864a01b4: main @ 0x11a0+0x14 hello_world.c:14:5
deepakpjose commented 3 months ago

Using the cli you have shared, it's working for me. Thanks for that.

By any chance, bpf_get_current_pid_tgid can give wrong process id? I'm using the profile.bpf.c from libpf-bootstrap in my docker, but it's giving wrong process id. If we see below, process 'hello_world' pid is 1606. But user process profile.c is getting 2581. Probably blazesym failed when called blaze_symbolize_process_abs_addrs is because pid passed from profile.bpf.c is wrong.

root@d07ed6c244d8:/src/libbpf-bootstrap/build# pidof hello_world
1606
root@d07ed6c244d8:/src/libbpf-bootstrap/build# ./profile
COMM: hello_world (pid=2581) @ CPU 1
Kernel:
ffffffffa9a1994b: copyin @ 0xffffffffa9a19930+0x1b
ffffffffa9a1b39b: _copy_from_iter @ 0xffffffffa9a1b340+0x5b
ffffffffa9b8dace: file_tty_write.isra.0 @ 0xffffffffa9b8d970+0x15e
ffffffffa93fc8fd: vfs_write @ 0xffffffffa93fc6d0+0x22d
ffffffffa93fceaf: ksys_write @ 0xffffffffa93fce40+0x6f
ffffffffaa1f2fc0: do_syscall_64 @ 0xffffffffaa1f2f60+0x60
ffffffffaa40012a: entry_SYSCALL_64_after_hwframe @ 0xffffffffaa4000bc+0x6e
Userspace:
00007ffff7ec2574: <no-symbol>
00007ffff7e38561: <no-symbol>
00007ffff7e39a9f: <no-symbol>
00007ffff7e06cb9: <no-symbol>
00007ffff7e11733: <no-symbol>
00007ffff7e061a3: <no-symbol>
000055555555516a: <no-symbol>
00005555555551b4: <no-symbol>
00007ffff7dd01ca: <no-symbol>
00007ffff7dd028b: <no-symbol>
0000555555555075: <no-symbol>

Snippet from file profile.bpf.c

SEC("ksyscall/execve")
int profile(void *ctx)
{
    int pid = bpf_get_current_pid_tgid() >> 32;
    int cpu_id = bpf_get_smp_processor_id();
    struct stacktrace_event *event;
    int cp;

    event = bpf_ringbuf_reserve(&events, sizeof(*event), 0);
    if (!event)
        return 1;

    event->pid = pid;
    event->cpu_id = cpu_id;

Also, can you teach me, how you decoded binary and found about ASLR? Like to learn tricks of parsing the binary.

Thanks, Deepak

danielocfb commented 3 months ago

By any chance, bpf_get_current_pid_tgid can give wrong process id?

I doubt it will be wrong. It's just returning some kernel task struct member.

I'm using the profile.bpf.c from libpf-bootstrap in my docker, but it's giving wrong process id. If we see below, process 'hello_world' pid is 1606. But user process profile.c is getting 2581.

What I could imagine happening is that Docker is setting up a PID namespace. As far as I understand the PID reported by BPF will be the kernel's "global" one.

Are you invoking profile from within the container? I think that could explain the issue. Then, the program by design would not have access to the global PID namespace but the global PID is still being reported by BPF. Yet, this global PID has no meaning within the namespaced container.

profile, being a system-wide profiler, shouldn't be invoked in such a namespace. I suppose we may want to guard against that to prevent confusion.

Also, can you teach me, how you decoded binary and found about ASLR? Like to learn tricks of parsing the binary.

I just noticed that addresses were different between runs by looking at the corresponding maps file contents. What I did then was take the 0x5555555551b4, which maps to the

555555555000-555555556000 r-xp 00001000 fe:01 2284602                    /src/libbpf-bootstrap/C/hello_world

entry you pasted. So then I subtracted 0x555555555000 from it. That's 0x1b4. Then, on my running instance of hello_world I searched for the equivalent maps entry for the same segment, which started at 0x0055a4864a0000. So I added the 0x1b4 offset to it and used that as input for blazecli. It's not strictly speaking the correct algorithm, but it works most of the time.

deepakpjose commented 3 months ago

I am running profile ebpf process from docker. Are you saying, ebpf is reporting process id from macos(host os) Point of view?

In place of perf_on_event, if trace some other system call, pid is going to be same?

danielocfb commented 3 months ago

I am running profile ebpf process from docker. Are you saying, ebpf is reporting process id from macos(host os) Point of view?

Yep.

In place of perf_on_event, if trace some other system call, pid is going to be same?

I don't know.

deepakpjose commented 3 months ago

Thanks a lot for support. It was a wonderful learning. I'll close this request.