ianlancetaylor / libbacktrace

A C library that may be linked into a C/C++ program to produce symbolic backtraces

No backtrace in signal handler with musl #80

Closed bszente closed 2 years ago

bszente commented 2 years ago

Please consider the following test code:

#include <stdlib.h>
#include <stdio.h>

#include <signal.h>

#include "backtrace.h"
#include "backtrace-supported.h"

typedef void (*func_ptr)(void);

struct backtrace_state *bstate;

int bt_cb_simple(void *vdata, uintptr_t pc)
{
    printf("# 0x%jx\n", (uintmax_t)pc);
    return 0;
}

void crash_handler(int sig, siginfo_t *info, void *ucontext) {
    int ret;

    if (sig == SIGSEGV) {
        printf("SIGSEGV: addr=%p code=%d\n", info->si_addr, info->si_code);

        ret = backtrace_simple(bstate, 0, bt_cb_simple, NULL, NULL);
        printf("ret=%d\n", ret);
    }

    exit(EXIT_SUCCESS);
}

void install_handler() {
    struct sigaction act = {0};

    sigemptyset(&act.sa_mask);
    act.sa_sigaction = &crash_handler;
    act.sa_flags = SA_SIGINFO;

    if (sigaction(SIGSEGV, &act, NULL) < 0)
        printf("Failed to install signal handler\n");
}

void do_invalid_access(int *v) {
    printf("Inside do_invalid_access\n");
    /* Force an invalid pointer access */
    *v = *v + 1;
}

void function2(void) {
    printf("Inside function2\n");
    do_invalid_access(NULL);
}

void function1(void) {
    printf("Inside function1\n");
    function2();
}

int main(int argc, char* argv[]) {
    bstate = backtrace_create_state(argv[0], BACKTRACE_SUPPORTS_THREADS, NULL, NULL);
    if (!bstate)
        return 1;

    printf("Inside main\n");
    install_handler();
    function1();

    return 0;
}

For the sake of example, please ignore that printf is not safe to be called from a signal handler.
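For readers who do want a handler that avoids printf, the sketch below shows one way to print the addresses using only write(2), which POSIX lists as async-signal-safe. It is a minimal illustration, not part of the original test: format_hex and bt_cb_write are hypothetical names, and the callback only mirrors the signature of bt_cb_simple above.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: formats value as "0x<hex>" into buf without calling
   any libc formatting routine. buf must hold at least
   2 + 2 * sizeof(uintptr_t) + 1 bytes. Returns the string length. */
size_t format_hex(uintptr_t value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof value];
    size_t n = 0, len = 0;

    /* Produce the digits least-significant first. */
    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    /* Copy them back in most-significant-first order. */
    while (n > 0)
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* Hypothetical callback with the same signature as bt_cb_simple: prints
   "# 0x<pc>" followed by a newline using write(2) only. */
int bt_cb_write(void *vdata, uintptr_t pc)
{
    char line[2 + 2 + 2 * sizeof pc + 2];
    size_t len = 0;

    (void)vdata;
    line[len++] = '#';
    line[len++] = ' ';
    len += format_hex(pc, line + len);
    line[len++] = '\n';
    /* A short or failed write is ignored in this sketch. */
    (void)write(STDOUT_FILENO, line, len);
    return 0; /* returning 0 asks the caller to continue the backtrace */
}
```

Passing bt_cb_write instead of bt_cb_simple to backtrace_simple would keep the output format the same. Whether the rest of the unwinding path is async-signal-safe is a separate question that this sketch does not address.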

Executing the following steps:

  1. Compile test.c and link it fully statically, with debug symbols enabled, using a musl-based toolchain built with Buildroot 2021.02.2:

    $ x86_64-buildroot-linux-musl-gcc -static test.c -o test -g2 -lbacktrace

  2. Run the program:

    $ ./test
    Inside main
    Inside function1
    Inside function2
    Inside do_invalid_access
    SIGSEGV: addr=0 code=1
    # 0x4011d2
    # 0x4062a9
    ret=0

  3. Decode the addresses:

    $ x86_64-buildroot-linux-musl-addr2line -aipfC -e ./test 0x4011d2 0x4062a9
    0x00000000004011d2: crash_handler at /home/user/libbacktrace-test/test.c:27
    0x00000000004062a9: sigemptyset at /home/user/build-x86_64-2021.02.2/build/musl-1.2.2/src/signal/x86_64/restore.s:1

    As can be seen, the backtrace stops at the signal frame, right after crash_handler. No addresses above the signal frame are reported.

  4. In GDB, the very same binary has the following call stack at that point:

    Inside main
    Inside function1
    Inside function2
    Inside do_invalid_access

    Program received signal SIGSEGV, Segmentation fault.
    0x000000000040127f in do_invalid_access (v=0x0) at test.c:48
    48          *v = *v + 1;
    (gdb) cont
    Continuing.
    SIGSEGV: addr=0 code=1

    Breakpoint 1, crash_handler (sig=11, info=0x7fffffffd130, ucontext=0x7fffffffd000) at test.c:27
    27              ret = backtrace_simple(bstate, 0, bt_cb_simple, NULL, NULL);
    (gdb) bt
    #0  crash_handler (sig=11, info=0x7fffffffd130, ucontext=0x7fffffffd000) at test.c:27
    #1  <signal handler called>
    #2  0x000000000040127f in do_invalid_access (v=0x0) at test.c:48
    #3  0x00000000004012a5 in function2 () at test.c:53
    #4  0x00000000004012bb in function1 () at test.c:58
    #5  0x000000000040131e in main (argc=1, argv=0x7fffffffd728) at test.c:68

On the other hand, when the test application is compiled with GLIBC, the backtrace_simple call works properly:

  1. Compile and link fully static with GLIBC:

    $ gcc -static test.c -o test -g2 -lbacktrace

  2. Run the binary:

    $ ./test
    Inside main
    Inside function1
    Inside function2
    Inside do_invalid_access
    SIGSEGV: addr=(nil) code=1
    # 0x40184b
    # 0x40e05f
    # 0x40192d
    # 0x401957
    # 0x401972
    # 0x4019da
    # 0x408a47
    # 0x4016c9
    ret=0

  3. Decode the addresses:

    $ addr2line -aipfC -e ./test 0x40184b 0x40e05f 0x40192d 0x401957 0x401972 0x4019da 0x408a47 0x4016c9
    0x000000000040184b: crash_handler at /home/user/libbacktrace-test/test.c:27
    0x000000000040e05f: gsignal at ??:?
    0x000000000040192d: do_invalid_access at /home/user/libbacktrace-test/test.c:48
    0x0000000000401957: function2 at /home/user/libbacktrace-test/test.c:53
    0x0000000000401972: function1 at /home/user/libbacktrace-test/test.c:58
    0x00000000004019da: main at /home/user/libbacktrace-test/test.c:68
    0x0000000000408a47: __libc_start_main at /var/tmp/portage/sys-libs/glibc-2.33-r7/work/glibc-2.33/csu/../csu/libc-start.c:332
    0x00000000004016c9: _start at ??:?

Questions:

  1. Is this difference in behavior due to libbacktrace or musl?
  2. Is there any way to unwind completely with musl, i.e. to have the same backtrace as with GLIBC?
  3. Is it possible to pass ucontext to backtrace_simple somehow so it would unwind directly on the received context? I'm interested in this even if it is not a portable solution.

Thank you!

ianlancetaylor commented 2 years ago

This is an issue with musl, not with libbacktrace. For this operation libbacktrace relies on the _Unwind_Backtrace function provided by the compiler's support library. When using GCC and (probably) LLVM, that function is able to unwind through a signal handler when using glibc. It appears that it is not able to unwind through a signal handler when using musl.
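To make that dependency concrete, here is a minimal sketch of calling _Unwind_Backtrace directly through <unwind.h>. It is not libbacktrace's actual code, only an illustration of the entry point it builds on; pc_list, collect_pc and collect_backtrace are hypothetical names.

```c
#include <stdint.h>
#include <unwind.h>

/* Hypothetical bookkeeping for the walk: where to store PCs and how many
   frames have been seen so far. */
struct pc_list {
    uintptr_t *pcs;
    int max;
    int count;
};

/* Invoked by the unwinder once per stack frame. */
static _Unwind_Reason_Code collect_pc(struct _Unwind_Context *ctx, void *arg)
{
    struct pc_list *list = arg;
    uintptr_t pc = (uintptr_t)_Unwind_GetIP(ctx);

    if (list->count < list->max)
        list->pcs[list->count] = pc;
    list->count++;
    return _URC_NO_REASON; /* tell the unwinder to keep walking */
}

/* Walks the current call stack and stores up to max program counters in
   pcs. Returns the number of entries stored. */
int collect_backtrace(uintptr_t *pcs, int max)
{
    struct pc_list list = { pcs, max, 0 };

    _Unwind_Backtrace(collect_pc, &list);
    return list.count < max ? list.count : max;
}
```

Because the frame walking happens inside the unwinder, a caller of collect_backtrace would presumably see the same truncated result in a signal handler as backtrace_simple does whenever the unwinder cannot step across the signal frame.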

The relevant code on x86_64 is https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/config/i386/linux-unwind.h;h=6170a773f5f6602bdcd97c407ac4cf225b9b705c;hb=HEAD#l47 . It looks for a specific instruction sequence to recognize the signal handler. That is the exact instruction sequence used by glibc. Perhaps musl uses a different instruction sequence. If so, perhaps it could be changed so that this works.
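As far as I understand the linked code, the check compares the bytes at the frame's return address against the x86_64 encoding of "mov $15, %rax; syscall", where 15 is the rt_sigreturn system call number. The sketch below is a simplified illustration of that kind of check, not a copy of the libgcc source; looks_like_sigreturn_trampoline is a hypothetical name.

```c
#include <string.h>

/* x86_64 machine code for the signal return trampoline:
   mov $15, %rax (15 = __NR_rt_sigreturn), followed by syscall. */
static const unsigned char rt_sigreturn_code[] = {
    0x48, 0xc7, 0xc0, 0x0f, 0x00, 0x00, 0x00, /* mov $0xf, %rax */
    0x0f, 0x05                                /* syscall */
};

/* Hypothetical helper: returns 1 if the 9 bytes at pc match the trampoline
   above, 0 otherwise. The caller must ensure those bytes are readable. */
int looks_like_sigreturn_trampoline(const unsigned char *pc)
{
    return memcmp(pc, rt_sigreturn_code, sizeof rt_sigreturn_code) == 0;
}
```

When a return address matches, an unwinder can treat the frame as a signal frame and recover the interrupted registers from the context the kernel saved on the stack, instead of looking for ordinary unwind information.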

There is no way to pass the ucontext to backtrace_simple, sorry.

bszente commented 2 years ago

Thank you very much for the useful hint!

bszente commented 2 years ago

Just a follow-up. I managed to obtain the callstack with musl as well. It was much simpler than I expected.

musl has exactly the same signal return trampoline as GLIBC or uClibc. The issue comes from the following line in libgcc/config/i386/linux-unwind.h:

#if defined __GLIBC__ && !(__GLIBC__ == 2 && __GLIBC_MINOR__ == 0)

The signal frame decoding code path is enabled only for GLIBC. uClibc-ng works because it defines the above macros, pretending to be GLIBC.
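The effect of that guard can be observed from an ordinary program by evaluating the same preprocessor condition against the libc headers in use. This is only an illustrative sketch: in libgcc the condition is evaluated when libgcc itself is built against the target libc, and glibc_guard_is_true is a hypothetical name.

```c
#include <stdio.h> /* any libc header brings in the libc's feature macros */

/* Returns 1 if the condition quoted above holds for the libc headers this
   file is compiled against, 0 otherwise. */
int glibc_guard_is_true(void)
{
#if defined __GLIBC__ && !(__GLIBC__ == 2 && __GLIBC_MINOR__ == 0)
    return 1;
#else
    return 0;
#endif
}

/* Prints which way the guard goes for the current toolchain. */
void report_guard(void)
{
    printf("signal frame code path would be %s\n",
           glibc_guard_is_true() ? "compiled in" : "compiled out");
}
```

Built with a glibc or uClibc-ng toolchain, glibc_guard_is_true would be expected to return 1. With musl, which does not define __GLIBC__, it would return 0.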

musl does not define any __MUSL__ macro, so it is not possible to enable this code path conditionally. For this reason, I personally removed the above #if line from libgcc for my use case, to force the signal frame unwinding to work for musl as well:

  1. Run the program:

    $ ./test
    Inside main
    Inside function1
    Inside function2
    Inside do_invalid_access
    SIGSEGV: addr=0 code=1
    # 0x4011d2
    # 0x40655d
    # 0x40127f
    # 0x4012a4
    # 0x4012ba
    # 0x401387
    # 0x404a2e
    # 0x401044
    ret=0

  2. Decode the addresses:

    $ x86_64-buildroot-linux-musl-addr2line -aipfC -e ./test $(./test | grep ^# | cut -d ' ' -f2)
    0x00000000004011d2: crash_handler at /home/user/libbacktrace-test/test.c:27
    0x000000000040655d: sigemptyset at /home/user/build-x86_64-2021.02.2/build/musl-1.2.2/src/signal/x86_64/restore.s:1
    0x000000000040127f: do_invalid_access at /home/user/libbacktrace-test/test.c:48
    0x00000000004012a4: function2 at /home/user/libbacktrace-test/test.c:53
    0x00000000004012ba: function1 at /home/user/libbacktrace-test/test.c:58
    0x0000000000401387: main at /home/user/libbacktrace-test/test.c:68
    0x0000000000404a2e: libc_start_main_stage2 at /home/user/build-x86_64-2021.02.2/build/musl-1.2.2/src/env/__libc_start_main.c:94
    0x0000000000401044: _start at ??:?

@ianlancetaylor thank you again for the link to the relevant code part.