Implement backtrace printing for non-glibc targets

hrishikeshSuresh commented 5 years ago

Problem Summary Implement backtrace printing for non-glibc targets (bionic and musl)
Solution In ./src/modules/roc_core/target_bionic/roc_core/backtrace.cpp, I have implemented captureBacktrace(), to find the size of backtrace stack. In captureBacktrace(), we call _Unwind_Backtrace() to perform stack unwinding through unwind data, using the unwindCallback() function. dumpBacktrace() will first check whether backtrace is available or not and then print the following in the format of
#index: address_of_the_cursor function_name.
Here, demangled name is printed if the demangling operation suceeds. dumpBacktrace_fd() is same as dumpBacktrace() but prints the output to the file parameter passed.
In ./src/modules/roc_core/target_musl/roc_core/backtrace.cpp', backtrace() finds the size of backtrace stack by just iterating till the end of the stack. backtrace_symbols() does the same thing but we print the following in the format of
#index : function_name address_of_cursor offset instruction_pointer
to stderr. backtrace_symbols_fd() does the same but writes the output to the given file descriptor.

Queries 1 . I am printing the backtrace in two different ways for each backtrace.cpp files. Which one should I stick to? 2 . I couldn't add the optional dependency on libunwind.h, so can any hints/ideas be given as to how this has to be done?

gavv commented 5 years ago

Thanks! I'll take a look.

One thing I can tell right now:

print_backtrace_emergency() is called from a signal handler (e.g. when we receive SIGSEGV). This means that this function should be signal-safe, as mentioned here:

https://github.com/roc-project/roc/blob/faebd1e0059c16476f6aae8ee5a1cfc66c8fee7f/src/modules/roc_core/backtrace.h#L24-L28

You can read about signal-safety here: http://man7.org/linux/man-pages/man7/signal-safety.7.html

In particular, signal-safe function is allowed to call only other signal-safe functions. malloc and fprintf, and all other functions that work with FILE* or perform allocations, are not signal-safe, so we can't use them in print_backtrace_emergency().

Thus, print_backtrace_emergency() is intended to be a simplified version of print_backtrace() that can have limited functionality but can be called from a signal handler. For example, our implementation of print_backtrace_emergency() for glibc does not perform demangling and formatting at all, it just calls backtrace_symbols_fd() which is signal-safe.

libunwind documentation: (https://www.nongnu.org/libunwind/man/libunwind(3).html#section_5) says:

The manual page for each libunwind routine identifies whether or not it is signal-safe, but as a general rule, any routine that may be needed for local unwinding is signal-safe (e.g., unw_step() for local unwinding is signal-safe). For remote-unwinding, none of the libunwind routines are guaranteed to be signal-safe.

So for libunwind, we should:

check manual page for every libunwind function and ensure that we only perform local unwinding and only call its signal-safe functions
in our print_backtrace_emergency() implementation we also should not call any functions from standard library that are not signal-safe, e.g. fprintf() and malloc()

On bionic, we also should find a signal-safe way to print backtrace, or we can leave print_backtrace_emergency() unimplemented for now.

gavv commented 5 years ago

I couldn't add the optional dependency on libunwind.h, so can any hints/ideas be given as to how this has to be done?

This should be done in SConstruct file. Basically you can just duplicate what we're doing for sox and adjust it for libunwind, specifically:

add --disable-libunwind (like --disable-sox)
automatically append target_libunwind to ROC_TARGETS if we use musl and if --disable-libunwind is not specified (like we do it for target_sox)
check whether libunwind is available on system (like we do it in if 'target_sox' in system_dependecies: branch for sox)
download and build libunwind if it's present in --build-3rdparty option (like we do it in if 'target_sox' in download_dependencies: branch for sox)
add libunwind branch to our 3rdparty.py script

If you will have any trouble, feel free to ask here or in the IRC chat, I'll help you with this.

gavv commented 5 years ago

If you will not find a good way to print backtrace in a signal-safe manner on Bionic, there is another option. If libunwind works good on Anrdoid (could you check it please?) we can use it on Bionic too.

In this case we can have three backtrace implementations:

target_glibc - selected if we're using glibc; prints backtrace using glibc functions
target_libunwind - selected if we're using musl or bionic and libunwind was not disabled by user; prints backtrace using libunwind functions
target_nobacktrace - selected if we're using musl or bionic and libunwind was disabled by user; no-op implementation

gavv commented 5 years ago

/link #242

gavv commented 5 years ago

I am printing the backtrace in two different ways for each backtrace.cpp files. Which one should I stick to?

So the answer to this question is that we need both versions: print_backtrace() is full-featured version that can perform allocations, demangling, use fprintf, etc; and print_backtrace_emergency() is a limited version that can be called from a signal handler.

Since print_backtrace_emergency() can't use fprintf(), it should write to the stderr file descriptor directly, but it doesn't make sense to use fdopen() because it's not signal-safe.

hrishikeshSuresh commented 5 years ago

I made print_backtrace_emergency() signal-safe by making use of write() instead of fprintf() in target_musl/roc_core/backtrace.cpp . Demangling operation for print_backtrace() was missed in the earlier commit for musl, so that has been added now.

gavv commented 5 years ago

Thanks, I've reviewed the libunwind version. Except the comments above, it's looking good.

What are your plans for the bionic version?

BTW, if you want, we could split the PR into two parts: one PR for libunwind and another PR for bionic.

hrishikeshSuresh commented 5 years ago

I was checking if libunwind can be used for android and in this file, it says the android libunwind api is compatible with the non-gnu libunwind api. So can we use libunwind with bionic? And go ahead with three implementations - target_glibc, target_libunwind, target_nobacktrace like how you had mentioned earlier.

gavv commented 5 years ago

Interesting. I've googled about this version of libunwind a bit.

I've found this repo: https://github.com/alexeikh/android-ndk-backtrace-test

Here is a summary.

We can use libunwind from Android NDK to print backtrace from a signal handler. But the code will be architecture-specific.
We can use _Unwind_Backtrace() from Bionic to print backtrace from a signal handler as well. The code will be architecture-specific too.
We can use both libunwind and _Unwind_Backtrace() in an architecture-independent manner, but in this case it will not be possible to use it from a signal handler (we will not see the pre-signal stack).

Adding architecture-specific code and maintaining different versions for different CPUs for such a minor feature would be an overkill. So we're throwing away the first two options.

Then, Android NDK ships with pre-compiled libunwind.a (not sure if it's present on all architectures through), but it doesn't provide libunwind headers. So we will need to ship them manually if we want to use libunwind. The problem with this approach is that these headers may be specific to NDK version, so I'd prefer to avoid this.

Thus, it seems that the best option is to implement print_backtrace() using _Unwind_Backtrace() and to leave print_backtrace_emergency() empty.

But then I've found this page: https://source.android.com/devices/tech/debug

It says that if an Android application crashes, you can find its backtrace in the tombstone file. We should check whether the backtrace will be present there in case of roc_panic(). If it will, we can avoid implementing backtrace on Android at all and just rely on this feature. Unfortunately, I can't test it right now because my Android phone is broken :)

So if you can test it, please do it, and we will see if we need backtrace on Android. If we need it, we should use _Unwind_Backtrace(). If we don't need it, we can leave the bionic version empty.

If you can't test it, you can leave the bionic version as is (empty) until someone will have time for it.

gavv commented 5 years ago

On the other hand, after some thought, it seems that it would be handy to see the backtrace of a panic in stderr (even if it's also present in tombstone), because the rest panic message is printed to stderr too, and the backtrace is actually a part of this message. And since we already have the code to print it (in this PR), I think we can keep it.

gavv commented 5 years ago

So to sum up, I think we need to do the following:

remove dumpBacktrace_fd() and leave print_backtrace_emergency() for bionic empty
rename target_musl to target_libunwind (because this code is not specific to musl, but is specific to libunwind)
add target_nobacktrace with a no-op backtrace implementation
add target_libunwind and target_nobacktrace support to SConstruct as described above

In result, we will have four backtrace implementations:

target_glibc - used when libc is glibc
target_bionic - used when libc is bionic
target_libunwind - used when libc is not glibc or bionic and libunwind is not disabled
target_nobacktrace - used when libc is not glibc or bionic and libunwind is disabled

gavv commented 5 years ago

I've added libunwind to our alpine environment for CI: https://github.com/roc-project/dockerfiles/commit/4394200fb3a9f83b98a8d522d87fe42b7ac45104

(alpine image is the only one that uses musl and so the only one where we should enable libunwind by default)

hrishikeshSuresh commented 5 years ago

In this commit, I haven't replaced write() with print_emergency_message(), which I will do it later, but I think everything else is done for libunwind. Also, I will start working on SConstruct file.

hrishikeshSuresh commented 5 years ago

About checking the backtrace for Android, I can't test right now. Maybe I can do it later, and in case, we need to implement some code, we'll a separate PR for that. For now, print_backtrace_emergency() is unimplemented for bionic.

gavv commented 5 years ago

Great. One more round :)

gavv commented 5 years ago

OK, so the remaining issues are:

the two minor comments above
use print_emergency_message() instead of write() (and don't include unistd.h)
update SConstruct

hrishikeshSuresh commented 5 years ago

In SConstruct file, I am not sure what exactly to be done in part we are have to build libunwind (if 'target_libunwind' in download_dependencies) part, because in target_sox we are checking for a lot of dependent libraries and for libunwind, there are no such dependencies.

gavv commented 5 years ago

Please re-target pull request to the develop branch (see https://roc-project.github.io/roc/docs/development/version_control.html#pull-requests)

roc-streaming / roc-toolkit

Implement backtrace printing for non-glibc targets #249