ivanarh / ndcrash

A powerful crash reporting library for Android NDK. Don't forget to run git submodule update --init --recursive after checking out.
Apache License 2.0
111 stars 21 forks source link
android android-ndk crash-dump crash-reporting crash-reporting-tool crash-reports crashes ndk

NDCrash

NDCrash is a powerful crash reporting library for Android NDK applications. The author was inspired by PLCrashReporter and Google Breakpad. Note that this library is new and has a an experimental status.

Key features

Roadmap

These features are not currently implemented but they are in plans. They are likely to be implemented only for out-of-process mode due to in-process mode restrictions.

Crash handling modes

NDCrash supports 2 crash handling modes. Mode is a way how and where a crash report data is collected. The same concept is used by Google Breakpad, see its documentation Both of these modes are supported in the library at once but only one of them needs to be activated at run-time (useful for A/B testing). Also any of them can be disabled by compilation flags for optimization purposes.

In-process mode

In-process means crash report is created within crashing process, it happens in a signal handler. In most cases this approach works but there are 2 major problems when you use this mode: signal safety and stack restrictions. It means you should prefer out-of-process mode if it's possible.

Signal safety

In general signal handler requires its code to be signal safe. It's safe to run only a very limited set of functions, see this man page. More details could be read in glibc documentation. A lot of functions habitual for each and every developer are not signal safe, for example heap memory allocation by malloc/free. The worst case that may happen if your handler isn't signal safe is a deadlock during signal handler execution, in such event a crash report won't be created and user will have to destroy your application explicitly. But it doesn't mean that these stuff couldn't be used in signal handler for crash reporting, because an application process will be terminated anyway after signal handler execution and all we need is to create a report. Therefore, a good idea is to minimize unsafe stuff usage in order to make a crash reporter work properly for most part of crashes.

Stack restrictions

Starting from Android 4.4 bionic uses an alternative stack for signal handlers. See bionic source and sigaltstack documentation. This stack has fixed size, by default SIGSTKSZ constant value is used (8 kilobytes on 32-bit ARM Platform). It's very useful feature when a crash due to stack overflow happens. However, this stack size could be insufficient because heap allocations are not safe and you are forced to allocate a memory on a stack. For example, libunwind's unw_cursor_t has a huge size (4 kilobytes) and it's a very big trade-off where to allocate a memory for it. Of course, some static buffer may be used but it's not thread safe, signal handlers may execute concurrently for different threads. Libunwind provides a special "memory pool" mechanism for this case. A workaround for this problem is possible: you can allocate a stack of any size and set it by sigaltstack function. But it should be done for every thread of your application, so some wrapper around pthread is required.

Out-of-process mode

Out-of-process means crash report is generated in a proces other than crashing. This is possible due to ptrace system call that allows some process to inspect a state of another process. Originally out-of-process mode is used by Android system debugger (debuggerd).

When NDCrash works in out-of-process mode it has 2 different parts:

Out-of-process mode flow

Details how out-of-process mode works are described below:

A restoration of previous signal handler is necessary to preserve operating status of standard Android debugger (debuggerd), the bionic library registers this handler in order to initiate crash report generation by debuggerd. This is because we can't obtain registers state for stack unwinding by ptrace (we send it by a socket). To do this we would install a default signal handler (SIG_DFL) and re-raise a signal. This is exactly how google breakpad behaves and broken debuggerd is one of big disadvantages of this crash reporting library.

Stack unwinding

The most interesting information in a report is, of course, a backtrace, also known as stack trace. At the same time, obtaining of this data is the most difficult task during crash report generation. To obtain a backtrace we need to analyze a stack data by walking through all stack frames. This process is called stack unwinding. There are several ways how to perform stack unwinding, also different third party code may be used for this purpose. This led to support of different stack unwinders in NDCrash library. Each unwinder is a module within the library that provides a code that unwinds a stack and writes a backtrace to a crash report. All unwinders are supported by library at one but only one of them should be selected in the moment of library initialization. This may be useful for A/B testing.

Ways to unwind a stack

Stack may be unwound using different algorithms. The main challenge for them in crash reporting is to analyze stack data, determine bounds of every stack frame and extract all return addresses from it. Stack data may be easily accessed by reading memory at an address from "stack pointer" register.

On some architectures that not currently supported by NDK (such as MIPS) proper stack unwinding is possible without information in additional sections. These architectures are not supported by NDCrash and it doesn't make sense to mention them.

Details about each unwinder supported in NDCrash are described below.

"cxxabi" unwinder

It uses standard C++ library facilities to unwind a stack (the same functionality is used during C++ exception handling). Specifically, it uses _Unwind_Backtrace and _Unwind_GetIP functions to unwind a stack plus POSIX dladdr function to obtain an information about module and function.

Supported processor architectures: All.

Ways to unwind a stack: ARM Exception Tables on 32-bit ARM, DWARF .eh_frame on another architectures.

Supported modes: In-process only.

Advantages: Simple, doesn't require any 3rd party code.

Disadvantages: It's impossible to set an initial processor state to unwind a stack. On Android <= 5.0 on ARM architecture it can't unwind a stack properly due to bug in bionic, see this commit. On newer versions of Android it will lead to surplus lines in backtrace that are not related to crash itself.

"libcorkscrew" unwinder

It uses obsolete libcorkscrew library from Android sources that was used by debuggerd on Android 4.1 - 4.4 versions. To make it work on any Android version it's linked statically. A special fork with patches to build with NDK toolchain is used.

Language: C89

Supported processor architectures: ARM, x86

Ways to unwind a stack: ARM Exception Tables on ARM, DWARF .eh_frame on x86.

Supported modes: In-process & Out-of-process.

Advantages: A very tiny size.

Disadvantages: Obsolete. Lack of 64-bit architectures support.

"libunwind" unwinder

Uses android fork of libunwind library that was used as a replacement for libcorkscrew since Android 5.0. Like for libcorkscrew, some patches has been applied to make build with standard NDK toolchain possible, fork with patches is here.

Supported processor architectures: All supported by NDK plus some extra.

Language: C89

Ways to unwind a stack: DWARF on all architectures, ARM Exception Tables on ARM.

Supported modes: In-process & Out-of-process

Advantages: Powerful, a lot of supported architectures, stable.

Disadvantages: It's going to be obsolete in newer Android versions because new libunwindstack library is actively developed.

"libunwind" unwinder

This library is being actively developed especially for Android, you can obtain a source code here. This is going to be a replacement for libunwnd in modern Android versions.

Supported processor architectures: All supported by NDK.

Language: C++11

Ways to unwind a stack: DWARF on all architectures, ARM Exception Tables on ARM.

Supported modes: In-process & Out-of-process

Advantages: Powerful, modern, actively developed.

Disadvantages: Unstable. Requires massive C++11 standard library, so it's not a good solution for plain C projects.

"stackscan" unwinder

The most simple unwinder. Not recommended to use in general cases.

Supported processor architectures: All supported by NDK.

Ways to unwind a stack: Full stack scanning.

Supported modes: In-process only.

Advantages: Doesn't require additional sections such as .ARM.extab or .eh_frame, so they can be stripped.

Disadvantages: Inaccurate results, not optimal.

Integration

For easier integration you can use java wrapper

Current build system uses CMake and builds NDCrash as a static library. It's a recommended way to use. A good idea is to add its source code as a submodule. Assuming NDCrash submodule is checked out to libndcrash subdirectory in the same directory where CMakeLists file is located, you need to add a subdirectory with NDCrash library, add its "include" directory to include paths and use this static library. Please add these lines to your CMakeLists.txt:

add_subdirectory(libndcrash)
include_directories(${CMAKE_SOURCE_DIR}/libndcrash/include)
...
find_library(LOG_LIB log)
target_link_libraries(myproject ndcrash ${LOG_LIB})

Please note that log library linkage is required.

Customization

For optimization purposes you can turn on or turn off unused modules or library usage mode (in-process or out-of-process). The following CMake variables are used, all of these are boolean-typed. Absense of variable is treated as false.

Note that it's possible to build a library with all flags set to "false", in this case it would return error on initialization.

A good place to set this variables is build.gradle file of a module where NDK library is built:

        externalNativeBuild {
            cmake {
                arguments "-DANDROID_STL=c++_static",
                        "-DCMAKE_VERBOSE_MAKEFILE:BOOL=ON",
                        "-DENABLE_LIBCORKSCREW:BOOL=ON",
                        "-DENABLE_LIBUNWIND:BOOL=ON",
                        "-DENABLE_LIBUNWINDSTACK:BOOL=ON",
                        "-DENABLE_CXXABI:BOOL=ON",
                        "-DENABLE_STACKSCAN:BOOL=ON",
                        "-DENABLE_INPROCESS:BOOL=ON",
                        "-DENABLE_OUTOFPROCESS:BOOL=ON"
                        "-DEENABLE_OUTOFPROCESS_ALL_THREADS:BOOL=ON"
            }
        }
        ndk {
            abiFilters "x86", "armeabi-v7a", "x86_64", "arm64-v8a"
        }

Also in this place you can customize processor architectures for what a library should be built. NDCrash supports 32-bit and 64-bit ARM and x86 architectures.

Usage

All public API of this library is located in a single header file, to use it please add to your sources: include <ndcrash.h> To start using library you need to initialize it. NDCrash requires 2 parameters to be set in both modes:

It's an application developer responsibility to process this file in some way: send it to a server, offer a user to write a letter to developers, etc. Typically it may be done on a next application launch after a crash (a report existence may be an indicator whether a last launch has finished by crash). But out-of-process mode supports a special callback that could be executed straight after crash report generation, it's done in background service process and allows to send a report to a server immediately (but don't forget that in this case user interaction abilities are very limited).

Please note that NDCrash may be initialized only once, a repeated initialization will return an error. An initialization is done for a whole process and this operation is not reentrant. A good place for it is before any initialization code of application, for example Application.onCreate Java method or JNI_OnLoad function, before any background thread is created.

For examples you can take a look at java wrapper source code.

In-process

For this mode you should only provide an unwinder enum value and full path to a crash reprot. For example, initialization with "libunwind" unwinder:

    const char *crashReportPath = ...
    const enum ndcrash_error error = ndcrash_in_init(ndcrash_unwinder_libunwind, crashReportPath);
    if (error == ndcrash_ok) {
        // Initialization is successful.
    } else {
        // Initialization failed, check error value.
    }

It will set up a signal handler that generates a crash report. In case when an old signal handler should be restored please use ndcrash_in_deinit() function, it doesn't have any arguments.

Out-of-process

An initialization in this mode is quite more difficult: we should initialize 2 components that are run in different processes:

An UNIX domain socket is used by crashing process to communicate with daemon. To identify this socket NDCrash requires to specify one parameter yet: a special string called socket name. It should be unique for all applications that are installed on a device, so it's a good idea is to include an application package name to a socket name.

Main process signal handler initialization

As a signal handler in out-of-process mode doesn't generate a crash report, an unwinder and crash report path are not used for initialization. All you need is to specify a socket name:

    const char *socket_name = ...
    const enum ndcrash_error error = ndcrash_out_init(socket_name);
    if (error == ndcrash_ok) {
        // Initialization is successful.
    } else {
        // Initialization failed, check error value.
    }

Background process crash reporting daemon initialization

For this mode you should only provide a crash report path, an unwinder enum value and full path to a crash reprot. For example, initialization with "libunwind" unwinder:

    const char *socket_name = ...
    const char *crashReportPath = ...
    const enum ndcrash_error error = ndcrash_in_init(
            socket_name,
            ndcrash_unwinder_libunwind,
            crashReportPath,
            NULL,
            NULL,
            NULL,
            NULL);
    if (error == ndcrash_ok) {
        // Initialization is successful.
    } else {
        // Initialization failed, check error value.
    }

The last 4 parameters of ndcrash_in_init that have NULL values are optional, this is daemon lifecycle callbacks that may be used to run a special code when crash happens. For example, we can send a report to server in a crash callback, it's called immediately after crash happens. For details please take a look at jndcrash.c source file. All callbacks are run in background thread of crash reporting service process. The following callbacks can be set:

Also 4th argument may be set: it's an auxiliary argument that is saved inside a library and passed to all callbacks. This argument can be obtained at any time by ndcrash_out_get_daemon_callbacks_arg() function.