raspberrypi / pico-sdk

BSD 3-Clause "New" or "Revised" License
3.69k stars 917 forks source link

thread_local not supported; unsafe to use c++ exceptions in multi-core setting #291

Open geurtv opened 3 years ago

geurtv commented 3 years ago

pico-sdk doesn't support thread-local storage (thread_local keyword), which is required for c++ exceptions to function properly in a multi-core environment. Internally libstdc++ stores some global exception state in a thread_local variable (static __thread abi::__cxa_eh_globals global; in eh_globals.cc).

If now both RP2040 cores throw or catch exceptions at the same time they are "mixed", because the thread_local state variable is the exact same variable on both cores. There are also no synchronization locks, so both cores can write to the same variable at the same time.

The code below shows how core0's std::current_exception() is overwritten by core1. The expected output is:

[core0] core0 exception
[core1] core1 exception
[core0] core0 exception
[core1] core1 exception

The actual output is:

[core0] core0 exception
[core1] core1 exception
[core0] core1 exception
[core1] core1 exception
#include <cstdio>
#include <stdexcept>
#include <pico/multicore.h>
#include <pico/stdio_usb.h>

void continue_with_other_core(semaphore_t& acquire, semaphore_t& release)
{
    sem_release(&release);
    sem_acquire_blocking(&acquire);
}

void print_current_exception()
{
    try {
        std::rethrow_exception(std::current_exception());
    }
    catch (const std::exception& e) {
        std::printf("[core%d] %s\n", get_core_num(), e.what());
    }
}

void run_test(semaphore_t& acquire, semaphore_t& release)
{
    try {
        sem_acquire_blocking(&acquire);
        throw std::runtime_error("core" + std::to_string(get_core_num()) + " exception");
    }
    catch (...) {
        print_current_exception();
        continue_with_other_core(acquire, release);
        print_current_exception();
        continue_with_other_core(acquire, release);
    }
}

int main()
{
    stdio_usb_init();
    std::getchar();
    std::puts("[start]");

    static semaphore_t sem0, sem1;
    sem_init(&sem0, 1, 1); // unlocked: start with core0
    sem_init(&sem1, 0, 1);

    multicore_launch_core1([] { run_test(sem1, sem0); });
    run_test(sem0, sem1);

    for (;;);
}
geurtv commented 3 years ago

Wrong analysis: static __thread abi::__cxa_eh_globals global is only used when _GLIBCXX_HAVE_TLS is defined. In our case the "normal" static __cxa_eh_globals eh_globals is actually used (by both cores), so obviously a c++ exception on one core then overwrites an active exception on the other core.

By the way, adding thread-local storage support is relatively easy: provide a custom implementation of __emutls_get_address and then use -Wl,--wrap= to use it. Then thread_local works just fine, but as stated that doesn't solve the issue.

geurtv commented 3 years ago

A solution could be to wrap the functions __cxa_get_globals and __cxa_get_globals_fast. One potential problem: the pointer they return is to a libstdc++-internal structure (__cxa_eh_globals), so different GCC versions can have different definitions for it.

I have a proof of principle for gcc-arm-none-eabi-10-2020-q4-major at:

https://github.com/geurtv/pico-sdk/tree/develop

file: https://github.com/geurtv/pico-sdk/blob/develop/src/rp2_common/pico_standard_link/wrap_eh_globals.cpp

commit: https://github.com/geurtv/pico-sdk/commit/f869eb618162f605878a4e53604fb8a493721ba1

kilograham commented 3 years ago

yeah, adding __thread support makes sense (but then reentrant support for lib functions makes sense which probably suggests we should build newlib ourselves with different options compared to the one shipped with gcc-arm-eabi-none)

is __cxa_get_globals in newlib or gcc libs? in any case, yes it would make sense for C++ exceptions to work on both cores (you are the first to notice that they don't... although in fairness we have them turned off by default).

geurtv commented 3 years ago

It's all gcc as far as I can tell. __cxa_get_globals and __cxa_get_globals_fast are both in libsup++, __emutls_get_address (for thread-local storage) is in libgcc.

geurtv commented 3 years ago

FYI: I've also just added a wrapper for __emutls_get_address. Not sure how complete the implementation is, but thread-local storage seems to work now.

https://github.com/geurtv/pico-sdk/blob/develop/src/rp2_common/pico_standard_link/wrap_emutls.c

alastairpatrick commented 2 years ago

Would be good if this integrated well with an RTOS so that similar problems don't happen when there are multiple tasks running on one core. For example, FreeRTOS has vTaskSetThreadLocalStoragePointer() and pvTaskGetThreadLocalStoragePointer().