libhal / libhal-exceptions

Exception runtime library for Cortex M series CPUs
Apache License 2.0

Add multithread support but without using TLS #18

Open kammce opened 7 months ago

kammce commented 7 months ago

A request from C++ committee members was to provide an exception runtime that can work across multiple threads but does NOT require thread local storage.

Motivation

  1. There exist systems for which thread local storage of any kind is expensive. Potentially GPUs and other non-CPU devices that execute C++ code.
  2. Providing non-TLS support for exception handling would strongly improve the likelihood that committee members would accept sticking with the current backwards ABI implementation.

NOTE: I am not fully convinced that (1) is a massive issue, but the benefit of proving it's possible with the current ABI is that it leaves little room for the opposition to suggest alternative ABI-breaking or additional implementation solutions.

kammce commented 7 months ago

Implementation solution (0)

So I believe I've determined a solution to this problem. The root of the problem with exceptions is remembering where the exception pointer is when you drop into a function to perform cleanup (execute destructors) or into a catch block landing pad.

The problem

libc++abi has APIs that take no inputs. The biggest offenders that we expect to see in non-degenerate code are:

With TLS, these variables can be easily recovered in the function definition. All that is required is a thread local pointer to the exception which is looked up based on the thread the exception was thrown in.
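For reference, the TLS-based recovery described above can be sketched as follows. This is a minimal illustration, not the libhal-exceptions implementation: `exception_header`, `active_exception`, `set_active_exception`, and `recover_active_exception` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the runtime's per-exception header.
struct exception_header
{
  std::uint32_t handler_count = 0;  // catch blocks currently handling it
  void* thrown_object = nullptr;    // storage for the thrown object
};

// One slot per thread. This is exactly the dependency this issue wants to
// remove.
thread_local exception_header* active_exception = nullptr;

// Called by the (hypothetical) throw path before unwinding begins.
void set_active_exception(exception_header* p_header)
{
  active_exception = p_header;
}

// Models a zero-argument ABI entry point: it receives no parameters, so it
// has to find the in-flight exception on its own via the thread local slot.
exception_header* recover_active_exception()
{
  assert(active_exception != nullptr && "no exception in flight");
  return active_exception;
}
```

Each thread sees its own `active_exception`, which is why the lookup is trivial with TLS and why it needs a different home without it.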

We cannot allocate the memory for the exception downstream in the stack. We should assume that all memory downstream from a destructor or catch block being called is off limits, because destructors could allocate a large amount of memory on the stack. We should assume that destructors allocate unbounded memory and thus would corrupt our exception pointer.

The solution


Use the return address on the stack as storage for the address of the std::exception_ptr. Before the runtime jumps into the function, it would cache the return address from the function's stack into its exception header, and replace the return address with the address of the address of the exception.
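The swap can be modelled on a host machine with plain integers standing in for stack slots. This is only a sketch under stated assumptions: `frame`, `exception_header`, `stash_exception_in_frame`, and `restore_frame` are hypothetical names, and for simplicity the slot holds the address of the header itself rather than the address of a separate std::exception_ptr.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical exception header; only the field this sketch needs.
struct exception_header
{
  std::uintptr_t cached_return_address = 0;
};

// Models the stack slot where a function saved its return address.
struct frame
{
  std::uintptr_t return_address_slot;
};

// Before jumping into a cleanup or catch landing pad: cache the real return
// address in the header, then overwrite the slot with the header's address.
void stash_exception_in_frame(frame& p_frame, exception_header& p_header)
{
  p_header.cached_return_address = p_frame.return_address_slot;
  p_frame.return_address_slot = reinterpret_cast<std::uintptr_t>(&p_header);
}

// Reverse step: read the header's address back out of the slot and put the
// original return address back so normal unwinding or return can proceed.
exception_header& restore_frame(frame& p_frame)
{
  auto* header =
    reinterpret_cast<exception_header*>(p_frame.return_address_slot);
  assert(header != nullptr);
  p_frame.return_address_slot = header->cached_return_address;
  return *header;
}
```

On real hardware the slot would be wherever the function's prologue saved LR, which the unwind table already describes, so no extra stack space is consumed.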

There is no other part of the stack that is guaranteed to exist or that wouldn't cause issues if it was modified. Let's go over the alternatives:

  1. Why not put the memory below the local stack, akin to push R0 assuming R0 has the exception pointer? Because destructors could corrupt it.
  2. Why not use the stack above the current frame? Because the local frame may refer to memory in the frame above. So corrupting that data could break destructors.
  3. Why not use a different register besides the return address? The only register that we know MUST exist is the return address. Functions are required to store this if they call other functions, otherwise they would not be able to return from their current function.
  4. Is there anywhere else we could put it, like maybe a list that contains an address/index for every possible thread in the application? This is TLS :P just one that we implemented using our own array.

We can detect whether a return address is actually an exception pointer by checking whether it lies outside every .text section. For rethrow, the __cxa_rethrow function can search up the stack by consulting the unwind table, starting from the current stack pointer. Once the exception pointer is found, it is copied into the current frame, the previous return address is restored from the header cache, and unwinding can continue as normal. Cleanup follows the same process, except that it only requires 1 level of unwind.

This gets complicated when you use throw e, but it could be managed by detecting the exception pointer in the return address during unwinding; when the table tells the system to jump back into the function to perform __cxa_end_catch and cleanup, the old exception can be used. The exact details of how this handoff works will be figured out later, but I'm certain it is implementable.
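The detection and search steps can be sketched like this. It is a host-side model, not runtime code: `text_section`, `is_code_address`, and `find_exception_frame` are hypothetical names, the section bounds are assumed to come from something like linker symbols, and a plain vector of saved return addresses stands in for walking frames with the unwind table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open address range [begin, end) covered by one .text section.
struct text_section
{
  std::uintptr_t begin;
  std::uintptr_t end;
};

// True if the address falls inside any known .text section.
bool is_code_address(std::uintptr_t p_address,
                     const std::vector<text_section>& p_sections)
{
  for (const auto& section : p_sections) {
    assert(section.begin <= section.end);
    if (section.begin <= p_address && p_address < section.end) {
      return true;
    }
  }
  return false;
}

// Walks saved return addresses from the innermost frame outward and returns
// the index of the first slot that does not point into code, meaning it was
// replaced with an exception pointer. Returns -1 if no such frame exists.
std::ptrdiff_t find_exception_frame(
  const std::vector<std::uintptr_t>& p_return_slots,
  const std::vector<text_section>& p_sections)
{
  for (std::size_t i = 0; i < p_return_slots.size(); i++) {
    if (!is_code_address(p_return_slots[i], p_sections)) {
      return static_cast<std::ptrdiff_t>(i);
    }
  }
  return -1;
}
```

In this model, cleanup would only ever inspect the first slot, while rethrow may have to walk several frames before it finds the marker.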