libhal / libhal-exceptions

Exception runtime library for Cortex M series CPUs
Apache License 2.0

Why oh why ? #53

Open X-Ryl669 opened 3 days ago

X-Ryl669 commented 3 days ago

Hi,

Just watched your talk at CppCon about exceptions in C++ for embedded firmware. Honestly, it's good, and I think it can be better if we just rethink how exceptions are done.

I think exception handling in C++ is like iostream. Both were shoehorned in without much thought, and now we're stuck with what they are: completely ill-suited to actual code.

IMHO,

An exception is just a fancy word for multiple return paths.

Implementation-wise, all an exception does is pack a new return path into your calling function. If you look at the code that does the unwinding and stack processing, it's exactly what a return path does. The unwind code is just ROP (return-oriented programming).

As you said at the end of your presentation, the design is backwards: the caller must know what the callee will do (which we don't and can't know) in order to insert all the return paths (all the catch(type) stuff). The opposite would be better: push the handler code down the stack so the callee can call it, or die/terminate if it can't find a compatible handler.

Exceptions shouldn't need RTTI

Requiring RTTI for exceptions to work is complete nonsense. First, it's possible to give up RTTI completely, since you only need an index to identify what the error is. In current C++, it's possible to compute a unique ID for any type at compile time with RTTI disabled, so there's absolutely no reason to keep the RTTI requirement. In C++26, it'll even be standardized in std::meta::info so the reflection code can write the handling code as expected. The idea here is to replace the nonsensical catch (some_type i) with catch (uid e). The only requirement for this to work is to have a single "exception space" where you'll store the exception value. The exception value can be a variant whose members are unknown at compile time and whose size is computed at link time. Every time a throw is found in the code, the thrown type is added to a new section in the object code. The linker then collects all the types in this section and can compute the minimum size of the exception storage. => No allocation required at runtime at all
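As a rough illustration of the "unique ID per type without RTTI" idea, here is a minimal sketch. It assumes that the address of a per-type static object is an acceptable ID (a link-time constant rather than a true compile-time value); `uid_of`, `matches`, and the two error types are hypothetical names invented for this example and are not part of libhal-exceptions or any standard.

```cpp
#include <cassert>

// Hypothetical ID type: the address of a per-type static byte.
using type_uid = const void*;

// Each instantiation owns one static object, so each type gets a distinct
// address. No typeid or dynamic_cast is involved, so this builds with RTTI
// disabled. Note that T and const T count as different types here.
template <typename T>
type_uid uid_of() noexcept {
    static const char anchor = 0;
    return &anchor;
}

// Hypothetical error types, for illustration only.
struct timeout_error { int ms; };
struct bus_error { int address; };

// Plays the role of the proposed `catch (uid e)` comparison.
template <typename T>
bool matches(type_uid thrown) noexcept {
    return thrown == uid_of<T>();
}
```

A handler would compare the ID stored alongside the exception value against `uid_of<T>()` for each type it knows how to handle.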

Exceptions shouldn't need unwinding

Why walk up the stack when an error occurs? That's a lot of unnecessary code (even if, as you said, this code is compressed) that could be completely avoided. Typically, exception handling is just a fancy way to tell the code how to clean up its own mess. Looking at how exception handling is done now, we have the dumbness of duplicating the path for calling destructors (in any throwing function, you'll need to destruct the local objects on both the normal code flow and the exceptional code flow). This means double the binary size (at least; it'll be a lot more in reality).

We also have the dumbness of creating types that have no real meaning or function except for storing the error type and values - so they end up polluting the typeinfo database, the heap on construction, and will be dismissed as soon as constructed.

A cleverer approach would have been to push the cleanup code downward or, better, share the cleanup code (which is often the same) in a common path.

Typically, in C++ embedded code (or generally in any mainframe code), you could have code like this:


void myFunc(whatever x)
{
   try {
      foo();
      foo2(); // Both foo and foo2 can throw type1, so one shared catch makes sense instead of per-call error handling
   }
   catch(type1 t) {
       // Code 1
   } catch(type2 t) {
      // Code 2
   }
       // Code 3
}

// Leading to pseudo assembly as 
myFunc:
     call foo
     call foo2
     jmp Code3
     exception_prolog
     // Code 1
     jmp Code3
     exception_prolog
     // Code 2
     jmp Code3
Code3:

This could be turned into something like this:


void myFunc(whatever x)
{   
     auto HowToFix = prepareHowToFix(); // prepareHowToFix is auto generated by the compiler
     foo(HowToFix);
     foo2(HowToFix);
}

// Example implementation for prepareHowToFix:
auto prepareHowToFix(myFunc) {
    return [ [](type1_tag, ...) { 
                  // Code 1
                }, [](type2_tag, ...){
                  // Code 2
                }, [](...) {
                  // Terminate
                }]; 
}

// Note: type1_tag is computed at compile time as hash(ObjectName(type1)) or simply a counter for a unique identifier that's set up at link time

// Which would give the following assembly
Code1:
    // Code1
Code2:
   // Code 2 here
HowToFix: 
   // Array containing pointers to Code1, Code2, Terminate

myFunc:
   push HowToFix
   call foo
   call foo2
Code3:

In foo, instead of throwing, like this:

void foo()
{
    // Some code
    throw type1("Bad context");

    // Some later code
} 

it would do this

void foo(auto exceptionHandling) // HowToFix from above, can be opaque
{
    // Some code
    return exceptionHandling.call(type1_tag, "Bad context");
    // Some later code
}

The call method here is as dumb as it should be: if it finds the tag in the array of correcting functions, it calls the matching function. If it doesn't find it, it errors out (that check can be done at compile time), so the responsibility for a new exception type in an updated version of the foo() library lies with the library's author, not with the calling code. Also, if foo calls bar and exception handling already exists, it can pass the exceptionHandling object downward and be done with it (no additional code). Or, if it needs to specialize the handler for the type1 tag, it can either replace the function pointer in the array or append to the array (provided the compiler is smart enough to look backward for a matching tag).

Technically, what I'm proposing here is like the hidden this pointer of a method, that is, using an additional register to pass the exception handling code, if any. That's one way to do it, but it can also be passed in thread-local storage (so as not to cost anything). Every function that deals with a catch block would instead create arrays of pseudo catch handler functions and point the unique TLS pointer at them. Every function that throws would consult the current array in TLS and act accordingly.
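The handler-table idea can be approximated in today's C++ to make the control flow concrete. This is only a sketch under assumptions: tags are plain small integers assigned by hand, handlers are ordinary function pointers, and `HandlerTable`, `HandlerEntry`, and `last_handled` are hypothetical names. Unlike the proposal, nothing here is generated by the compiler, and the lookup happens at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Tag = std::uint8_t;                    // hypothetical per-type tag
using Handler = void (*)(const char* what);  // one "catch block"

constexpr Tag type1_tag = 1;
constexpr Tag type2_tag = 2;

struct HandlerEntry {
    Tag tag;
    Handler handler;
};

template <std::size_t N>
struct HandlerTable {
    HandlerEntry entries[N];
    Handler fallback;  // stands in for the catch-all / terminate slot

    // Calls the first handler whose tag matches, otherwise the fallback.
    void call(Tag tag, const char* what) const {
        assert(fallback != nullptr);
        for (const HandlerEntry& entry : entries) {
            if (entry.tag == tag) {
                entry.handler(what);
                return;
            }
        }
        fallback(what);
    }
};

int last_handled = 0;  // records which handler ran, for demonstration only

void code1(const char*) { last_handled = 1; }
void code2(const char*) { last_handled = 2; }
void unhandled(const char*) { last_handled = -1; }

template <std::size_t N>
void foo(const HandlerTable<N>& how_to_fix) {
    // Stands in for `throw type1("Bad context");`
    how_to_fix.call(type1_tag, "Bad context");
}

void myFunc() {
    const HandlerTable<2> how_to_fix{
        {{type1_tag, code1}, {type2_tag, code2}}, unhandled};
    foo(how_to_fix);
}
```

Passing the table by reference mirrors the "hidden extra argument" variant; a callee that only forwards the error passes the same table further down without adding entries.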
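The thread-local variant might look roughly like the sketch below. It assumes a linked list of handler frames whose head lives in a `thread_local` pointer, with an RAII guard standing in for entering and leaving a try region. `HandlerFrame`, `FrameGuard`, `raise`, and `handled_by` are hypothetical names, and the sketch only dispatches to the handler: it does not perform the non-local jump or the cleanup that a real mechanism would need.

```cpp
#include <cassert>
#include <cstdint>

using Tag = std::uint8_t;
using Handler = void (*)(const char* what);

struct HandlerFrame {
    Tag tag;
    Handler handler;
    const HandlerFrame* outer;  // enclosing "try" region, if any
};

// The unique per-thread pointer to the innermost active frame.
thread_local const HandlerFrame* current_frame = nullptr;

// Installs a frame on construction and restores the previous one on exit.
class FrameGuard {
public:
    FrameGuard(Tag tag, Handler handler)
        : frame_{tag, handler, current_frame} {
        assert(handler != nullptr);
        current_frame = &frame_;
    }
    ~FrameGuard() { current_frame = frame_.outer; }
    FrameGuard(const FrameGuard&) = delete;
    FrameGuard& operator=(const FrameGuard&) = delete;

private:
    HandlerFrame frame_;
};

// Stands in for `throw`: searches frames from innermost to outermost.
// Returns false when nothing matches; a real runtime would terminate.
bool raise(Tag tag, const char* what) {
    for (const HandlerFrame* f = current_frame; f != nullptr; f = f->outer) {
        if (f->tag == tag) {
            f->handler(what);
            return true;
        }
    }
    return false;
}

int handled_by = 0;  // demonstration only
void outer_handler(const char*) { handled_by = 1; }
void inner_handler(const char*) { handled_by = 2; }
```

The cost on the non-throwing path in this sketch is one pointer store on entry and one on exit of each guarded region.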

kammce commented 3 days ago

Hey! Thanks for the message. I see you put a lot of time and effort in here. I read through the whole thing and there is a lot to break down. But currently, I don't think I have the time to respond. I'm not sure when I'll have the time, but I'll respond when I can. That could be in months.

I will say this. My intention is to take the old approach and make it far better versus suggesting a 3rd error handling option or a new ABI.

kammce commented 3 days ago

Oh! I forgot to mention this! If you are interested in alternative approaches to exceptions handling, here are a few:

I don't know of many people who are trying to fix up the current system, but I know there are a lot of people interested in more radically different approaches to exception handling in C++, some of which require ABI breaks or additional language features. Maybe consider seeing if these approaches align with your ideas.

X-Ryl669 commented 2 days ago

Thanks for the answers. If you only intend to improve on the current implementation, I wonder if you could completely drop the RTTI requirement (by implementing your own personality, you can probably remove the typeinfo for the exception type and replace it with something smaller, like a byte). It's very unusual to have more than 256 different catch handlers in the whole program, and thus the byte could be an index into a static array of types. This would save a lot of space for every throw out there with only a small downside (maybe use code 255 for an extended size, real typeinfo).

kammce commented 2 days ago

Since this is shorter, I'll give this response.

Thanks for the answers. If you only intend to improve on the current implementation, I wonder if you could completely drop the RTTI requirement

I don't see any value in removing the RTTI information. The size of the RTTI info is quite small and has the same cost as a type's vtable in that no matter how many objects you make of type T, you only pay for that one vtable in the whole program.

(by implementing your own personality, you can probably remove the typeinfo for the exception type and replace it with something smaller, like a byte).

The RTTI info is fed to __cxa_throw prior to any such personality info being involved. So I'm not sure what the benefit would be there. We would be choosing between passing an address vs a byte which shouldn't make much of a difference on most platforms.

It's very unusual to have more than 256 different catch handlers in the whole program,

I don't think it's a good idea to put such restrictions on developers and the code they write. A limit of 256 catches seems extremely limiting. I could imagine software at Google, if it had used exceptions, having far more than 256 different catch handlers. But again, our philosophy on exception handling is probably far different from that of the grand majority of the C++ community. We have a different philosophy due to our implementation experience.

and thus, the byte could be an index in a static array of types.

This isn't a bad idea on its own. If we collected all of the RTTI addresses into a static array, then the action table numbers could just be indexes into that array, eliminating the need for the type table in the LSDA. This could be another option that our exception tools provide. We are developing a number of personality data structures for various use cases. This was on our mind before. 😄
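A toy model of that indexing scheme, to make the lookup concrete. It assumes RTTI is available for the listed types and that a single program-wide array exists; `catch_types`, `handler_matches`, and the error types are hypothetical, and this is not how libhal-exceptions or any existing personality routine stores its data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <typeinfo>

// Hypothetical error types, for illustration only.
struct timeout_error {};
struct bus_error {};

// Hypothetical program-wide table of every type that appears in a catch.
const std::type_info* const catch_types[] = {
    &typeid(timeout_error),
    &typeid(bus_error),
};
constexpr std::size_t catch_type_count =
    sizeof(catch_types) / sizeof(catch_types[0]);

// An action record would store a one-byte index instead of referring to a
// per-function type table. This checks whether the handler selected by
// that index accepts the thrown type (exact match only, no base classes).
bool handler_matches(std::uint8_t action_index,
                     const std::type_info& thrown) {
    assert(action_index < catch_type_count);
    return *catch_types[action_index] == thrown;
}
```

With a one-byte index the table is capped at 256 entries, which is the trade-off debated above.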

This would save a lot of space for every throw out there with only a small downside (maybe use code 255 for an extended size, real typeinfo).

I'm not sure I understand how this saves space for a throw operation. __cxa_throw takes a pointer to the RTTI info. That info's address is known at link time and the throw just puts it into a register. It typically only needs a single mov-like instruction.
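For readers unfamiliar with the lowering, here is a sketch of roughly what a throw expression expands to under the Itanium C++ ABI. It assumes GCC or Clang with libstdc++ or libc++abi, where `<cxxabi.h>` declares `__cxa_allocate_exception` and `__cxa_throw`. The type `X` and the function `throw_x_manually` are hypothetical, and calling these entry points by hand is shown for illustration only, not as something application code should do.

```cpp
#include <cassert>
#include <cxxabi.h>
#include <new>
#include <typeinfo>

// Hypothetical exception type with a trivial destructor.
struct X {
    int code;
};

// Roughly what the compiler emits for `throw X{code};`.
[[noreturn]] void throw_x_manually(int code) {
    // 1. Ask the runtime for storage for the exception object.
    void* storage = abi::__cxa_allocate_exception(sizeof(X));
    // 2. Construct the exception object in that storage.
    ::new (storage) X{code};
    // 3. Hand over the object, the address of its type_info, and its
    //    destructor (null here because X is trivially destructible).
    abi::__cxa_throw(storage, const_cast<std::type_info*>(&typeid(X)),
                     nullptr);
}
```

The second argument is the pointer discussed above: an address fixed at link time, loaded into a register before the call.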

X-Ryl669 commented 1 day ago

I don't see any value in removing the RTTI information. The size of the RTTI info is quite small and has the same cost as a type's vtable in that no matter how many objects you make of type T, you only pay for that one vtable in the whole program.

It's not the RTTI for the exceptions that is the issue (usually, you don't have many exception types). It's the RTTI for everything else (all the classes and all the template specializations). You can't tell the compiler "hey, only build the RTTI for the exceptions but ignore everything else, since I won't dynamic_cast or typeid those". So it's an all-or-nothing choice, and this does make a difference (in my eMQTT5 library, where a lot of templates are used to reduce the code size, the RTTI takes 25% of the binary size if enabled).

I don't think it's a good idea to put such restrictions on developers and the code they write

In general, 100% agree. But anyone enabling exceptions in embedded firmware would accept limitations (after all, we all accepted no RTTI, no exceptions, no LTO, function-sections and other dumbness). If that could divide the exception cost by 1.2 or 1.5, it might be enough to enable the feature. I don't think it's possible to reach that goal, though.

The __cxa_throw takes a pointer to the RTTI info.

I'm not familiar with how GCC implements the throw underneath. I would have expected it to be customizable somehow. If it's "hey, here's the typeinfo and the void * of the exception that was thrown", then obviously there's no way to improve this (or maybe the linker can do better?)

I was thinking the actual throw implementation was like:

throw X(args);

=> generates:
void * ptr = malloc(sizeof(X));
new(ptr) X(args);

throwfunc(ptr,  &X::~X, typeinfo(X), unwindFunc); // First 3 args are for type erasure and are of type void *, void(*)(void*), void* or typeinfo_t*?

If it was template based, we could do:

throw X(args);

=> generates (pseudo code):
template<typename Thrown, UnwindFunction func, Args...>
[[noreturn]] void thrown(Thrown, Args... args, func)
{
    uint8 id = hash<std::remove_cv_t<Thrown>>(); // Template used to compute unique hash of "Thrown"
    ExceptionTable[id].constructInto(args); // Avoid dynamic allocation if possible or rely on static storage here to build the exception type in place in the look up array
    CurrentException = id;
    // Unwind here
    func();
}
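A compilable approximation of the static-storage part of this pseudo code, under several assumptions: the set of throwable types is known up front (rather than collected by the linker), IDs are assigned by hand through a hypothetical `error_id` trait, only trivially destructible types are allowed, and no unwinding takes place. All names are invented for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical error types, for illustration only.
struct timeout_error { int ms; };
struct bus_error { int address; int status; };

// Hand-assigned IDs; the proposal would derive these automatically.
template <typename T> struct error_id;
template <> struct error_id<timeout_error> {
    static constexpr std::uint8_t value = 0;
};
template <> struct error_id<bus_error> {
    static constexpr std::uint8_t value = 1;
};
constexpr std::uint8_t no_error = 255;

constexpr std::size_t max_of(std::size_t a, std::size_t b) {
    return a > b ? a : b;
}

// One static slot sized for the largest throwable type: no heap use.
alignas(std::max_align_t) unsigned char
    exception_storage[max_of(sizeof(timeout_error), sizeof(bus_error))];
void* current_object = nullptr;
std::uint8_t current_exception = no_error;

// Builds the error value in place and records its ID.
template <typename T, typename... Args>
void raise_into_static_storage(Args&&... args) {
    static_assert(sizeof(T) <= sizeof(exception_storage), "slot too small");
    static_assert(std::is_trivially_destructible<T>::value,
                  "sketch never runs destructors");
    current_object = ::new (static_cast<void*>(exception_storage))
        T{std::forward<Args>(args)...};
    current_exception = error_id<T>::value;
    // A real implementation would transfer control to a handler here.
}

// Returns the stored value if it has type T, otherwise nullptr.
template <typename T>
const T* current_as() {
    if (current_exception != error_id<T>::value) return nullptr;
    return static_cast<const T*>(current_object);
}
```

A single slot supports only one in-flight error at a time, so nested or rethrown errors would need more than this sketch provides.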

If I understand you correctly, it's not possible to replace the throwfunc with anything else because it's done at a low level in the compiler. I wonder if it's possible to override the __cxa_throw function with one that doesn't use the typeinfo parameter. In that case, maybe the linker can see that and simply garbage collect all the typeinfo objects that were generated. We would lose the ability to dynamic cast to the most appropriate exception type, but 99% of the time we don't care about this at all, and in embedded firmware I think it's an acceptable trade-off that could allow turning on -fno-rtti

kammce commented 1 day ago

Which talk did you watch of mine? I explain this in my talk. Here is the latest version from CppCon 2024 with the timestamp set to the part of my talk about RTTI: https://youtu.be/bY2FlayomlE?t=1197&si=L-mUpRde0issCAld

I explain how using -fno-rtti in GCC eliminates all RTTI info for all types except for those that are thrown.
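A small example of code that is compatible with this setup, assuming GCC or Clang: it throws and catches a user-defined type without using typeid or dynamic_cast, so it is expected to build with -fno-rtti while exceptions stay enabled. The names `sensor_fault`, `read_sensor`, and `read_or_code` and the file name in the build comment are hypothetical.

```cpp
// Assumed build command: g++ -std=c++17 -fno-rtti example.cpp
#include <cassert>
#include <cstdint>

// Hypothetical error type; the only type in this file that is thrown.
struct sensor_fault {
    std::uint8_t code;
};

std::uint8_t read_sensor(bool fail) {
    if (fail) {
        throw sensor_fault{42};
    }
    return 0;
}

// Returns the sensor value, or the fault code if the read failed.
std::uint8_t read_or_code(bool fail) {
    try {
        return read_sensor(fail);
    } catch (const sensor_fault& fault) {
        return fault.code;
    }
}
```

Comparing the linked binary's symbols with and without the flag is one way to check which type info objects remain in a given project.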

So I'm not sure what your point is here 🤔 I also give a quick verbal shout out to the sizes of the RTTI info. If you don't believe me, see the Itanium ABI specs and they are there.

EDIT: Typo from Uranium ABI --> Itanium ABI