'throws' ought to be explicit IR instruction

dkfellows commented 1 year ago

This is a touch of spitballing, but...

The actual throwing of an exception ought to be an explicit (and terminal) instruction in the IR, instead of being delegated to a function call. The problem with a function call is that optimizers cannot know that the call definitely throws its argument in all cases (and does nothing other than that), and so can't do flow analysis with it. Not unless you do something truly nasty like hard-coding the name of the function that implements the throw; that name only ought to matter when doing the conversion from IR into actual machine code. Having it as a basic conceptual instruction in the IR would be much tidier.

When would this become an issue? Well, the most obvious case would be when there is aggressive inlining going on that moves the instruction site that (definitely) throws into the same function as the instruction site that catches; in that case, it should be possible to turn a throw-and-catch into a suitable jump between basic blocks; there definitely wouldn't be any stack unwinding required. Right now, that would require detecting the special function, and that doesn't seem to be anything other than a horrible way to do it.
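To make the inlining case concrete, here is a minimal C++ sketch of the situation described above. The function names (checked_half, half_or_default, half_or_default_lowered) are hypothetical and chosen only for illustration; the "lowered" version is a hand-written picture of what the throw-and-catch could in principle reduce to, not something any particular compiler is claimed to produce.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: the throw site. It always throws for negative input.
inline int checked_half(int x) {
    if (x < 0) throw std::invalid_argument("negative");
    return x / 2;
}

// The catch site. Once checked_half is inlined here, the throw and the
// handler that catches it live in the same function.
int half_or_default(int x) {
    try {
        return checked_half(x);
    } catch (const std::invalid_argument&) {
        return -1;
    }
}

// Hand-written equivalent: the throw becomes a jump to the handler block,
// with no exception object and no stack unwinding.
int half_or_default_lowered(int x) {
    if (x < 0) goto handler;
    return x / 2;
handler:
    return -1;
}

// Both versions return the same value on the normal and the error path.
void check_same_results() {
    assert(half_or_default(10) == half_or_default_lowered(10));
    assert(half_or_default(-4) == half_or_default_lowered(-4));
}
```

The two functions agree on their results; the difference is only in how the error path is reached, which is exactly the part an optimizer cannot rewrite while the throw is an opaque call.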

asl commented 1 year ago

This seems to be something that should be discussed on LLVM Discourse

dkfellows commented 1 year ago

Quite possibly, but I've just not got the personal bandwidth. I'm just an occasional user, and when I was using it I didn't actually use anything relating to exceptions, as we generated our error paths in a manner more like how Rust's sum types work. (Though I guess it was thinking about how that works out that made me wonder whether C++ can do the same at the optimizer level, and it looks like the answer is "no", surprisingly, which I believe to be fundamentally due to the lack of this flow graph concept.)
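For readers unfamiliar with the sum-type style of error path, a rough C++17 sketch using std::variant follows. This is an assumption about the general shape of such code, not the code generator mentioned above; ParseError, ParseResult, parse_digit and digit_or are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical error type carried in the failure alternative.
struct ParseError {
    std::string message;
};

// Either a successful int or an error, similar in spirit to Rust's Result.
using ParseResult = std::variant<int, ParseError>;

// Failure is an ordinary return value, so the error path is plain control flow.
ParseResult parse_digit(char c) {
    if (c < '0' || c > '9') return ParseError{"not a digit"};
    return c - '0';
}

// The caller branches on the alternative; nothing is thrown or unwound.
int digit_or(char c, int fallback) {
    ParseResult r = parse_digit(c);
    if (const int* value = std::get_if<int>(&r)) return *value;
    return fallback;
}

void check_error_path() {
    assert(digit_or('7', -1) == 7);
    assert(digit_or('x', -1) == -1);
}
```

Because both outcomes are returned normally, the optimizer sees the error path as an ordinary branch in the flow graph.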

My point is that this would enable someone to optimise exception paths in some cases and it definitely seems to be logically a missing thing. You can't get the same effect by doing call _Unwind_RaiseException() followed by unreachable except by hard-coding special knowledge of what that function does; the unreachable just indicates that the function never returns, not that it always throws (logically it would permit the code to loop forever instead).
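The same distinction can be sketched at the C++ level, where [[noreturn]] plays roughly the role that a trailing unreachable plays in the IR. The function names below are hypothetical. Both functions carry the same "never returns" promise, yet only one of them ever reaches a catch handler, so that promise alone does not say the call throws.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>

// Never returns because it always throws.
[[noreturn]] void fail_by_throwing() {
    throw std::runtime_error("boom");
}

// Never returns either, but no exception is ever raised.
[[noreturn]] void fail_by_aborting() {
    std::abort();
}

// Reports whether calling `fail` transfers control to the handler.
// Only safe to call with fail_by_throwing; passing fail_by_aborting
// would terminate the program instead of returning false.
bool reaches_handler(void (*fail)()) {
    try {
        fail();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

void check_throwing_path() {
    assert(reaches_handler(fail_by_throwing));
}
```

Seen through the function pointer, the two callees are indistinguishable to the caller, which is the position an optimizer is in when the throw is just a call to a runtime function.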

Calling _Unwind_RaiseException() would be part of the implementation of throw on a particular platform/architecture flavour (when not optimized out), but from the perspective of optimizers that's not something they should know. I've literally no idea whether it would be the same implementation on other platforms/languages; the platform where I do low level work doesn't support exceptions for other reasons (mostly really severe shortage of code space; 32kB isn't a lot when you have much to do).

If someone wants to take this idea and run with it as their own, they're tremendously welcome to do so, with my blessing. (There isn't anything much deeper to it than what I've written above.) I simply can't be the one to follow this one up.

llvm / llvm-project

'throws' ought to be explicit IR instruction #59006