significant sigsetjmp overhead due to pg utility functions being guarded by pg_guard

EdMcBane commented 2 years ago

I'm writing an FDW using pgx and profiling it i've noticed some significant time spent in __sigsetjmp and friends. I gather this is due to postgres utility functions exposed by pgx_sys and heavily used by my extension, such as cstring_to_text_with_len being annotated with the pg_guard attribute and thus going through all the setjmp / catch_unwind magic for rust-postgres error handling.

I'm supposing this is required to ensure postgres errors are caught on the FFI boundary and thus the rust stack can be properly unwound.

I wonder whether there is some possible workaround; I'd be fine with aborting on panic so the stack not being unwound would not be such an issue.

workingjubilee commented 2 years ago

Background Information on Unwinding into FFI

Incorrect handling of unwinding with respect to an FFI barrier is undefined behavior in Rust, per the current caution in the reference. In addition, it is currently one of the very few ways to potentially expose Safe Rust to undefined behavior. Some auditing of the unwind-handling strategy of pgx will likely be required as implementation of the "C-unwind" ABI progresses.

So... can it be faster?

This overhead is probably actually-in-fact possible to minimize in some way nonetheless, with some cleverness! I am just noting this is by necessity Extra Spicy Code, especially given how it interacts with Postgres, so any improvement might be via shuffling the unsafe invariants around (and, say, into user code) rather than simply saying "eh, just abort instead".

Or maybe just aborting instead is fine. I don't know yet.

thomcc commented 2 years ago

Once C-unwind stabilizes, we can switch to a cheaper approach. Actually, once that stabilizes, the current code switches from "cool hack to dodge some unsoundness" to being UB (unless we switch all the ABIs to extern "C-unwind") which is... annoying. (That said we just can do a version check in build.rs and change the API there, so it's not the end of the world)

I assume this kind of error happens often enough that handling these by aborting is not viable, right? That might grant some other ways to handle these soundly, I think.

thomcc commented 2 years ago

After a conversation with @eeeebbbbrrrr this morning, I've been letting thoughts about how to avoid this overhead percolate in the back of my mind to see if it comes up with anything remotely viable (beyond "I think it's possible" and "maybe some asm shim idk"), and I have a crazy scheme that certainly needs much more refinement, but might be the start of an idea that could work.

Basically: LLVM has support for implementing stack unwinding for implementing exceptions/unwinding on top of setjmp/longjmp[^1]. Currently (I think) rustc only uses this on 32bit ARM iOS, via LLVM's bundled libunwind implementation, but LLVM has the capability do it via other implementations as well (this may not be needed though, if we just modify the source).

[^1]: The way this works (to oversimplify greatly) is that entering/exiting a stack frame may push/pop dtors on/off a linked list. try, catch_unwind, etc are mapped to (wrappers around) setjmp, and throw, panic, etc are mapped to (wrappers around) longjmp. When unwinding occurs, it longjmps up, and then calls the dtors that are currently on the linked list, stopping when they were added before the setjmp and the longjmp.

Basically the crazy idea is (under some postgrestd-like environment where we control the build and are willing to do Build Crimes), we could force Rustc to use this SJLJ unwinding, and make it call our code instead (we'd write wrappers around the existing libunwind SJLJ functions). When libunwind-sjlj would setjmp we also put a pointer to it's jmpbuf on the PG_exception_stack (if it's on the postgres main thread). Then, if postgres longjmps, it will land there -- we'd then determine that this happened, and perform semi-normal cleanup as if it were a Rust panic, more or less.

In other words: The idea is basically to do a minimal shim so that we can turn postgres longjmp (which is normally a forced unwind) into a normal stack-cleaning-up unwind. Doing this would make PG longjmps over Rust behave equivalently to panics, and fix the UB (which comes from doing forced-unwind). This seems likely to be possible because PG is using SJLJ and LLVM has support for implementing unwinding in terms of that, and we can modify the source of the code LLVM uses for it (even though we can't modify postgres.

The nice thing about this is, avoids and avoid paying a large of cost except when the error happens -- we'd no longer need to do anything special when entering/returning pgx-pg-sys (besides mark the functions as "C-unwind").

Major caveat: I am doing a ton of handwaving here! This is a very very rough sketch that ignores the issues around the fact that now rustc and pg both have the same jmpbuf (need to properly handle rust code panicking), thread-safety (need to only do this on the pg thread), or pinning (not even sure if it's relevant).

All that has to be figured out, and I haven't looked at it at all! (None of it sounds too intractable in the postgrestd-style environment, though)

pgcentralfoundation / pgrx

significant sigsetjmp overhead due to pg utility functions being guarded by pg_guard #356

Background Information on Unwinding into FFI

So... can it be faster?