Panic safety (or lack thereof)

nyurik commented 7 months ago

Would it be possible to refactor the code to place all unsafe code within one crate? Similar to how other crates keep FFI bindings in -sys, where as the rest of complex code is in pure safe Rust, without any safety concerns? This is more of a hypothetical question at this stage, but if this goal is set early on, it might be possible to move towards that.

The second aspect is panics - seems panics is the default method for most PGRX operations, whereas typically .unwrap() is a sign that an error is not being properly handled, and may result in a runtime error. Should gradual migration towards the Result<T,E> be preferred, and what would be needed for this?

I am a newcomer to the codebase, so please take it with a grain of salt. Thx!

workingjubilee commented 7 months ago

Would it be possible to refactor the code to place all unsafe code within one crate? Similar to how other crates keep FFI bindings in -sys, where as the rest of complex code is in pure safe Rust, without any safety concerns? T

I'm not sure I understand this, because a safe function can be internally unsafe, but -sys crates are generally externally unsafe because they are pure bindings.

Consider pgrx::list::List: the main way to obtain one is to call an unsafe fn that then bears the safety assertion for the rest of the type's implementation. This is partially because https://github.com/rust-lang/unsafe-code-guidelines/issues/256 hasn't been resolved decisively, but this is deeply inherent in safe wrappers.

workingjubilee commented 7 months ago

I think the unsafe thing is not very actionable, but we could pervasively document what panics, for a start. I think we have many cases where we risk a panic, knowing it, and then don't have # Panic. There are cases where it's only known from accessing Postgres, though.

nyurik commented 7 months ago

Thanks @workingjubilee ! Could you elaborate on your second point about accessing Postgress panics? Do you mean Postgres could cause a panic in C code, and the Rust code simply ignores it and passes it through until some code up the stack handles it? If so, we could move the panic catching logic to wrap the FFI code that calls C function - and only propagate a proper Result thereafter - again, assuming I understood the issue. And if so, the rest of the Rust-specific panics could be converted to the proper error result values as well.

workingjubilee commented 7 months ago

Thanks @workingjubilee ! Could you elaborate on your second point about accessing Postgress panics? Do you mean Postgres could cause a panic in C code, and the Rust code simply ignores it and passes it through until some code up the stack handles it?

Postgres has an error-handling context, it doesn't use unwinding. Instead it "throws" using longjmp. Some functions are known to throw errors. In some cases, when we intercept these errors, they are things we should unconditionally panic on, rather than attempting to catch anything, because the error is ABORT or something like that.

You might find looking at this informative:

nyurik commented 7 months ago

thx for the links, but i must still be misunderstanding the meaning of "we should unconditionally panic on" -- at the end of the day, panics and error results are nearly identical in the sense that both can be caught and handled by the calling function. The results are just "cleaner". So there are these cases:

PG calls a PGRX Rust callback function which calls a bunch of other Rust functions. Some Rust function panics, and that panic is caught by the callback function and is passed to PG via some proper error reporting API. In this case, I don't see any reason to use panics - but instead can all use standard Rust Results.
PG calls a PGRX Rust callback that calls Rust code that in turn calls internal PG functions via PRGX FFI again. In this case, PG could throw some longjump exception that PGRX FFI could probably unwind / re-panic / pass back to the initial PGRX callback function, which will convert that panic back to the error API(?).
I don't think there is a case when the app's main fn is written in Rust, that calls into PG code (e.g. reusing psql or other pg C library) - so there is no point in covering it just yet.

workingjubilee commented 7 months ago

thx for the links, but i must still be misunderstanding the meaning of "we should unconditionally panic on" -- at the end of the day, panics and error results are nearly identical in the sense that both can be caught and handled by the calling function

One of the error levels that Postgres uses indicates that data corruption is occurring or could occur if the program proceeds further, rescue is not possible, and everything should terminate essentially immediately. It may still run certain termination handlers nonetheless (like unwinding would).

So, no. I agree that we may want to make it easier to handle certain possibly-panicking functions, but mostly because if we do things that encourage people to roll their own logic for catching unwinds in Rust, they will incorrectly try to "handle" an error that should never be handled.

nyurik commented 7 months ago

I absolutely agree that there is a very small (?) class of errors that should bypass the Result structs using Rust's panics. My point is that this should only affect the second case: PG -> Rust -> PG. And I also agree that we should make it extra difficult for the users to capture such cases. For everything else, I think a standard Rust Result<T, E> should be used, which means there should be almost no .unwrap() or any other panic-creating calls in the PGRX. Is my understanding corect?

pgcentralfoundation / pgrx

Panic safety (or lack thereof) #1543