Moving `catch_unwind` to libcore

elichai commented 5 years ago

Hi, I think FFI is a very common thing people do in no-std environments. a common thing in C/FFI is passing function pointers as callback, this currently cannot happen safely without a catch_unwind wrapper, otherwise it will panic through FFI boundaries.

It looks like the only reason this can't happen is because of the Box in the result Result<R, Box<dyn Any + Send>>. am I right? is there some inherent reason why it can't happen?

elichai commented 5 years ago

If there is an inherent reason why it can't be done then maybe the right way is to have some equivalent to noexcept in the language. a trait similar to Send/Sync that marks a function as can not panic. and then the user can either write a non panicking code or if he has access to libstd he can wrap his part of the code with catch_unwind to make it a noexcept function.

chrysn commented 4 years ago

There's been some discussion on this today on the embedded WG's matrix chat.

Long story short, we'd like to take_mut, but that needs std for unwinding – leading up to people implementing it under the (untestable, for lack of cfg(panic=assert)) assumption that their code is used on platforms without unwrapping.

Scratchpad material for an RFC:

A core::panic::catch_unwind function is added. Like with core::fmt::Formatter (whose error type is distinct from the one in std, but all traits it implements are also present in the std variation), this function has the simplified error type impl Any (which would in practice be satisfied by () but doesn't constrain further development).

Users of core::panic::catch_unwind can thus take actions when a panic occurs inside the closure, but do not get any details. That is sufficient for applications like take_mut that merely abort.

This proposal is incomplete in two ways:

It does not specify how custom unwinders on no_std systems make their way in there; I don't know whether we need to say that in here.
It is insufficient to implement a no_std-compatible take_mut crate, as it calls std::process::abort in its handler. That can be worked around by adding an abort=endlessloop feature to the take_mut crate that'd opt it out of that use of abort into an endless loop. This would be opted in to by applications when they build on non-unwinding no_std platforms – no_std but unwinding programs would probably need to provide their own "kill this with fire" functionality there; alternatively an application-settableabort lang item could be introduced.
It does not answer whether such a catch_unwind is zero-cost in a panic=abort (or no_std with no unwinding whatsoever) situation, which is a desired property for the cases that were discussed. Especially, it would preclude the generation of the abovementioned code for an endless loop altogether.

hanna-kruppe commented 4 years ago

I don't think catch_unwind is necessary to "do something" when a panic unwinds out of a lexical scope. You can put a ZST on the stack that "does something" in its Drop impl, and defuse it with mem::forget right before it would normally be dropped (after all potentially panicking code). As far as I know, catch_unwind is only necessary when you want to stop the panic from without terminating or looping forever.

chrysn commented 4 years ago

Good point, thanks. I can't say what this means for @elichai's original use cases, but mine can be more easily addressed inside take_mut with your suggestion in https://github.com/Sgeo/take_mut/issues/11.

elichai commented 4 years ago

hmmm I'm not sure I quite understood everything, my use case is basically to prevent unwinding through FFI, don't care too much about at what price (obviously i'd rather it will return an Error but aborting will also be fine)

sivadeilra commented 4 years ago

My team has a strong need for no_std components that can catch unwinding panics.

Is anyone working on this design problem? If not, I'd like to pick this up and help design a good solution. If so, then yay, and I'd like to coordinate with anyone else who is.

mahkoh commented 4 years ago

On the one hand, people are concerned that crates might start to depend on panic strategies once cfg(panic = ...) is introduced: https://github.com/rust-lang/rust/pull/74754#issuecomment-668822641

This also increases the chances of crates saying "I only work with panic = abort" or so, whereas today the lack of this cfg means that it is pretty hard to make that call as a crate author (unless in some very specialized domain, where unwinding is never permitted, but those seem like major exceptions).

On the other hand, no_std+cfg(panic = "unwind")+unsafety forces the user to write extremely convoluted, hard to understand (and therefore even more unsafe) code UNLESS he is willing to simply ud2 or loop { } which amounts to forcing panic = "abort" on the surrounding code. Let's look at the following example: https://doc.rust-lang.org/nomicon/exception-safety.html#binaryheapsift_up

First, the following pseudo code is suggested:

bubble_up(heap, index):
    let elem = heap[index]
    while index != 0 && elem < heap[parent(index)]:
        heap[index] = heap[parent(index)]
        index = parent(index)
    heap[index] = elem

This works fine on no_std+cfg(panic = "abort"). But it's incorrect on cfg(panic = "unwind").

Then the following pseudo code is suggested:

bubble_up(heap, index):
    let elem = heap[index]
    try:
        while index != 0 && elem < heap[parent(index)]:
            heap[index] = heap[parent(index)]
            index = parent(index)
    finally:
        heap[index] = elem

This works fine on std with any panic strategy but cannot be written on no_std.

Then we arrive at the actual solution which works on no_std with any panic strategy:

struct Hole<'a, T: 'a> {
    data: &'a mut [T],
    /// `elt` is always `Some` from new until drop.
    elt: Option<T>,
    pos: usize,
}

impl<'a, T> Hole<'a, T> {
    fn new(data: &'a mut [T], pos: usize) -> Self {
        unsafe {
            let elt = ptr::read(&data[pos]);
            Hole {
                data: data,
                elt: Some(elt),
                pos: pos,
            }
        }
    }

    fn pos(&self) -> usize { self.pos }

    fn removed(&self) -> &T { self.elt.as_ref().unwrap() }

    unsafe fn get(&self, index: usize) -> &T { &self.data[index] }

    unsafe fn move_to(&mut self, index: usize) {
        let index_ptr: *const _ = &self.data[index];
        let hole_ptr = &mut self.data[self.pos];
        ptr::copy_nonoverlapping(index_ptr, hole_ptr, 1);
        self.pos = index;
    }
}

impl<'a, T> Drop for Hole<'a, T> {
    fn drop(&mut self) {
        // fill the hole again
        unsafe {
            let pos = self.pos;
            ptr::write(&mut self.data[pos], self.elt.take().unwrap());
        }
    }
}

impl<T: Ord> BinaryHeap<T> {
    fn sift_up(&mut self, pos: usize) {
        unsafe {
            // Take out the value at `pos` and create a hole.
            let mut hole = Hole::new(&mut self.data, pos);

            while hole.pos() != 0 {
                let parent = parent(hole.pos());
                if hole.removed() <= hole.get(parent) { break }
                hole.move_to(parent);
            }
            // Hole will be unconditionally filled here; panic or not!
        }
    }
}

I'm surprised that this is still considered acceptable in 2020. I remember people spending days trying to optimize Vec methods in 2014 because the obvious and fast implementation would be unsafe under unwinding and catch_unwind was not available.

Of course people will then depend on cfg(panic = "abort") because they don't want to write unwieldy code like the above only to support a use case that 99% of crates don't care for (because unwinding terminates the program either way.) The whole concept of unwinding without catch_unwind violates the "What you don't use, you don't pay for" principle. Mostly because your code has to be much more complicated but also because "cleanup in Drop" tends to generate worse code than the equivalent catch_unwind + resume_unwind.

dpaoliello commented 3 years ago

I have a working prototype of this: https://github.com/dpaoliello/rust/commit/a1f3c08995654f222b768a1efebdf80539b66ab9

It required a few changes:

As suggested, core::panic::catch_unwind returns Result<R, ()> since there is no Box in libcore.
I had to add a new __rust_panic_cleanup_and_drop extern function in libcore that is used to cleanup whatever state is allocated while throwing the exception (this is similar to how __rust_panic_cleanup cleans up that state and then returns the "reason" in a Box).
__rust_panic_cleanup_and_drop only needs to be defined if catch_unwind is called, which means that this is pay-for-play and will not break existing no_std code. (This is a side-effect of __rust_panic_cleanup_and_drop only being called in a generic)
panic-abort and panic-unwind declare __rust_panic_cleanup_and_drop, so if you are in a libcore+liballoc environment then you can still use these for panicking, and catch_unwind will work correctly.

bjorn3 commented 3 years ago

rust_panic_cleanup_and_drop only needs to be defined if catch_unwind is called, which means that this is pay-for-play and will not break existing no_std code. (This is a side-effect of rust_panic_cleanup_and_drop only being called in a generic)

So if you try to use catch_unwind in a no_std apllication you will get an opaque linker error? Which is bad UX. Or it may even happen to work if the use of catch_unwind is not reachable and the monomorphication collector decided to not codegen the caller of catch_unwind? That is pretty fragile and may make changes to the monomorphication collector breaking changes.

dpaoliello commented 3 years ago

rust_panic_cleanup_and_drop only needs to be defined if catch_unwind is called, which means that this is pay-for-play and will not break existing no_std code. (This is a side-effect of rust_panic_cleanup_and_drop only being called in a generic)

So if you try to use catch_unwind in a no_std apllication you will get an opaque linker error? Which is bad UX. Or it may even happen to work if the use of catch_unwind is not reachable and the monomorphication collector decided to not codegen the caller of catch_unwind? That is pretty fragile and may make changes to the monomorphication collector breaking changes.

I agree that it isn't a great UX and is fragile.

Maybe a langitem can be used here? I wasn't sure how to 1) make the langitem only required if catch_unwind is called and 2) be able to call the langitem from within catch_unwind (perhaps add an intrinsic?)

My point with this prototype is to prove that we can do this, and to figure out what issues need to be solved for a solution: it seems like implementing this cleanup function in a non-breaking change fashion is the biggest hurdle, but that everything else should "just work".

shepmaster commented 3 years ago

I think FFI is a very common thing people do in no-std environments. a common thing in C/FFI is passing function pointers as callback, this currently cannot happen safely without a catch_unwind wrapper, otherwise it will panic through FFI boundaries.

I'm no expert, but based on just this text, is extern "C-unwind" enough?

dpaoliello commented 3 years ago

I think FFI is a very common thing people do in no-std environments. a common thing in C/FFI is passing function pointers as callback, this currently cannot happen safely without a catch_unwind wrapper, otherwise it will panic through FFI boundaries.

I'm no expert, but based on just this text, is extern "C-unwind" enough?

Yes and no.

If you are passing a Rust function as a callback to a foreign function, then using C-unwind would permit the panic to escape the Rust callback, rip through the foreign function and back into the Rust code. If the foreign function can handle exceptions being thrown by a callback (and, more specifically, the same type of exception that Rust throws), then there is no issue here.

On the other hand, if the foreign function cannot handle exceptions being thrown from a callback (which is the more common case, as the C ABI has no concept of exceptions, thus throwing across the ABI is strongly discouraged/undefined behavior), then using catch_unwind at the top-level of the callback allows any panics to safely unwind the Rust stack, pass some sort of error code to the foreign function, which will return an error back to the original Rust code (which can then deal with the error code or call resume_unwind to keep panicking).

Without catch_unwind there is no way to unwind during panicking AND avoid the panic from escaping Rust.

dpaoliello commented 3 years ago

Experimenting with this a bit more, I wonder if we should add a core::panic::catch_unwind to the panic_unwind crate: this would allow us to keep the function signature the same (since panic_unwind requires liballoc) and would make it useful (since there would be a throwing panic and unwinding support).

noamtashma commented 2 years ago

As suggested, core::panic::catch_unwind returns Result<R, ()> since there is no Box in libcore.

Maybe also implement a version that auto-leaks the box, and returns Result<R, &'static mut dyn Any>? Could be useful if someone would still like to inspect the panic payload, and doesn't care about the possible leak.

rust-lang / rfcs

Moving `catch_unwind` to libcore #2810