Open elichai opened 5 years ago
If there is an inherent reason why it can't be done then maybe the right way is to have some equivalent to noexcept
in the language.
a trait similar to Send/Sync that marks a function as can not panic. and then the user can either write a non panicking code or if he has access to libstd he can wrap his part of the code with catch_unwind
to make it a noexcept
function.
There's been some discussion on this today on the embedded WG's matrix chat.
Long story short, we'd like to take_mut, but that needs std for unwinding – leading up to people implementing it under the (untestable, for lack of cfg(panic=assert)
) assumption that their code is used on platforms without unwrapping.
Scratchpad material for an RFC:
A
core::panic::catch_unwind
function is added. Like withcore::fmt::Formatter
(whose error type is distinct from the one in std, but all traits it implements are also present in the std variation), this function has the simplified error typeimpl Any
(which would in practice be satisfied by()
but doesn't constrain further development).Users of
core::panic::catch_unwind
can thus take actions when a panic occurs inside the closure, but do not get any details. That is sufficient for applications liketake_mut
that merely abort.
This proposal is incomplete in two ways:
std::process::abort
in its handler. That can be worked around by adding an abort=endlessloop
feature to the take_mut
crate that'd opt it out of that use of abort into an endless loop. This would be opted in to by applications when they build on non-unwinding no_std platforms – no_std but unwinding programs would probably need to provide their own "kill this with fire" functionality there; alternatively an application-settableabort
lang item could be introduced.panic=abort
(or no_std with no unwinding whatsoever) situation, which is a desired property for the cases that were discussed. Especially, it would preclude the generation of the abovementioned code for an endless loop altogether.I don't think catch_unwind is necessary to "do something" when a panic unwinds out of a lexical scope. You can put a ZST on the stack that "does something" in its Drop impl, and defuse it with mem::forget right before it would normally be dropped (after all potentially panicking code). As far as I know, catch_unwind is only necessary when you want to stop the panic from without terminating or looping forever.
Good point, thanks. I can't say what this means for @elichai's original use cases, but mine can be more easily addressed inside take_mut with your suggestion in https://github.com/Sgeo/take_mut/issues/11.
hmmm I'm not sure I quite understood everything, my use case is basically to prevent unwinding through FFI, don't care too much about at what price (obviously i'd rather it will return an Error but aborting will also be fine)
My team has a strong need for no_std components that can catch unwinding panics.
Is anyone working on this design problem? If not, I'd like to pick this up and help design a good solution. If so, then yay, and I'd like to coordinate with anyone else who is.
On the one hand, people are concerned that crates might start to depend on panic strategies once cfg(panic = ...)
is introduced: https://github.com/rust-lang/rust/pull/74754#issuecomment-668822641
This also increases the chances of crates saying "I only work with panic = abort" or so, whereas today the lack of this cfg means that it is pretty hard to make that call as a crate author (unless in some very specialized domain, where unwinding is never permitted, but those seem like major exceptions).
On the other hand, no_std+cfg(panic = "unwind")
+unsafety forces the user to write extremely convoluted, hard to understand (and therefore even more unsafe) code UNLESS he is willing to simply ud2
or loop { }
which amounts to forcing panic = "abort"
on the surrounding code. Let's look at the following example: https://doc.rust-lang.org/nomicon/exception-safety.html#binaryheapsift_up
First, the following pseudo code is suggested:
bubble_up(heap, index):
let elem = heap[index]
while index != 0 && elem < heap[parent(index)]:
heap[index] = heap[parent(index)]
index = parent(index)
heap[index] = elem
This works fine on no_std+cfg(panic = "abort")
. But it's incorrect on cfg(panic = "unwind")
.
Then the following pseudo code is suggested:
bubble_up(heap, index):
let elem = heap[index]
try:
while index != 0 && elem < heap[parent(index)]:
heap[index] = heap[parent(index)]
index = parent(index)
finally:
heap[index] = elem
This works fine on std with any panic strategy but cannot be written on no_std.
Then we arrive at the actual solution which works on no_std with any panic strategy:
struct Hole<'a, T: 'a> {
data: &'a mut [T],
/// `elt` is always `Some` from new until drop.
elt: Option<T>,
pos: usize,
}
impl<'a, T> Hole<'a, T> {
fn new(data: &'a mut [T], pos: usize) -> Self {
unsafe {
let elt = ptr::read(&data[pos]);
Hole {
data: data,
elt: Some(elt),
pos: pos,
}
}
}
fn pos(&self) -> usize { self.pos }
fn removed(&self) -> &T { self.elt.as_ref().unwrap() }
unsafe fn get(&self, index: usize) -> &T { &self.data[index] }
unsafe fn move_to(&mut self, index: usize) {
let index_ptr: *const _ = &self.data[index];
let hole_ptr = &mut self.data[self.pos];
ptr::copy_nonoverlapping(index_ptr, hole_ptr, 1);
self.pos = index;
}
}
impl<'a, T> Drop for Hole<'a, T> {
fn drop(&mut self) {
// fill the hole again
unsafe {
let pos = self.pos;
ptr::write(&mut self.data[pos], self.elt.take().unwrap());
}
}
}
impl<T: Ord> BinaryHeap<T> {
fn sift_up(&mut self, pos: usize) {
unsafe {
// Take out the value at `pos` and create a hole.
let mut hole = Hole::new(&mut self.data, pos);
while hole.pos() != 0 {
let parent = parent(hole.pos());
if hole.removed() <= hole.get(parent) { break }
hole.move_to(parent);
}
// Hole will be unconditionally filled here; panic or not!
}
}
}
I'm surprised that this is still considered acceptable in 2020. I remember people spending days trying to optimize Vec
methods in 2014 because the obvious and fast implementation would be unsafe under unwinding and catch_unwind
was not available.
Of course people will then depend on cfg(panic = "abort")
because they don't want to write unwieldy code like the above only to support a use case that 99% of crates don't care for (because unwinding terminates the program either way.) The whole concept of unwinding without catch_unwind
violates the "What you don't use, you don't pay for" principle. Mostly because your code has to be much more complicated but also because "cleanup in Drop" tends to generate worse code than the equivalent catch_unwind
+ resume_unwind
.
I have a working prototype of this: https://github.com/dpaoliello/rust/commit/a1f3c08995654f222b768a1efebdf80539b66ab9
It required a few changes:
core::panic::catch_unwind
returns Result<R, ()>
since there is no Box
in libcore.__rust_panic_cleanup_and_drop
extern function in libcore that is used to cleanup whatever state is allocated while throwing the exception (this is similar to how __rust_panic_cleanup
cleans up that state and then returns the "reason" in a Box
).__rust_panic_cleanup_and_drop
only needs to be defined if catch_unwind
is called, which means that this is pay-for-play and will not break existing no_std
code. (This is a side-effect of __rust_panic_cleanup_and_drop
only being called in a generic)panic-abort
and panic-unwind
declare __rust_panic_cleanup_and_drop
, so if you are in a libcore+liballoc environment then you can still use these for panicking, and catch_unwind
will work correctly.rust_panic_cleanup_and_drop only needs to be defined if catch_unwind is called, which means that this is pay-for-play and will not break existing no_std code. (This is a side-effect of rust_panic_cleanup_and_drop only being called in a generic)
So if you try to use catch_unwind
in a no_std apllication you will get an opaque linker error? Which is bad UX. Or it may even happen to work if the use of catch_unwind
is not reachable and the monomorphication collector decided to not codegen the caller of catch_unwind
? That is pretty fragile and may make changes to the monomorphication collector breaking changes.
rust_panic_cleanup_and_drop only needs to be defined if catch_unwind is called, which means that this is pay-for-play and will not break existing no_std code. (This is a side-effect of rust_panic_cleanup_and_drop only being called in a generic)
So if you try to use
catch_unwind
in a no_std apllication you will get an opaque linker error? Which is bad UX. Or it may even happen to work if the use ofcatch_unwind
is not reachable and the monomorphication collector decided to not codegen the caller ofcatch_unwind
? That is pretty fragile and may make changes to the monomorphication collector breaking changes.
I agree that it isn't a great UX and is fragile.
Maybe a langitem
can be used here? I wasn't sure how to 1) make the langitem
only required if catch_unwind
is called and 2) be able to call the langitem
from within catch_unwind
(perhaps add an intrinsic?)
My point with this prototype is to prove that we can do this, and to figure out what issues need to be solved for a solution: it seems like implementing this cleanup function in a non-breaking change fashion is the biggest hurdle, but that everything else should "just work".
I think FFI is a very common thing people do in
no-std
environments. a common thing in C/FFI is passing function pointers as callback, this currently cannot happen safely without acatch_unwind
wrapper, otherwise it will panic through FFI boundaries.
I'm no expert, but based on just this text, is extern "C-unwind"
enough?
I think FFI is a very common thing people do in
no-std
environments. a common thing in C/FFI is passing function pointers as callback, this currently cannot happen safely without acatch_unwind
wrapper, otherwise it will panic through FFI boundaries.I'm no expert, but based on just this text, is
extern "C-unwind"
enough?
Yes and no.
If you are passing a Rust function as a callback to a foreign function, then using C-unwind
would permit the panic to escape the Rust callback, rip through the foreign function and back into the Rust code. If the foreign function can handle exceptions being thrown by a callback (and, more specifically, the same type of exception that Rust throws), then there is no issue here.
On the other hand, if the foreign function cannot handle exceptions being thrown from a callback (which is the more common case, as the C ABI has no concept of exceptions, thus throwing across the ABI is strongly discouraged/undefined behavior), then using catch_unwind
at the top-level of the callback allows any panics to safely unwind the Rust stack, pass some sort of error code to the foreign function, which will return an error back to the original Rust code (which can then deal with the error code or call resume_unwind
to keep panicking).
Without catch_unwind
there is no way to unwind during panicking AND avoid the panic from escaping Rust.
Experimenting with this a bit more, I wonder if we should add a core::panic::catch_unwind
to the panic_unwind
crate: this would allow us to keep the function signature the same (since panic_unwind
requires liballoc) and would make it useful (since there would be a throwing panic and unwinding support).
- As suggested,
core::panic::catch_unwind
returnsResult<R, ()>
since there is noBox
in libcore.
Maybe also implement a version that auto-leaks the box, and returns Result<R, &'static mut dyn Any>
? Could be useful if someone would still like to inspect the panic payload, and doesn't care about the possible leak.
Hi, I think FFI is a very common thing people do in
no-std
environments. a common thing in C/FFI is passing function pointers as callback, this currently cannot happen safely without acatch_unwind
wrapper, otherwise it will panic through FFI boundaries.It looks like the only reason this can't happen is because of the Box in the result
Result<R, Box<dyn Any + Send>>
. am I right? is there some inherent reason why it can't happen?