Open SimonSapin opened 6 years ago
I'm also a bit stuck due to this not being stabilized. I'm surprised there aren't more no_std
people bringing this up. Maybe everyone uses nightly for other reasons anyway.
@NickeZ It's been brought up every now and then, e.g. just now in our Embedded WG meeting. ;)
It seems like something simple that should be able to stabilize practically "today"?
@rust-lang/lang @rust-lang/libs How would you feel about making #[alloc_error_handler] default to panic? That is, when libstd (which defines a #[alloc_error_handler]) is not linked and no other crate defines a handler, do as if this was defined:
#[alloc_error_handler]
fn default(layout: core::alloc::Layout) -> ! {
panic!("memory allocation of {} bytes failed", layout.size());
}
(This panic message matches what libstd’s default allocation error hook (https://github.com/rust-lang/rust/issues/51245) prints before aborting the process.)
This would unblock no_std + alloc applications from running on the stable channel, without stabilizing this attribute.
Panicking is different from libstd’s behavior of aborting the process, but in a no_std context there may not even be a process to abort. It seems to me that panicking is a reasonable default in that case. #[panic_handler] gets called, which is stable and quite similar to #[alloc_error_handler] as it exists as unstable today.
It seems strange to be aborting with std and panicking without. Should we just start panicking in std as well? IIRC that's a change people feel can be made?
As I understand it:
isize::max_value() on accident) and it's totally actually okay to make a small allocation.
So if libstd were to panic instead of immediately abort, and assuming that panic="unwind" for that build, the process as a whole might be able to survive an OOM in cases where it previously did not.
PS: From a language stability perspective, it seems entirely future-compatible for the story to be: "for now, you can't define the OOM handler. You always get this default handler in no_std which will just panic. In the future you'll have the option to define your own handler, or you could of course continue to allow the default handler to be used".
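To make the "process might survive" point concrete, here is a rough sketch, not how libstd behaves today. The helper allocate_or_panic and its budget parameter are invented for illustration: they stand in for an allocation path whose failure unwinds instead of aborting. It assumes the build uses panic="unwind".

```rust
use std::panic::catch_unwind;

// Hypothetical stand-in for an allocation path whose failure panics.
// `budget` is an invented limit that plays the role of "available memory".
fn allocate_or_panic(bytes: usize, budget: usize) -> Vec<u8> {
    if bytes > budget {
        panic!("memory allocation of {} bytes failed", bytes);
    }
    vec![0u8; bytes]
}

// Returns (did the huge request unwind?, length of the later small buffer).
fn survive_failed_request() -> (bool, usize) {
    let budget = 1 << 20;
    // An accidental, absurdly large request fails and unwinds...
    let huge = catch_unwind(move || allocate_or_panic(usize::MAX, budget));
    // ...but the caller caught the unwind, so a small request still works.
    let small = allocate_or_panic(64, budget);
    (huge.is_err(), small.len())
}
```

With an aborting handler the second request would never run, which is the difference being discussed.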
I like @SimonSapin's proposal. In my usecases, an OOM can always be handled by invoking something directly in alloc, rather than returning null, so they are functionally equivalent. Having alloc + no_std on stable would be fantastic.
@Lokathor IIRC the main issue was not just that, but the possibility of there being unsafe code which is sound in the presence of abort-on-OOM but unsound given panic-on-OOM. That is, unsafe code authors heretofore did not have to consider panic safety every time they called a potentially-allocating function, and now they would (but the code is already written).
[I didn't and don't have any opinions on this and don't want to relitigate it, just as a FYI.]
@sfackler Yeah it’s inconsistent. But maybe acceptable because there may not be a process to abort anyway on some embedded targets, and this issue tracks a way to eventually customize the behavior. Panicking is just the default.
FWIW we have an accepted RFC (but unimplemented, and without a precise mechanism specified) to allow opting into panic instead of abort (in libstd) on allocation errors: https://github.com/rust-lang/rust/issues/43596
Changing the default in libstd seems potentially more risky to me, as @glaebhoerl mentions.
How would you feel about making #[alloc_error_handler] default to panic?
Since panic! is allowed to allocate memory (and does so for -C panic=unwind), what happens if that fails within the #[alloc_error_handler]? Does it get called again from within the panic! in an infinite recursion? (Or does it have a way to detect whether such recursion is happening?)
I covered this above (https://github.com/rust-lang/rust/issues/51540#issuecomment-553211363), a panic during a panic goes into an abort.
For the panic during a panic case to trigger, the first panic must "succeed", but that doesn't happen if the panic fails to allocate memory, right?
EDIT: e.g. if the panic fails to allocate, then the stack is not unwound, yet we call the #[alloc_error_handler] which will try to panic again, which won't detect a double-panic because we are not unwinding the stack yet. I suppose we can add extra logic to panic! to handle #[alloc_error_handler] recursion, maybe by shuffling around the double-panic detection code? (I don't recall if we just call libunwind and ask "are we panicking" or keep extra state around for that already).
std::panic is not a reexport of core::panic, they’re two different macros. The latter does not allocate.
It could panic while trying to allocate the string for format!, which happens before calling the panic machinery. Therefore you need explicit detection for recursion in the alloc handler.
EDIT: Ah nevermind, I just checked the code and the formatting happens after checking for a double panic, so everything should be fine.
@SimonSapin Maybe I misunderstood you? I thought you were proposing for #[alloc_error_handler] to also panic for std builds, where std::panic is used, and std::panic tries to allocate. For core::panic, it might or might not allocate, depending on how the user implements the panic handler, but today they can allocate if they have access to liballoc.
My initial proposal in https://github.com/rust-lang/rust/issues/51540#issuecomment-553114657 was to default to panic only when no other alloc_error_handler is provided. Since libstd provides such a handler, this new default would only be used when libstd is not linked.
By necessity, this default would use core::panic rather than std::panic. The former calls #[panic_handler] without allocating.
Then, if a user-provided #[panic_handler] tries to allocate and that fails, yes we’d get a double panic. But that’s not the only case where a user-provided #[panic_handler] can double-panic.
In a later comment https://github.com/rust-lang/rust/issues/51540#issuecomment-553207208, @sfackler proposed to also change libstd’s default to panic, for consistency.
My initial proposal in #51540 (comment) was to default to panic only when no other alloc_error_handler is provided.
Ah, that makes sense.
In a later comment #51540 (comment), @sfackler proposed to also change libstd’s default to panic, for consistency.
I was expressing concern about this, but as @Amanieu mentioned, this currently allocates after checking for double panics, so this isn't an issue.
IIRC the main issue was not just that, but the possibility of there being unsafe code which is sound in the presence of abort-on-OOM but unsound given panic-on-OOM. That is, unsafe code authors heretofore did not have to consider panic safety every time they called a potentially-allocating function, and now they would (but the code is already written).
Example:
// Invariant: ptr always points to some allocated memory of len > 0
struct MyDynArray { ptr: *mut u8, len: usize }
impl MyDynArray {
pub fn new() { Self { ptr: GlobalAlloc(..1..), len: 1 } }
pub fn resize(&mut self, val: u8, new_len: usize) { unsafe {
// this unsafe code is sound today, but not if allocation can panic
let new_len = if new_len == 0 { 1 } else { new_len };
let ptr = self.ptr;
self.ptr = ptr::null();
let new_ptr = GlobalAlloc::alloc(...new_len...);
// move elements from [ptr, ptr + size] to [new_ptr, new_ptr + size]
GlobalAlloc::dealloc(ptr);
self.ptr = new_ptr;
self.len = new_len;
}}
}
// safe because this will never see a `self.ptr == ptr::null()`
impl Drop for MyDynArray { fn drop(&mut self) { unsafe { GlobalAlloc::dealloc(self.ptr) } }}
This code is sound because either the call to GlobalAlloc::alloc(...new_len...) succeeds or it aborts, which means that Drop::drop will never see a MyDynArray with self.ptr == ptr::null().
If GlobalAlloc::alloc is changed to unwind on OOM, then such code now has a memory safety issue. While the code can be "fixed" to avoid that, such code might already be written, and we can't easily tell.
Basically, that GlobalAlloc::alloc does not unwind on OOM is a guarantee of its API that we can't break. We could make #[alloc_error_handler] panic by default, and add a catch_unwind to GlobalAlloc::alloc to make it abort. For Alloc::alloc, we could just change its semantics since they are unstable, and maybe we can add a different "system allocator" hook with "panics on OOM" semantics that people can migrate to, and deprecate the old one.
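For readers who want to see the hazard in isolation, the following is a simplified, runnable sketch and not the playground code referenced later in the thread. The helper alloc_or_panic and its fail flag are invented: they simulate an allocation path that unwinds on failure. Drop here only records that it observed a null pointer instead of passing it to dealloc, so the sketch itself stays well-defined. It assumes panic="unwind".

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::panic::catch_unwind;
use std::ptr;
use std::sync::atomic::{AtomicBool, Ordering};

// Set when Drop observes the broken invariant (a null pointer).
static DROP_SAW_NULL: AtomicBool = AtomicBool::new(false);

// Hypothetical allocation path: `fail` simulates an OOM that unwinds.
fn alloc_or_panic(len: usize, fail: bool) -> *mut u8 {
    assert!(len > 0);
    let layout = Layout::array::<u8>(len).unwrap();
    if fail {
        panic!("memory allocation of {} bytes failed", layout.size());
    }
    // Safety: the layout has a non-zero size because len > 0.
    let p = unsafe { alloc_zeroed(layout) };
    if p.is_null() {
        handle_alloc_error(layout);
    }
    p
}

// Invariant: ptr always points to an allocation of len > 0 bytes.
struct MyDynArray { ptr: *mut u8, len: usize }

impl MyDynArray {
    fn new() -> Self {
        MyDynArray { ptr: alloc_or_panic(1, false), len: 1 }
    }

    fn resize(&mut self, new_len: usize, fail: bool) {
        let new_len = new_len.max(1);
        let old_ptr = self.ptr;
        self.ptr = ptr::null_mut(); // invariant temporarily broken
        let new_ptr = alloc_or_panic(new_len, fail); // may unwind right here
        unsafe {
            ptr::copy_nonoverlapping(old_ptr, new_ptr, self.len.min(new_len));
            dealloc(old_ptr, Layout::array::<u8>(self.len).unwrap());
        }
        self.ptr = new_ptr;
        self.len = new_len;
    }
}

impl Drop for MyDynArray {
    fn drop(&mut self) {
        if self.ptr.is_null() {
            // The example in the thread would hand this null pointer to dealloc.
            DROP_SAW_NULL.store(true, Ordering::SeqCst);
            return;
        }
        unsafe { dealloc(self.ptr, Layout::array::<u8>(self.len).unwrap()) }
    }
}

// True if the simulated OOM unwound and Drop then ran with ptr == null.
fn drop_sees_null_after_unwind() -> bool {
    let unwound = catch_unwind(|| {
        let mut a = MyDynArray::new();
        a.resize(16, true);
    })
    .is_err();
    unwound && DROP_SAW_NULL.load(Ordering::SeqCst)
}
```

In this sketch the old one-byte buffer is simply leaked when the unwind happens. A panic-safe resize would allocate the new buffer first and only then touch self.ptr.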
Is unwinding possible when libstd is not linked?
Is unwinding possible when libstd is not linked?
Yes, if you use nightly, port libpanic_unwind and libunwind to your platform, and invoke them directly without going through libstd's panic framework. Obviously this relies on very unstable implementation details.
@gnzlbg your example (1) could easily be fixed in the presence of OOM leading to unwind, (2) doesn't even have correct usage for the API that GlobalAlloc exposes, and (3) doesn't call alloc::alloc::handle_alloc_error, which is how you'd go to the allocation handler that would either abort or unwind.
To be clear: your code example is not sound today.
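As a reference point for items (2) and (3), here is a minimal sketch of the usual calling pattern, using the std re-exports of the alloc functions. The names alloc_bytes and roundtrip are made up for this example. The allocation function reports failure by returning null, and it is the caller that routes that to the handler via handle_alloc_error.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

// Allocates `len` bytes (len must be non-zero) and returns the pointer
// together with the layout needed to free it later.
fn alloc_bytes(len: usize) -> (*mut u8, Layout) {
    assert!(len > 0);
    let layout = Layout::array::<u8>(len).expect("layout overflow");
    // Safety: the layout has a non-zero size.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        // alloc signals failure with null; the caller invokes the handler.
        handle_alloc_error(layout);
    }
    (ptr, layout)
}

// Writes one byte, reads it back, and frees the buffer with the same layout.
fn roundtrip(len: usize) -> u8 {
    let (ptr, layout) = alloc_bytes(len);
    unsafe {
        ptr.write(42);
        let value = ptr.read();
        dealloc(ptr, layout);
        value
    }
}
```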
If a crate is designed to work with no_std, wouldn't its authors have to consider the possibility that abort does not work? Am I misunderstanding?
Today, a person can just write a no_std lib, and you call the lib and it parses bytes or does a formula or whatever. Many people use no_std in its literal meaning: without the OS backed standard library. They still assume that there's something out there beyond the process. It's unfortunate, but that's the social contract we've developed.
I guess your panic_handler should trap the thread in a loop or something if it's really unable to exit to a wider OS. That's what I do for GBA. (as a reminder: not an empty loop because those are UB because of LLVM bug, do a volatile read something over and over so that there's a side effect)
(as a reminder: not an empty loop because those are UB because of LLVM bug, do a volatile read something over and over so that there's a side effect)
loop { continue; } works just fine. 😅
Surely loop { continue; } would compile to the same LLVM IR as loop {} (even the same MIR)?
"Works just fine" is not applicable to UB, almost any UB "works just fine" some of the time, but the risk here is lurking miscompilations.
Surely loop { continue; } would compile to the same LLVM IR as loop {} (even the same MIR)?
No idea. Funny enough I can't reproduce the problem anymore but using loop { continue; } has never failed me on thumbv6m-none-eabi (and higher) while loop {} often has.
It seems like a bug if loop {} and loop { continue; } produce different code...
This is a discussion for https://github.com/rust-lang/rust/issues/28728, please keep this thread about alloc_error_handler.
@gnzlbg your example (2) doesn't even have correct usage for the api that GlobalAlloc exposes (3) doesn't call alloc::alloc::handle_alloc_error, which is how you'd go to the allocation handler that would either abort or unwind.
To be clear: your code example is not sound today.
I disagree. This is the example with the blanks expanded (playground) and I believe it is correct and sound today. Changing handle_alloc_error to panic instead would make this program unsound, and would make it memory unsafe.
Please point out explicitly what you believe to be unsound in this use of the GlobalAlloc API. AFAICT, the code only relies on guarantees that the API provides.
(1) could easily be fixed in the presence of OOM leading to unwind
Do you have a way to automatically fix or migrate all of the existing code that might run into the family of issues that the example above demonstrates?
well sure it works if you fill in all the checks you skipped before XD
also, just as a reminder: we already have an accepted RFC to make oom=panic a Cargo config option: https://github.com/rust-lang/rust/issues/43596
@gnzlbg This example is only sound on the assumption that handle_alloc_error never unwinds. But I don’t think this assumption is valid. https://doc.rust-lang.org/1.39.0/std/alloc/fn.handle_alloc_error.html documents:
The default behavior of this function is to print a message to standard error and abort the process. It can be replaced with set_alloc_error_hook and take_alloc_error_hook.
(Emphasis mine.)
I feel that we can probably get away with making OOM unwind by claiming that any unsafe code made unsound by this change was always broken.
I personally feel that unwinding on OOM (even on std) is the right approach: 99% of OOMs are caused by excessively large allocations rather than actually running out of memory. Even if unwinding requires allocating memory, this will usually work fine. A double panic and abort will handle the (rare) recursive OOM case.
I feel that we can probably get away with making OOM unwind by claiming that any unsafe code made unsound by this change was always broken.
I agree.
If I'm reading the thread correctly, the current proposal to get alloc + no_std working on stable is to provide a default alloc_error_handler that simply calls panic_handler, which will allow people to use alloc in a no_std environment on stable for now while the #[alloc_error_handler] discussion continues.
The panic handler that is implemented in #[panic_handler] should take care not to make any allocations to avoid infinite panic recursions.
This seems to work for my usecases, which involve embedded systems running without an OS (or processes that are running with an OS that isn't supported by std), where we can do everything from writing an LED to indicate failure, to sending a message to the operating system to terminate the process with an "OOM" error.
What is needed to reach a consensus to stabilize this?
A few thoughts on moving this forward:
Also, one other thought: any interest in a very slightly more general mechanism that would avoid further language work for future things like this, along these lines? (Naming intentionally WIP.)
#[global_thing("alloc_error")]
fn alloc_error_panic(...) -> ! { ... }
// Elsewhere
#[global_thing_get("alloc_error")]
fn alloc_error(...) -> !;
... alloc_error(...) ...
This would give a compile time error if any code using a given global thing gets compiled in and that global thing doesn't get set exactly once.
Thoughts on this? (This is intentionally much simpler than the RFC for external existentials.)
Can we confirm that consensus?
The formal way to do this is an RFC for the Lang and Libs team. Or do you feel an FCP (in a dedicated issue) would be enough?
I don't think we want to immediately move to stabilize
This is an interesting case for unstable experimentation because it is about a default behavior when no other behavior is opted into, and affects not just one crate but an executable/staticlib/cdylib as a whole.
Maybe we could have that "default" only apply if any of the crates in the dependency graph has a given #![feature(something)] opt-in?
a very slightly more general mechanism that would avoid further language work
#[alloc_error_handler] and #[panic_handler] are very similar to each other as far as the language is concerned. I feel that the additional language work for another one like these and the number of them we’re likely to ever need are low enough that it’s not worth a general mechanism for the purpose of the standard library.
If it’s a language feature to be eventually stabilized to allow any crate to call as well as define Global Things, I feel there’s definitely enough design space that this needs an RFC. (Type checking and name collisions come to mind, but if anyone wants to discuss that please do it in a separate thread.)
@joshtriplett
Thoughts on this? (This is intentionally much simpler than the RFC for external existentials.)
That'd be insanely useful for embedded development!
If it’s a language feature to be eventually stabilized to allow any crate to call as well as define Global Things, I feel there’s definitely enough design space that this needs an RFC. (Type checking and name collisions come to mind, but if anyone wants to discuss that please do it in a separate thread.)
I don't think that's what they meant, probably only some syntax sugar to write #[global("allocator")] instead of #[global_allocator] for already existing attributes / hacks. There was already an RFC for such a feature last year which received considerable design work, but the lang team postponed it this year (RFC2492: Existential types with external definition). The #[alloc_error_handler] is one of the many global "hooks" that would have been supported by such a feature.
@SimonSapin I feel like an rfcbot poll (along with a clear description of the incremental change to the existing behavior) would be enough to confirm consensus.
If it’s a language feature to be eventually stabilized to allow any crate to call as well as define Global Things,
Yes, that's what I was suggesting.
@gnzlbg I mentioned that RFC in my comment. I was suggesting something intentionally smaller and simpler in the hopes that it would be easier to do without needing existential types.
@SimonSapin I don't want to derail this issue. I think I'm just trying to see if that proposal sounds sufficiently useful and straightforward that it might be worth considering rather than adding case 2 of the 0-1-infinity rule. I was hoping that it sounded extremely simple to implement.
Oh yeah I’m not worried about the implementation, it’s the design that I think has some open questions.
Back to the idea of a default allocation error handler that panics: it becomes unnecessary (for the purpose of unblocking some use cases in the Stable channel) if we stabilize the #[alloc_error_handler] attribute. I personally would be fine with that, since it’s very similar to the #[panic_handler] attribute which is already stable. How does @rust-lang/lang feel about this?
I was suggesting something intentionally smaller and simpler in the hopes that it would be easier to do without needing existential types.
You might want to open an internal thread with a proposal. GlobalAlloc is a trait, alloc_error_handler is a function, so a feature supporting both as independent "extern global thingies (traits and function types)" sounds at least as complicated as the "extern existential types" feature, which natively supports both because alloc_error_handler implements a Fn trait. I'm not sure what a "smaller" feature would look like, but the implementation challenges that it must solve are AFAICT the same, and if it supports both GlobalAlloc and alloc_error_handler, the amount of expressive power it adds to the language is the same, unless we were to artificially restrict the feature to only support some "blessed" existential types (e.g. GlobalAlloc and alloc_error_handler, but not user-defined types), but that's pretty much what we already have with the attribute system today.
#[global_allocator] works with a trait, but really it’s sugar for four "global functions":
That's kind of what the "existential" in "extern existential" literally means. When one writes:
// crate importing existential
extern existential type Heap: GlobalAlloc;
let x = unsafe { Heap.alloc(...) };
Heap has a single concrete type that implements the GlobalAlloc trait, i.e., it is not generic. One of the many ways to implement this is to desugar that into the "four global functions" that you mention:
// crate importing existential
extern "Rust" {
static Heap: impl GlobalAlloc; // We only deal with references to this type
// These functions are not generic over `Self`, they only use `*const c_void`
// because we don't know the actual type in this TU, but there is only one type:
unsafe fn Heap_GlobalAlloc_alloc(*const c_void, ...) -> ...;
unsafe fn Heap_GlobalAlloc_dealloc(*const c_void, ...);
... // this is generated from the trait definition, not the type definition
}
let x = unsafe { Heap_GlobalAlloc_alloc(&Heap as *const _ as _, ...) };
where some other crate must actually provide a definition of the type, e.g.,
// crate defining existential
struct JemallocHeap; impl GlobalAlloc for JemallocHeap { ... }
pub extern existential type Heap = JemallocHeap;
which then can be desugared to:
// crate defining existential
pub static Heap: JemallocHeap = JemallocHeap; // type is concrete
pub unsafe extern "Rust" fn Heap_GlobalAlloc_alloc(a: *const c_void, ...) -> ... {
GlobalAlloc::alloc(&*(a as *const JemallocHeap), ...) // uses concrete type!
}
pub unsafe extern "Rust" fn Heap_GlobalAlloc_dealloc(a: *const c_void, ...) { GlobalAlloc::dealloc(&*(a as *const JemallocHeap), ...) }
...
Whether one applies this program transformation with a built in keyword, or using an attribute, doesn't matter much. There are cleverer ways to desugar the feature, e.g., see here, but these are pretty much stylistic changes. For the alloc_error_handler, the compiler can just expand:
extern existential type alloc_error_handler: Fn(...) -> !;
to
extern "Rust" {
fn alloc_error_handler(...) -> !;
}
The tricky part is actually detecting collisions, but this is something that we already need to do for any flavor of this feature (e.g. if two crates implement a #[global_allocator] we want to error at build time, instead of link time, or worse, have the linker pick one of the two allocators, e.g., depending on link order).
I feel like an rfcbot poll (along with a clear description of the incremental change to the existing behavior) would be enough to confirm consensus.
Alright, https://github.com/rust-lang/rust/issues/66741 is up.
I’ve also written up https://github.com/rust-lang/rust/issues/66740 to propose FCP to stabilize the attribute.
To some extent these two proposals are alternatives to each other in that either of them unlocks no_std + alloc on Stable. But doing them both may still be desirable.
So, any progress for us alloc + no_std users who want to compile with stable?
The message literally just before yours proposes two solutions, either of which would unblock that use case: https://github.com/rust-lang/rust/issues/66740, https://github.com/rust-lang/rust/issues/66741. However neither has consensus so far.
Are the right people aware of this issue? This is still the only blocker for using no_std + alloc as far as I can tell and it seems to be stuck on a decision / organizational question.
This attribute is mandatory when using the alloc crate without the std crate. It is used like this:
Implementation PR: https://github.com/rust-lang/rust/pull/52191
Blocking issues
Original issue:
In a no_std program or staticlib, linking to the alloc crate may cause this error:
This is fixed by providing the oom lang item, which is normally provided by the std crate (where it calls a dynamically-settable hook https://github.com/rust-lang/rust/issues/51245, then aborts). This is called by alloc::alloc::handle_alloc_error (which is called by Vec and others on memory allocation failure).
However, defining a lang item is an unstable feature.
Possible solutions include:
Add and stabilize a dedicated attribute similar to the #[panic_implementation] attribute:
The downside is that this is one more mandatory hoop to jump through for no_std programs that would have been fine with a default hook that aborts.
Move std’s dynamically-settable hook into alloc: https://github.com/rust-lang/rust/pull/51607. The downside is some mandatory space overhead.