Tracking issue for the #[alloc_error_handler] attribute (for no_std + liballoc)

SimonSapin commented 6 years ago

This attribute is mandatory when using the alloc crate without the std crate. It is used like this:

#[alloc_error_handler]
fn foo(_: core::alloc::Layout) -> ! {
    // …
}

Implementation PR: https://github.com/rust-lang/rust/pull/52191

Blocking issues

[ ] #83430
[ ] https://github.com/rust-lang/unsafe-code-guidelines/issues/506

Original issue:

In a no_std program or staticlib, linking to the alloc crate may cause this error:

error: language item required, but not found: `oom`

This is fixed by providing the oom lang item, which is is normally provided by the std crate (where it calls a dynamically-settable hook https://github.com/rust-lang/rust/issues/51245, then aborts). This is called by alloc::alloc::handle_alloc_error (which is called by Vec and others on memory allocation failure).

#![feature(lang_items)]

#[lang = "oom"]
extern fn foo(_: core::alloc::Layout) -> ! {
    // example implementation based on libc
    extern "C" { fn abort() -> !; }
    unsafe { abort() }
}

However, defining a lang item is an unstable feature.

Possible solutions include:

Add and stabilize a dedicated attribute similar to the #[panic_implementation] attribute:
```
#[alloc_error_handler]
fn foo(_: core::alloc::Layout) -> ! {
  // …
}
```
The downside is that this is one more mandatory hoop to jump through for no_std program that would have been fine with a default hook that aborts.
~~Move std’s dynamically-settable hook into alloc: https://github.com/rust-lang/rust/pull/51607. The downside is some mandatory space overhead.~~

NickeZ commented 4 years ago

I'm also a bit stuck due to this not being stabilized. I'm surprised there aren't more no_std people bringing this up. Maybe everyone uses nightly for other reasons anyway.

therealprof commented 4 years ago

@NickeZ It's been brought up every now and then, e.g. just now in our Embedded WG meeting. ;)

Lokathor commented 4 years ago

It seems like something simple that should be able to stabilize practically "today"?

SimonSapin commented 4 years ago

@rust-lang/lang @rust-lang/libs How would you feel about making #[alloc_error_handler] default to panic? That is, when libstd (which defines a #[alloc_error_handler]) is not linked and no other crate defines a handler, do as if this was defined:

#[alloc_error_handler]
fn default(layout: core::alloc::Layout) -> ! {
    panic!("memory allocation of {} bytes failed", layout.size());
}

(This panic message matches what libstd’s default allocation error hook (https://github.com/rust-lang/rust/issues/51245) prints before aborting the process.)

This would unblock no_std + alloc application from running on the stable channel, without stabilizing this attribute.

Panicking is different from libstd’s behavior of aborting the process, but in a no_std context there may not even be a process to abort. It seems to me that panicking is a reasonable default in that case. #[panic_handler] gets called, which is stable and quite similar to #[alloc_error_handler] as it exists as unstable today.

sfackler commented 4 years ago

It seems strange to be aborting with std and panicking without. Should we just start panicking in std as well? IIRC that's a change people feel can be made?

Lokathor commented 4 years ago

As I understand it:

The reason to make OOM jump to an abort instead of doing a panic is because the panic could unwind (depending on build settings, which as far as I know aren't exposed in the program via cfg flag or anything like that).
That unwind could allocate
Since you just failed to allocate, trying to allocate right now is maybe a bad plan. Except that:
- Maybe you tried to make a huge allocation (like asking for isize::max_value() on accident) and it's totally actually okay to make a small allocation.
- In this case your unwind happens and the thread (but maybe not the whole program) dies.
- On the other hand maybe you're really absolutely out of memory and you can't allocate even a single boxed error, in which case you'd get an OOM panic when the unwind tries to allocate.
- Except you're already in panic mode, and a panic during a panic causes an abort (bringing down the entire program).

So if libstd were to panic instead of immediately abort, and assuming that panic="unwind" for that build, the process as a whole might be able to survive an OOM in cases where it previously did not.

Lokathor commented 4 years ago

PS: From a language stability perspective, it seems entirely future-compatible for the story to be: "for now, you can't define the OOM handler. You always get this default handler in no_std which will just panic. In the future you'll have the option to define your own handler, or you could of course continue to allow the default handler to be used".

mark-i-m commented 4 years ago

I like @SimonSapin's proposal. In my usecases, an OOM can always be handled by invoking something directly in alloc, rather than returning null, so they are functionally equivalent. Having alloc + no_std on stable would be fantastic.

glaebhoerl commented 4 years ago

@Lokathor IIRC the main issue was not just that, but the possibility of there being unsafe code which is sound in the presence of abort-on-OOM but unsound given panic-on-OOM. That is, unsafe code authors heretofore did not have to consider panic safety every time they called a potentially-allocating function, and now they would (but the code is already written).

[I didn't and don't have any opinions on this and don't want to relitigate it, just as a FYI.]

SimonSapin commented 4 years ago

@sfackler Yeah it’s inconsistent. But maybe acceptable because there may not be a process to abort anyway on some embedded targets, and this issue tracks a way to eventually customize the behavior. Panicking is just the default.

FWIW we have an accepted RFC (but unimplemented, and without a precise mechanism specified) to allow opting into panic instead of abort (in libstd) on allocation errors: https://github.com/rust-lang/rust/issues/43596

Changing the default in libstd seems potentially more risky to me, as @glaebhoerl mentions.

gnzlbg commented 4 years ago

How would you feel about making #[alloc_error_handler] default to panic?

Since panic! is allowed to allocate memory (and does so for -C panic=unwind), what happens if that fails within the #[alloc_error_handler] ? Does it get called again from within the panic! in an infinite recursion? (or does it have a way to detect whether such recursion is happening?).

Lokathor commented 4 years ago

I covered this above (https://github.com/rust-lang/rust/issues/51540#issuecomment-553211363), a panic during a panic goes into an abort.

gnzlbg commented 4 years ago

For the panic during a panic case to trigger, the first panic must "succeed", but that doesn't happen if the panic fails to allocate memory, right?

EDIT: e.g. if the panic fails to allocate, then the stack is not unwound, yet we call the #[alloc_error_handler] which will try to panic again, which won't detect a double-panic because we are not unwinding the stack yet. I suppose we can add extra logic to panic! to handle #[alloc_error_handler] recursion, maybe by shuffling around the double-panic detection code? (I don't recall if we just call libunwind and ask "are we panicking" or keep extra state around for that already).

SimonSapin commented 4 years ago

std::panic is not a reexport of core::panic, they’re two different macros. The latter does not allocate.

Amanieu commented 4 years ago

It could panic while trying to allocate the string for format!, which happens before calling the panic machinery. Therefore you need explicit detection for recursion in the alloc handler.

EDIT: Ah nevermind, I just checked the code and the formatting happens after checking for a double panic, so everything should be fine.

gnzlbg commented 4 years ago

@SimonSapin Maybe I misunderstood you? I thought you were proposing for #[alloc_error_handler] to also panic for std builds as well, where std::panic is used, and std::panic tries to allocate. For core::panic, it might or might not allocate, depending on how the user implements the panic handler, but today they can allocate if they have access to liballoc.

SimonSapin commented 4 years ago

My initial proposal in https://github.com/rust-lang/rust/issues/51540#issuecomment-553114657 was to default to panic only when no other alloc_error_handler is provided. Since libstd provides such a handler, this new default would only be used when libstd is not linked.

By necessity, this default would use core::panic rather than std::panic. The former calls #[panic_handler] without allocating.

https://github.com/rust-lang/rust/blob/374ad1b0063963060a00a3110e44d76e7105d059/src/libcore/macros.rs#L12-L28

https://github.com/rust-lang/rust/blob/374ad1b0063963060a00a3110e44d76e7105d059/src/libcore/panicking.rs#L70-L83

Then, if a user-provided #[panic_handler] tries to allocate and that fails, yes we’d get a double panic. But that’s not the only case where a user-provided #[panic_handler] can double-panic.

In a later comment https://github.com/rust-lang/rust/issues/51540#issuecomment-553207208, @sfackler proposed to also change libstd’s default to panic, for consistency.

gnzlbg commented 4 years ago

My initial proposal in #51540 (comment) was to default to panic only when no other alloc_error_handler is provided.

Ah, that makes sense.

In a later comment #51540 (comment), @sfackler proposed to also change libstd’s default to panic, for consistency.

I was expressing concern about this, but as @Amanieu mentioned, this currently allocates after checking for double panics, so this isn't an issue.

gnzlbg commented 4 years ago

IIRC the main issue was not just that, but the possibility of there being unsafe code which is sound in the presence of abort-on-OOM but unsound given panic-on-OOM. That is, unsafe code authors heretofore did not have to consider panic safety every time they called a potentially-allocating function, and now they would (but the code is already written).

Example:

// Invariant: ptr always points to some allocated memory of len > 0
struct MyDynArray { ptr: *mut u8, len: usize } 
impl MyDynArray {
    pub fn new() { Self { ptr: GlobalAlloc(..1..), len: 1 } }
    pub fn resize(&mut self, val: u8, new_len: usize) { unsafe {
        // this unsafe code is sound today, but not if allocation can panic
        let new_len = if new_len == 0 { 1 } else { new_len };
        let ptr = self.ptr;
        self.ptr = ptr::null(); 
        let new_ptr = GlobalAlloc::alloc(...new_len...); 
        // move elements from [ptr, ptr + size] to  [new_ptr, new_ptr + size]
        GlobalAlloc::dealloc(ptr);
        self.ptr = new_ptr;
        self.len = new_len;
    }}
}
// safe because this will never see a `self.ptr == ptr::null()`
impl Drop for MyDynArray { fn drop(&mut self) { unsafe { GlobalAlloc::dealloc(self.ptr) } }}

This code is sound because either the call to GlobalAlloc::alloc(...new_len...) succeeds or it aborts, which means that Drop::drop will never see a MyDynArray with self.ptr == ptr::null().

If GlobalAlloc::alloc is changed to unwind on OOM, then such code now has a memory safety issue. While the code can be "fixed" to avoid that, such code might already be written, and we can't easily tell.

Basically, that GlobalAlloc::alloc does not unwind on OOM is a guarantee of its API that we can't break. We could make #[alloc_error_handler] panic by default, and add a catch_unwind to GlobalAlloc::alloc to make it abort. For Alloc::alloc, we could just change its semantics since they are unstable, and maybe we can add a different "system allocator" hook with "panics on OOM" semantics that people can migrate to, and deprecate the old one.

SimonSapin commented 4 years ago

Is unwinding possible when libstd is not linked?

Amanieu commented 4 years ago

Is unwinding possible when libstd is not linked?

Yes, if you use nightly, port libpanic_unwind and libunwind to your platform, and invoke them directly without going through libstd's panic framework. Obviously this relies on very unstable implementation details.

Lokathor commented 4 years ago

@gnzlbg your example (1) could easily be fixed in the presence of OOM leading to unwind (2) doesn't even have correct usage for the api that GlobalAlloc exposes (3) doesn't call alloc::alloc::handle_alloc_error, which is how you'd go to the allocation handler that would either abort or unwind.

To be clear: your code example is not sound today.

mark-i-m commented 4 years ago

If a crate is designed to work with no_std, wouldn't there authors have to consider the possibility that abort does not work? Am I misunderstanding

Lokathor commented 4 years ago

Today, a person can just write a no_std lib, and you call the lib and it parses bytes or does a formula or whatever. Many people use no_std in its literal meaning: without the OS backed standard library. They still assume that there's something out there beyond the process. It's unfortunate, but that's the social contract we've developed.

I guess your panic_handler should trap the thread in a loop or something if it's really unable to exit to a wider OS. That's what I do for GBA. (as a reminder: not an empty loop because those are UB because of LLVM bug, do a volatile read something over and over so that there's a side effect)

therealprof commented 4 years ago

(as a reminder: not an empty loop because those are UB because of LLVM bug, do a volatile read something over and over so that there's a side effect)

loop { continue; } works just fine. 😅

eddyb commented 4 years ago

Surely loop { continue; } would compile to the same LLVM IR as loop {} (even the same MIR)?

"Works just fine" is not applicable to UB, almost any UB "works just fine" some of the time, but the risk here is lurking miscompilations.

therealprof commented 4 years ago

Surely loop { continue; } would compile to the same LLVM IR as loop {} (even the same MIR)?

No idea. Funny enough I can't reproduce the problem anymore but using loop { continue; } has never failed me on thumbv6m-none-eabi (and higher) while loop {} often has.

mark-i-m commented 4 years ago

It seems like a bug if loop {} and loop { continue; } produce different code...

SimonSapin commented 4 years ago

This is a discussion for https://github.com/rust-lang/rust/issues/28728, please keep this thread about alloc_error_handler.

gnzlbg commented 4 years ago

@gnzlbg your example (2) doesn't even have correct usage for the api that GlobalAlloc exposes (3) doesn't call alloc::alloc::handle_alloc_error, which is how you'd go to the allocation handler that would either abort or unwind.

To be clear: your code example is not sound today.

I disagree. This is the example with the blanks expanded (playground) and I believe it is correct and sound today. Changing handle_alloc_error to panic instead, would make this program unsound, and would make it memory unsafe.

Please point out explicitly what you believe to be unsound in this use of the GlobalAlloc API. AFAICT, the code only relies on guarantees that the API provides.

(1) could easily be fixed in the presence of OOM leading to unwind

Do you have a way to automatically fix or migrate all of existing code that might run into the family of issues that the example above demonstrates ?

Lokathor commented 4 years ago

well sure it works if you fill in all the checks you skipped before XD

Lokathor commented 4 years ago

also, just as a reminder: we already have an accepted RFC to make oom=panic a Cargo config option: https://github.com/rust-lang/rust/issues/43596

SimonSapin commented 4 years ago

@gnzlbg This example is only sound on the assumption that handle_alloc_error never unwinds. But I don’t think this assumption is valid. https://doc.rust-lang.org/1.39.0/std/alloc/fn.handle_alloc_error.html documents:

The default behavior of this function is to print a message to standard error and abort the process. It can be replaced with set_alloc_error_hook and take_alloc_error_hook.

(Emphasis mine.)

Amanieu commented 4 years ago

I feel that we can probably get away with making OOM unwind by claiming that any unsafe code made unsound by this change was always broken.

I personally feel that unwinding on OOM (even on std) is the right approach: 99% of OOMs are caused by excessively large allocations rather than actually running out of memory. Even if unwinding requires allocating memory, this will usually work fine. A double panic and abort will handle the (rare) recursive OOM case.

gnzlbg commented 4 years ago

I feel that we can probably get away with making OOM unwind by claiming that any unsafe code made unsound by this change was always broken.

I agree.

xobs commented 4 years ago

If I'm reading the thread correctly, the current proposal to get alloc + no_std working on stable is to provide a default alloc_error_handler that simply calls panic_handler, which will allow people to use alloc in a no_std environment on stable for now while the #[alloc_error_handler] discussion continues.

The panic handler that is implemented in #[panic_handler] should take care not to make any allocations to avoid infinite panic recursions.

This seems to work for my usecases, which involve embedded systems running without an OS (or processes that are running with an OS that isn't supported by std), where we can do everything from writing an LED to indicate failure, to sending a message to the operating system to terminate the process with an "OOM" error.

What is needed to reach a consensus to stabilize this?

joshtriplett commented 4 years ago

A few thoughts on moving this forward:

It sounds like we're close to a consensus on a default allocation error handler that calls the panic handler. Can we confirm that consensus?
Once we have that consensus, can we get someone specific to volunteer to write a PR implementing that change?
Once we make that change, I don't think we want to immediately move to stabilize; we should give people time to try the new behavior first.

Also, one other thought: any interest in a very slightly more general mechanism that would avoid further language work for future things like this, along these lines? (Naming intentionally WIP.)

#[global_thing("alloc_error")]
fn alloc_error_panic(...) -> ! { ... }

// Elsewhere
#[global_thing_get("alloc_error")]
fn alloc_error(...) -> !;

... alloc_error(...) ...

This would give a compile time error if any code using a given global thing gets compiled in and that global thing doesn't get set exactly once.

Thoughts on this? (This is intentionally much simpler than the RFC for external existentials.)

SimonSapin commented 4 years ago

Can we confirm that consensus?

The formal way to do this is an RFC for the Lang and Libs team. Or do you feel an FCP (in a dedicated issue) would be enough?

I don't think we want to immediately move to stabilize

This is an interesting case for unstable experimentation because it is about a default behavior when nothing other behavior is opted-in, and affects not just one crate but an executable/staticlib/cdylib as a whole.

Maybe we could have that "default" only apply if any of the crates in the dependency graph has a given #![feature(something)] opt-in?

a very slightly more general mechanism that would avoid further language work

#[alloc_error_handler] and #[panic_handler] are very similar to each other as far as the language is concerned. I feel that the additional language work for another one like these and the number of them we’re likely to ever need are low enough that it’s not worth a general mechanism for the purpose of the standard library.

If it’s a language feature to be eventually stabilized to allow any crate to call as well as define Global Things, I feel there’s definitely enough design space that this needs an RFC. (Type checking and name collisions come to mind, but if anyone wants to discuss that please do it in a separate thread.)

therealprof commented 4 years ago

@joshtriplett

Thoughts on this? (This is intentionally much simpler than the RFC for external existentials.)

That'd be insanely useful for embedded development!

gnzlbg commented 4 years ago

If it’s a language feature to be eventually stabilized to allow any crate to call as well as define Global Things, I feel there’s definitely enough design space that this needs an RFC. (Type checking and name collisions come to mind, but if anyone wants to discuss that please do it in a separate thread.)

I don't think that's what they meant, probably only some syntax sugar to write #[global("allocator")] instead of #[global_allocator] for already existing attributes / hacks. There was already an RFC for such a feature last year which received considerable design work, but the lang team postponed it this year (RFC2492: Existential types with external definition). The #[alloc_error_handler] is one of the many global "hooks" that would have been supported by such a feature.

joshtriplett commented 4 years ago

@SimonSapin I feel like an rfcbot poll (along with a clear description of the incremental change to the existing behavior) would be enough to confirm consensus.

If it’s a language feature to be eventually stabilized to allow any crate to call as well as define Global Things,

Yes, that's what I was suggesting.

@gnzlbg I mentioned that RFC in my comment. I was suggesting something intentionally smaller and simpler in the hopes that it would be easier to do without touching needing existential types.

@SimonSapin I don't want to derail this issue. I think I'm just trying to see if that proposal sounds sufficiently useful and straightforward that it might be worth considering rather than adding case 2 of the 0-1-infinity rule. I was hoping that it sounded extremely simple to implement.

SimonSapin commented 4 years ago

Oh yeah I’m not worried about the implementation, it’s the design that I think has some open questions.

SimonSapin commented 4 years ago

Back to the idea of a default allocation error handler that panics: it becomes unnecessary (for the purpose of unblocking some use cases in the Stable channel) if we stabilize the #[alloc_error_handler] attribute. I personally would be fine with that, since it’s very similar to the #[panic_implementation] attribute which is already stable. How does @rust-lang/lang feel about this?

gnzlbg commented 4 years ago

I was suggesting something intentionally smaller and simpler in the hopes that it would be easier to do without touching needing existential types.

You might want to open an internal thread with a proposal. GlobalAlloc is a trait, alloc_error_handler is a function, so a feature supporting both as independent "extern global thingies (traits and function types)" sounds at least as complicated as the "extern existential types" feature, which natively supports both because alloc_error_handler implements a Fn trait. I'm not sure what a "smaller" feature would look like, but the implementation challenges that it must solve are AFAICT the same, and if it supports both GlobalAlloc and alloc_error_handler, the amount of expressive power it adds to the language is the same, unless we were to artificially restrict the feature to only support some "blessed" existential types (e.g. GlobalAlloc and alloc_error_handler, but not user-defined types), but that's pretty much what we already have with the attribute system today.

SimonSapin commented 4 years ago

#[global_allocator] works with a trait, but really it’s sugar for four "global functions":

https://github.com/rust-lang/rust/blob/b9cf5417892ef242c783ef963deff5436205b0f6/src/libsyntax_ext/global_allocator.rs#L76-L86

gnzlbg commented 4 years ago

That's kind of what the "existential" in "extern existential" literally means. When one writes:

// crate importing existential
extern existential type Heap: GlobalAlloc;
let x = unsafe { Heap.alloc(...) };

Heap has a single concrete type that implements the GlobalAlloc trait, i.e., it is not generic. One of the many ways to implement this is to desugar that into the "four global functions" that you mention:

// crate importing existential
extern "Rust" {
      static Heap: impl GlobalAlloc; // We only deal with references to this type
      // These functions are not generic over `Self`, they only use `*mut c_void`
      // because we don't know the actual type in this TU, but there is only one type:
      unsafe fn Heap_GlobalAlloc_alloc(*const c_void, ...) -> ...;
      unsafe fn Heap_GlobalAlloc_dealloc(*const c_void, ...);
      ... // this is generated from the trait definition, not the type definition
}
let x = unsafe { Heap_GlobalAlloc_alloc(&Heap as *const _ as _, ...); }

where some other crate must actually provide a definition of the type, e.g.,

// crate defining existential
struct JemallocHeap; impl GlobalAlloc for JemallocHeap { ... }
pub extern existential type Heap = JemallocHeap;

which then can be desugared to:

// crate defining existential
pub static Heap: JemallocHeap = JemallocHeap; // type is concrete
pub unsafe extern "Rust" fn Heap_GlobalAlloc_alloc(a: *const c_void, ...) -> ... { 
    GlobalAlloc::alloc(&*(a as *const JemallocHeap), ...) // uses concrete type!
}
pub unsafe extern "Rust" fn Heap_GlobalAlloc_dealloc(a: *const c_void, ...) { GlobalAlloc::alloc(&*(a as *const JemallocHeap), ...) }
...

Whether one applies this program transformation with a built in keyword, or using an attribute, doesn't matter much. There are cleverer ways to desugar the feature, e.g., see here, but these are pretty much stylistic changes. For the alloc_error_handler, the compiler can just expand:

extern existential type alloc_error_handler: Fn(...) -> !;

to

extern "Rust" {
     fn alloc_error_handler(...) -> !;
}

The tricky part is actually detecting collisions, but this is something that we already need to do for any flavor of this feature (e.g. if two crates implement a #[global_allocator] we want to error at build time, instead of link time, or worse, have the linker pick one of the two allocators, e.g., depending on link order).

SimonSapin commented 4 years ago

I feel like an rfcbot poll (along with a clear description of the incremental change to the existing behavior) would be enough to confirm consensus.

Alright, https://github.com/rust-lang/rust/issues/66741 is up.

I’ve also written up https://github.com/rust-lang/rust/issues/66740 to propose FCP to stabilize the attribute.

To some extent these two proposals are alternatives to each other in that either of them unlocks no_std + alloc on Stable. But doing them both may still be desirable.

haraldh commented 4 years ago

So, any progress for us alloc + no_std users who want to compile with stable?

SimonSapin commented 4 years ago

The message literally just before yours proposes two solutions, either of which would unblock that use case: https://github.com/rust-lang/rust/issues/66740, https://github.com/rust-lang/rust/issues/66741. However neither has consensus so far.

NickeZ commented 4 years ago

Are the right people aware of this issue? This is still the only blocker for using no_std + alloc as far as I can tell and it seems to be stuck on a decision / organizational question.

rust-lang / rust

Tracking issue for the #[alloc_error_handler] attribute (for no_std + liballoc) #51540

Blocking issues

Original issue: