rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking issue for the #[alloc_error_handler] attribute (for no_std + liballoc) #51540

Open SimonSapin opened 6 years ago

SimonSapin commented 6 years ago

This attribute is mandatory when using the alloc crate without the std crate. It is used like this:

#[alloc_error_handler]
fn foo(_: core::alloc::Layout) -> ! {
    // …
}

Implementation PR: https://github.com/rust-lang/rust/pull/52191

Blocking issues

Original issue:


In a no_std program or staticlib, linking to the alloc crate may cause this error:

error: language item required, but not found: `oom`

This is fixed by providing the oom lang item, which is normally provided by the std crate (where it calls a dynamically-settable hook https://github.com/rust-lang/rust/issues/51245, then aborts). The lang item is called by alloc::alloc::handle_alloc_error (which is called by Vec and others on memory allocation failure).

#![feature(lang_items)]

#[lang = "oom"]
extern fn foo(_: core::alloc::Layout) -> ! {
    // example implementation based on libc
    extern "C" { fn abort() -> !; }
    unsafe { abort() }
}

However, defining a lang item is an unstable feature.

Possible solutions include:

  1. Add and stabilize a dedicated attribute similar to the #[panic_implementation] attribute:

    #[alloc_error_handler]
    fn foo(_: core::alloc::Layout) -> ! {
      // …
    }

    The downside is that this is one more mandatory hoop to jump through for no_std programs that would have been fine with a default hook that aborts.

  2. Move std’s dynamically-settable hook into alloc: https://github.com/rust-lang/rust/pull/51607. The downside is some mandatory space overhead.

SimonSapin commented 6 years ago

Better idea: remove the lang item and move all of OOM handling to liballoc, except for the default hook that prints to stderr. Have compiler magic similar to that of #[global_allocator] that uses the printing default hook when std is linked, or a no-op default hook otherwise.

glandium commented 6 years ago

Bikeshedding on oom renaming: alloc_error would be shorter.

SimonSapin commented 6 years ago

I forgot to link https://github.com/rust-lang/rust/pull/51543, which does indeed use alloc_error.

SimonSapin commented 6 years ago

@japaric, @glandium, @Amanieu I’ve updated the issue description here with arguments from https://github.com/rust-lang/rust/pull/51607 and https://github.com/rust-lang/rfcs/pull/2480 and to list two alternatives.

japaric commented 6 years ago

Replying to @glandium's https://github.com/rust-lang/rfcs/pull/2480#issuecomment-401243511 here to not derail the alloc RFC.

Every #[no_std] user of the alloc crate would have to implement their own

Not everyone has to implement their own. An implementation can be packed into a crate and it just takes adding an extern crate oom_abort; to register an OOM handler -- this is no different than what we have for global allocators (e.g. extern crate alloc_jemalloc).

and they can't use intrinsics::abort, so it becomes harder than necessary.

Then stabilize intrinsics::abort or provide a stable oom_abort crate with the rust-std component. Though it's better to stabilize intrinsics::abort because then we can use it for #[panic_implementation] too.

The problem is that with a #[oom] attribute, you don't get a default handler at all.

No default OOM handler is good. There's no sensible default for every single combination of no_std program and target anyways. For example, intrinsics::abort produces a linker error on MSP430 because there's no abort instruction in the MSP430 instruction set.

Also, not having a default is consistent with both #[panic_implementation] and #[global_allocator] in #[no_std] context. Why special case the OOM handler?


Another reason why I like static registration of global properties is that it's less error prone. Say you want the OOM (or panic) handler to behave differently between development and final release. With #[oom] you can write this:

#[cfg(debug_assertions)]
extern crate oom_report; // for development, verbose

#[cfg(debug_assertions)]
extern crate oom_panic; // for release, minimal size

This is clearly wrong, and you get a compiler error. With set_alloc_error_hook you don't get a compiler error; you won't notice the problem until you hit an OOM in dev and lose your opportunity to track down the root of the OOM.

The other reason I like #[oom] / #[panic_implementation] is that you can be sure of the behavior of the OOM / panic if you register the handler in the top crate.

extern crate oom_abort; // I'm 100% sure OOM = abort

With the hook API you have no guarantee

fn main() {
    set_alloc_error_hook(|| abort()); // OOM = abort, maybe

    dependency::function(); // this is totally free to change the OOM handler

    // ..
}

Finally, if you need the ability to override the OOM handler at runtime using hooks you can implement that on top of #[oom].
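A minimal sketch of that last point, assuming a single statically registered handler that forwards to a runtime-replaceable hook. The names set_hook, run_hook and static_handler are hypothetical, and the attribute is left as a comment because it is unstable, so the sketch builds on stable Rust:

```rust
use core::alloc::Layout;
use core::mem;
use core::ptr;
use core::sync::atomic::{AtomicPtr, Ordering};

/// Runtime-replaceable hook; a null pointer means "no hook installed".
static HOOK: AtomicPtr<()> = AtomicPtr::new(ptr::null_mut());

/// Installs `hook`, replacing any earlier one (hypothetical name).
pub fn set_hook(hook: fn(Layout)) {
    HOOK.store(hook as *mut (), Ordering::SeqCst);
}

/// Runs the installed hook, if any, and reports whether one ran.
pub fn run_hook(layout: Layout) -> bool {
    let raw = HOOK.load(Ordering::SeqCst);
    if raw.is_null() {
        return false;
    }
    // SAFETY: the only non-null values ever stored come from `set_hook`,
    // which stores a valid `fn(Layout)`.
    let hook = unsafe { mem::transmute::<*mut (), fn(Layout)>(raw) };
    hook(layout);
    true
}

// In a real no_std crate this function would carry the static registration
// attribute discussed in this thread; it is omitted here on purpose.
// #[alloc_error_handler]
pub fn static_handler(layout: Layout) -> ! {
    // Give the dynamic hook a chance to log or clean up, then never return.
    run_hook(layout);
    loop {
        core::hint::spin_loop();
    }
}
```

The top crate still decides what finally happens (here, spinning forever); the hook can only add behavior before that point.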

SimonSapin commented 6 years ago

@japaric I find this convincing regarding static vs. dynamic, thanks.

For example, intrinsics::abort produces a linker error on MSP430 because there's no abort instruction in the MSP430 instruction set.

Oh, I was wondering if something like that ever happened.

How does one typically deal with unrecoverable situations on MSP430? An infinite loop?

No default OOM handler is good.

I think we can have a default, with a Sufficiently Advanced Compiler. (Grep for has_global_allocator for code that does something similar.)

But you’re saying we should not, and force that question on every no_std user. This would be one more barrier before being able to get to Hello World.

SimonSapin commented 6 years ago

https://github.com/rust-lang/rust/pull/51607#issuecomment-401893293

FWIW an attribute like #[oom] I also think would be a great idea (and I think I'm also convinced that it may be worth it over this dynamic strategy), and it could be implemented pretty similar to #[panic_implementation]

Right. I think the remaining question is: should there be a default? If not, what should be a "typical" implementation that we’d recommend in docs? Should we add a stable wrapper for core::intrinsics::abort? (In what module?)

The docs for core::intrinsics::abort claim:

The stabilized version of this intrinsic is std::process::abort

But besides availability, the two functions do not have equivalent behavior: https://github.com/servo/servo/pull/16899.

japaric commented 6 years ago

How does one typically deal with unrecoverable situations on MSP430? An infinite loop?

@pftbest and @cr1901 would be more qualified to answer that question. I'm not a MSP430 developer myself.

On Cortex-M the abort instruction triggers the HardFault handler. While developing I configure panic and HardFault to log info about the program, or just to trigger a breakpoint if binary size is a problem. In release mode, I usually set panic to abort and make the HardFault handler disable interrupts, shut down the system (e.g. turn off motors), signal the failure (e.g. turn on a red LED) and go into an infinite loop but this is very application specific. Also, in some applications reaching an unrecoverable situation means that something is (very) wrong with your software and that it should not be deployed until it's fixed.
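The release-mode sequence described above could be sketched roughly as follows. The Board trait and its methods are hypothetical stand-ins for whatever a real board support crate provides, and the right steps are application specific:

```rust
/// Hypothetical hardware abstraction; none of these names come from a real crate.
pub trait Board {
    fn disable_interrupts(&mut self);
    fn stop_actuators(&mut self);
    fn set_fault_led(&mut self, on: bool);
}

/// Brings the system to a safe state, in the order described above.
pub fn enter_safe_state<B: Board>(board: &mut B) {
    board.disable_interrupts();
    board.stop_actuators();
    board.set_fault_led(true);
}

/// What a release-mode fault handler might call: make the system safe,
/// then never return.
pub fn fault<B: Board>(board: &mut B) -> ! {
    enter_safe_state(board);
    loop {
        core::hint::spin_loop();
    }
}
```

Keeping the non-diverging part in its own function makes the sequence testable on a host machine with a mock board.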

But you’re saying we should not, and force that question on every no_std user. This would be one more barrier before being able to get to Hello World.

#[global_allocator] is not mandatory for #[no_std] binaries. And the oom lang item is only required when you are using #[global_allocator]. So, no, not all no_std program developers have to deal with the oom handler.


should there be a default?

I think there should not be a default.

If not, what should be a "typical" implementation that we’d recommend in docs?

The #![no_std] application - target space is too broad to recommend anything that will behave the same everywhere. Consider intrinsics::abort:

Should we add a stable wrapper for core::intrinsics::abort? (In what module?)

Yes. As core::arch::$ARCH::abort maybe? But only on architectures whose instruction sets define an abort / trap instruction.

The docs for core::intrinsics::abort claim:

Those docs should be fixed. iirc process::abort tries to do some cleanup (of the Rust runtime?) before aborting the process. intrinsics::abort is an LLVM intrinsic that maps to the abort instruction of the target instruction set so the two are not equivalent.

cr1901 commented 6 years ago

How does one typically deal with unrecoverable situations on MSP430? An infinite loop?

Infinite loop is how I typically deal w/ it. @pftbest can override me if he knows something better though, as it's been a while since I delved into msp430 internals :).

SimonSapin commented 6 years ago

https://github.com/rust-lang/rust/pull/52191 has landed with an attribute for a statically-dispatched function, and no default.

mark-i-m commented 6 years ago
mm/krust/libkrust.a(alloc-062fb091d60c735a.alloc.67kypoku-cgu.11.rcgu.o): In function `alloc::alloc::handle_alloc_error':
alloc.67kypoku-cgu.11:(.text._ZN5alloc5alloc18handle_alloc_error17h59bd4dd5f11cdd3fE+0x2): undefined reference to `rust_oom'

I'm getting the above in one of my projects. I have defined the handle_alloc_error function as in the OP. The code compiles just fine, but the linker cannot find the function.

mark-i-m commented 6 years ago

However, it seems to work if I add extern to the function definition.

SimonSapin commented 6 years ago

@mark-i-m I assume you mean a #[alloc_error_handler] function, in a #![no_std] crate. Could you provide a set of steps/files to reproduce the issue?

mark-i-m commented 6 years ago

@SimonSapin Sorry, yes, that's what I meant. Let me try to distill my code down to a minimal example.

mark-i-m commented 6 years ago

Hmm... I'm having a very hard time reproducing this minimally. The build environment I am in is rather convoluted and involves linking against some compiled C code.

hadronized commented 6 years ago

Hello. I have a similar issue there.

Code here.

SimonSapin commented 6 years ago

I don’t know if unwinding without std is supported at all. Consider either adding std to your dependency graph or compiling with panic = "abort".

ghost commented 5 years ago

What's the status of stabilization of this feature? We're using alloc in developing a kernel, and it's currently one of the few features keeping us on nightly.

Centril commented 5 years ago

What is the library team's involvement here? #[alloc_error_handler] seems like a pure T-Lang matter as it, as far as I understand, does not involve any changes to the standard library's public API. The function set_alloc_error_hook seems like a different matter but this issue just tracks the attribute...

SimonSapin commented 5 years ago

I disagree. There’s an ad-hoc attribute because we don’t have a more general mechanism for “call a function based on its declaration without depending on a crate that contains its definition, and require exactly one definition somewhere in the crate graph, whose signature is type-checked.” There’s precedent for this kind of attribute with #[global_allocator], and this one is no different as far as the language is concerned.

But the role of this handler and the handle_alloc_error function that it supports are entirely a library matter.

Centril commented 5 years ago

Ad-hoc or not... anything and everything that cannot be done in the stable language and needs compiler support is a T-Lang matter (unless it is about implementation defined behavior or optimizations within the confines of the specification, in which case it's a T-Compiler matter). This includes attributes, intrinsics, and lang items. When a general mechanism for libraries is provided it becomes a T-libs matter. In this case you are actually performing custom type checking involving both attributes and lang items.

Yes, #[global_allocator] and in particular https://github.com/rust-lang/rfcs/pull/2325 happened. In my view those are clearly examples of where the language team should have been involved but was not. They are "precedents" I don't want to repeat.

ghost commented 5 years ago

So just to circle back, what needs to happen to stabilize #![feature(alloc_error_handler)]?

Ericson2314 commented 5 years ago

Maybe let's not stabilize it? And do https://github.com/rust-lang/rfcs/pull/2492 instead. (Oh, I just noticed that's nominated, yay!)

gnzlbg commented 5 years ago

Can someone explain how this API fits with GlobalAlloc ?

AFAICT, it is not part of the GlobalAlloc trait. Allocators can return ptr::null_mut() if an allocation cannot be satisfied, and the GlobalAlloc::alloc documentation calls out that this isn't necessarily due to OOM (https://doc.rust-lang.org/1.29.1/std/alloc/trait.GlobalAlloc.html#errors).

In C, errno can be used to query whether malloc errored, and why. In Rust there doesn't seem to be an alternative.

So I would understand that allocators could provide this API, to allow users to query the type of error, or to abort if the type of error was OOM.

Yet this API appears to be independent from the allocator, and any code can override it. So I'm confused about what the purpose of this API is, and how a user implementing it for any GlobalAlloc would even be able to tell whether the allocator ran out of memory or not.

gnzlbg commented 5 years ago

Maybe this API has nothing to do with OOM, or error handling in general, and it is supposed to just be called by code that wants to terminate the process if any allocation error happens ?

If so I find the name a bit misleading, and I still don't know how can an implementer of this API be able to do anything better than printing "An allocation error happened for this Layout", without any specific knowledge of the allocator being used.

For example, on Linux, one could provide a better and more standard error message by calling explain_malloc(size) (https://linux.die.net/man/3/explain_malloc), but for handle_alloc_error to be able to call that, it would need to know that the allocator being used is the system's malloc, and not, e.g., jemalloc.

SimonSapin commented 5 years ago

@gnzlbg I’m having a hard time understanding the first of your last two messages. This attribute has nothing to do with querying information from the allocator.

When an allocation fails (which in GlobalAlloc::alloc is represented by returning null), APIs like Vec::push that don’t want to propagate that failure to their caller can instead call std::alloc::handle_alloc_error(layout: Layout) -> !.

When libstd is linked, handle_alloc_error prints a message to stderr then aborts the process. In a #![no_std] environment however, we can’t assume there is such a thing as a process or a standard output. Therefore we require the program to provide (through this attribute) a function that does not return, which handle_alloc_error will call.

This is similar to the #[panic_handler] attribute, which provides a function that panic!() (eventually) calls.

Maybe […] it is supposed to just be called by code that wants to terminate the process if any allocation error happens ?

Yes.

If so I find the name a bit misleading,

Although the attribute is unstable and we can rename it, it relates to the name of std::alloc::handle_alloc_error which is stable.

and I still don't know how can an implementer of this API be able to do anything better than printing "An allocation error happened for this Layout", without any specific knowledge of the allocator being used.

That’s exactly what libstd does. This attribute is all about replacing that when libstd is not used.


TIL about explain_malloc. It looks like it’s not in libc but part of a separate library. Does it only work with glibc?

Regarding information about why an allocation failed, maybe we can add APIs for that (maybe through https://github.com/rust-lang/wg-allocators/issues/23) but the #[alloc_error_handler] attribute is unrelated.

gnzlbg commented 5 years ago

Thanks for the explanation.

I'm still trying to fill in the blanks about how std::alloc::handle_alloc_error is supposed to print the allocation error message without knowing anything about the allocator. As in, if we were to go from Box<T>/Vec<T> to Box<T, A>/Vec<T, A>, how is handle_alloc_error supposed to know which A was used for the allocation ?

Or is the intent to only use this API for the global allocator and introduce a different API for other allocators ?


It looks like it’s not in libc but part of a separate library. Does it only work with glibc?

The system one works with the system allocator, but each memory allocator has its own APIs for this.

So for example, with jemalloc, you probably want to override malloc_message, which jemalloc will use to dump the reason for an error to a stream, and on error, you probably want to open this stream, and print its contents to the user. With TCMalloc you might / might not want to print the status of the heap on error.

Other allocators would have their own ways to communicate this information. E.g. APIs like posix_memalign return an error code that one can then use to print an error string (e.g. alignment is not a power of two; alignment is not a multiple of sizeof(void*), alignment request too high, etc.). On error, a global_allocator wrapping that in Rust can either panic, or return a null pointer. But when Vec gets the error and wants to print it, it would be nice for it to be able to do so. AFAICT the allocator only communicates with Vec via a raw pointer, so the allocator would need to store the error in some local context, e.g., errno-style, and handle_alloc_error would need to query that. But for doing so, it needs to have some kind of API to the allocator that was used for the failed allocation.
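To make the errno-style idea concrete, here is a rough sketch of a wrapper allocator that remembers why its last allocation failed, so that other code holding a reference to it can query the reason. The Recording type, its size limit, and the error codes are all hypothetical; nothing like this exists in the standard library:

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical errno-style codes.
pub const NO_ERROR: usize = 0;
pub const OVER_LIMIT: usize = 1;
pub const INNER_FAILED: usize = 2;

/// Wraps another allocator and records the reason for the latest failure.
pub struct Recording<A> {
    inner: A,
    limit: usize,
    last_error: AtomicUsize,
}

impl<A> Recording<A> {
    pub const fn new(inner: A, limit: usize) -> Self {
        Recording { inner, limit, last_error: AtomicUsize::new(NO_ERROR) }
    }

    /// Like errno, this is only meaningful right after a failed allocation.
    pub fn last_error(&self) -> usize {
        self.last_error.load(Ordering::Relaxed)
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for Recording<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Refuse requests above the configured limit and remember why.
        if layout.size() > self.limit {
            self.last_error.store(OVER_LIMIT, Ordering::Relaxed);
            return ptr::null_mut();
        }
        let p = unsafe { self.inner.alloc(layout) };
        if p.is_null() {
            self.last_error.store(INNER_FAILED, Ordering::Relaxed);
        }
        p
    }

    unsafe fn dealloc(&self, p: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(p, layout) }
    }
}
```

A single shared slot like this is racy across threads, in the same way a non-thread-local errno would be; a real design would need per-thread or per-allocation storage.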

SimonSapin commented 5 years ago

The message that libstd prints is "memory allocation of {} bytes failed", layout.size() regardless of the allocator type. It doesn’t print more information than that. I am not aware of any plan or request to make it print more than that (before yours today, if we take this as a feature request).
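For illustration, the message depends only on the Layout passed in, not on the allocator. The helper name alloc_error_message is hypothetical; the format string is the one quoted above:

```rust
use std::alloc::Layout;

/// Builds the message quoted above from a layout alone (hypothetical helper).
pub fn alloc_error_message(layout: Layout) -> String {
    format!("memory allocation of {} bytes failed", layout.size())
}
```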

gnzlbg commented 5 years ago

I think I misunderstood what the API was for. I thought this was an API for printing why, as in, the reason, an allocation failed.

Instead, it appears to be for printing within libstd an error message saying that an allocation attempt for some layout failed, and that's it. Since the API is public and stable, downstream crates can use it to print the same error message that libstd does.

What I'm not sure I understand is why does the implementation of the API need to be a lang item. Can't the libstd version just panic!("memory allocation of {} bytes failed", layout.size()), and delay what happens on panic to the panic handler ?

What's the motivation for allowing extra customization here ? E.g. it appears to me that most people will want to either abort, unwind, or loop forever, which is what the panic handler on the target probably also wants to do. Or is there a use case where the implementations need to be different?

SimonSapin commented 5 years ago

I’m not sure if you mean handle_alloc_error() or #[alloc_error_handler] when you say “the” API.

libstd’s implementation of handle_alloc_error (which you can think of as being provided through #[alloc_error_handler] by libstd, though it isn’t literally) aborts the process even if panic! would unwind the thread.

#[alloc_error_handler] can only be used when libstd is not used. It exists because libcore “doesn’t know” how to abort the process. (We don’t want libcore or liballoc to assume there is a process at all.)

mark-i-m commented 5 years ago

In my case, I'm using alloc_error_handler to trigger the OOM killer in my OS kernel. The ability to customize the behavior of an OOM is important in no_std settings.

gnzlbg commented 5 years ago

@mark-i-m

The ability to customize the behavior of an OOM is important in no_std settings.

How are you querying that the error is an OOM ? Or which allocator it originated from ?


@SimonSapin

I’m not sure if you mean handle_alloc_error()

Yes, that's what I meant.

It exists because libcore “doesn’t know” how to abort the process.

Sounds to me that this problem could be solved with an #[abort_handler] instead. Although @mark-i-m has other interesting uses for the alloc_error_handler.

mark-i-m commented 5 years ago

@gnzlbg

How are you querying that the error is an OOM ? Or which allocator it originated from ?

In my system, the global allocator is the only place where an alloc error can be triggered in kernel mode, so the handler always knows it is an OOM (failures due to fragmentation are functionally the same, so they should also trigger the OOM killer/compaction daemon/swapping). handle_alloc_error just calls into the memory manager, which has access to the allocator anyway.

I haven't really thought about more complex situations, and I don't know much about per-container allocators, so take this with a grain of salt... I also recognize that my use case is pretty niche...

axos88 commented 5 years ago

@mark-i-m, based on what you just said, I got to thinking about why we want the alloc_error handler to be -> !. If we're talking about a kernel, theoretically it should be possible for the kernel to either "expand the heap" (by freeing up memory used for buffers and caches and whatnot), or kill off some of its "processes" or "tasks" or whatever we call them and then retry the allocation. Then the application should be able to continue to work correctly.

This reminds me somewhat of the Memfault handler for the ARM processors.

SimonSapin commented 5 years ago

@axos88 A typical usage is NonNull::new(alloc(layout)).unwrap_or_else(|| handle_alloc_error(layout)). We want something that doesn’t return, to express “I don’t want to deal with this case”.
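Spelled out as a small function, that pattern might look like the sketch below. The name alloc_or_handle is hypothetical; alloc, handle_alloc_error, Layout and NonNull are the stable std items:

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Allocates a block for `layout`, or diverges through `handle_alloc_error`.
pub fn alloc_or_handle(layout: Layout) -> NonNull<u8> {
    // `alloc` must not be called with a zero-sized layout.
    assert!(layout.size() > 0, "zero-sized layouts are not supported here");
    // SAFETY: the layout has a non-zero size (checked above).
    let raw = unsafe { alloc(layout) };
    // `handle_alloc_error` returns `!`, so the closure type-checks as
    // producing a `NonNull<u8>` even though it never returns.
    NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout))
}
```

The caller gets a non-null pointer or never gets control back, which is why the handler behind handle_alloc_error has to diverge.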

Any mechanism to “try harder” to allocate memory would be a better fit for being in the allocator itself. That is, in an impl of the GlobalAlloc trait (or Alloc trait), possibly by wrapping another allocator.

By the time handle_alloc_error is called, the program has already decided to abort. It’s too late to declare we’ve found some memory after all.
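A rough sketch of the "try harder inside the allocator" approach, assuming a wrapper around another GlobalAlloc and a reclaim callback. Both the Retrying type and the callback are hypothetical:

```rust
use std::alloc::{GlobalAlloc, Layout};

/// Wraps another allocator and retries once after asking the program to
/// free some memory.
pub struct Retrying<A> {
    pub inner: A,
    /// Hypothetical callback: drops caches or similar and returns true if
    /// anything was released.
    pub reclaim: fn() -> bool,
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for Retrying<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let first = unsafe { self.inner.alloc(layout) };
        if !first.is_null() {
            return first;
        }
        // The first attempt failed: try to free memory, then retry once.
        if (self.reclaim)() {
            return unsafe { self.inner.alloc(layout) };
        }
        // Still null; callers such as Vec may now call handle_alloc_error.
        first
    }

    unsafe fn dealloc(&self, p: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(p, layout) }
    }
}
```

With this shape the retry happens before any caller sees a null pointer, so the diverging handler is only reached once recovery has already failed.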

axos88 commented 5 years ago

Any mechanism to “try harder” to allocate memory would be a better fit for being in the allocator itself.

I disagree. Staying at the example of the kernel, this would mean the allocator would be responsible for instructing the kernel to try to trigger the - let's call it - OOM killer to free some space up for the allocation, whereas IMHO the allocator should only be responsible to tell the kernel that hey, I don't have enough space for this operation, and then the kernel would decide whether to free up space, or crash. In the end it ends up in the same program, but not in the same impl.

By the time handle_alloc_error is called, the program has already decided to abort. It’s too late to declare we’ve found some memory after all.

Then this is not the best name for it, because we're not handling it, we're just trying to accept our fate in a graceful manner.

SimonSapin commented 5 years ago

Agreed that #[alloc_error_handler] is not a great name, something like #[abort_implementation] (mirroring the existing #[panic_implementation]) would be closer to the truth although the core::alloc::Layout parameter is specific to allocations. However this attribute’s name is related to the name of the std::alloc::handle_alloc_error function, which is stable. (Though adding an alias and deprecating the old name is a possibility.)

glandium commented 5 years ago

because we're not handling it

It's a matter of perspective. The error already happened. It's unrecoverable. We handle it by doing something and aborting. It's not "handle_to_maybe_recover", after all.

Ericson2314 commented 5 years ago

In any event a non-! would be completely different semantics and not fit the current use-case at all. Not saying this is good or bad, but just a massive redesign.

Ericson2314 commented 5 years ago

Basically there are a gazillion interesting things one can do with errors and one global hook can never hope to cover them all. I would expect this to be dead code in a kernel or other program which really cares about error cases. Instead they will be dealt with like all other errors with Result and explicit application code.

axos88 commented 5 years ago

I agree this would be a change in the use case. However if we allow a return from this hook, it would expand the use-case rather than shift it. If the user wants to use it as it is today, that's great, don't return from it. If the hook returns, retry the allocation. If the user did not correct the issue that led to the allocation failure, it will fail again, and the hook will be entered again.

As I see it the ability to return from this hook is an extension to the current use-case without any obvious drawbacks.

SimonSapin commented 5 years ago

This is not expanding, this is entirely different functionality that should be proposed separately.

SimonSapin commented 5 years ago

And, remember that the #[alloc_error_handler] attribute can only be used when libstd is not linked. When it is linked, it takes care of providing the implementation of std::process::abort.

mark-i-m commented 5 years ago

Yep, in the case of a kernel, it happens that never returning can be circumvented because it just means the OS can run instead. It makes much less sense for a common application.

Lokathor commented 4 years ago

So, is there anything that alloc_error_handler can normally do other than panic, given the limitation of never being allowed to return?

Can't things that need to allocate without handling errors just panic on failure, and then we can just skip the entire alloc_error_handler concept entirely?

SimonSapin commented 4 years ago

libstd doesn’t panic on allocation failure, it aborts the process. In no_std there may not be a process to abort.

gnzlbg commented 4 years ago

Can't things that need to allocate without handling errors just panic on failure

@Lokathor with -C panic=unwind, a panic! allocates memory, which is something that a process might not want to do if a memory allocation just failed. The alloc_error_handler lets you change the behavior from abort to panic! if you want to.

So, is there anything that alloc_error_handler can normally do other than panic, given the limitation of never being allowed to return?

It can panic!, abort, ud2, loop forever, exit(success) if it wants to, log something and do any of these other things, it can also just call main() again and abort once main finishes, fork the process, ...

The only thing the handler cannot do is return a value, but beyond that, there are many things it can do.
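As a sketch of that point, any function of type fn(Layout) -> ! has the right shape. The three functions below are hypothetical examples that build on stable Rust with std; a real no_std handler would not have std::process::abort available and would be registered with the unstable attribute:

```rust
use std::alloc::Layout;

/// The shape every candidate handler shares.
pub type Handler = fn(Layout) -> !;

/// Turns the failure into a panic (which may itself allocate when unwinding).
pub fn panic_on_alloc_error(layout: Layout) -> ! {
    panic!("memory allocation of {} bytes failed", layout.size())
}

/// Aborts the process, which needs an operating system underneath.
pub fn abort_on_alloc_error(_layout: Layout) -> ! {
    std::process::abort()
}

/// Spins forever, the usual fallback when there is nothing to abort.
pub fn spin_on_alloc_error(_layout: Layout) -> ! {
    loop {
        std::hint::spin_loop();
    }
}

/// All three coerce to the same function pointer type.
pub const CANDIDATES: [Handler; 3] =
    [panic_on_alloc_error, abort_on_alloc_error, spin_on_alloc_error];
```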

Lokathor commented 4 years ago

I see.

Well is there any ETA on when we could see progress on this moving forward? What are the actual blockers here?

I'd like to be able to build no_std binaries for Windows (all the OSes really), but it turns out that you can't reasonably do that on Stable because you need to use this thing if you want the alloc crate. Otherwise a person would have to basically fork the entire alloc crate just to get around this one snag.

gnzlbg commented 4 years ago

I'd like to be able to build no_std binaries for Windows (all the OSes really), but it turns out that you can't reasonably do that on Stable because you need to use this thing if you want the alloc crate.

Which application do you have in mind that supports the alloc crate but not libstd on Windows and other operating systems?

Lokathor commented 4 years ago

Windows applications that do not rely on the C runtime and do not link to it.