Open SimonSapin opened 6 years ago
Better idea: remove the lang item and move all of OOM handling to liballoc, except for the default hook that prints to stderr. Have compiler magic similar to that of #[global_allocator] that uses the printing default hook when std is linked, or a no-op default hook otherwise.
Bikeshedding on oom renaming: alloc_error would be shorter.
I forgot to link https://github.com/rust-lang/rust/pull/51543, which does indeed use alloc_error.
@japaric, @glandium, @Amanieu I’ve updated the issue description here with arguments from https://github.com/rust-lang/rust/pull/51607 and https://github.com/rust-lang/rfcs/pull/2480 and to list two alternatives.
Replying to @glandium's https://github.com/rust-lang/rfcs/pull/2480#issuecomment-401243511 here to not derail the alloc RFC.
Every #[no_std] user of the alloc crate would have to implement their own
Not everyone has to implement their own. An implementation can be packed into a crate and it just takes adding an extern crate oom_abort; to register an OOM handler -- this is no different than what we have for global allocators (e.g. extern crate alloc_jemalloc).
and they can't use intrinsics::abort, so it becomes harder than necessary.
Then stabilize intrinsics::abort or provide a stable oom_abort crate with the rust-std component. Though it's better to stabilize intrinsics::abort because then we can use it for #[panic_implementation] too.
The problem is that with a #[oom] attribute, you don't get a default handler at all.
No default OOM handler is good. There's no sensible default for every single combination of no_std program and target anyways. For example, intrinsics::abort produces a linker error on MSP430 because there's no abort instruction in the MSP430 instruction set.
Also, not having a default is consistent with both #[panic_implementation] and #[global_allocator] in #[no_std] context. Why special case the OOM handler?
Another reason why I like static registration of global properties is that it's less error prone. Say you want the OOM (or panic) handler to behave differently between development and final release. With #[oom] you can write this:
#[cfg(debug_assertions)]
extern crate oom_report; // for development, verbose
#[cfg(debug_assertions)]
extern crate oom_panic; // for release, minimal size
This is clearly wrong, and you get a compiler error. With set_alloc_error_hook you don't get a compiler error; you won't notice the problem until you hit an OOM in dev and have lost your opportunity to track down the root of the OOM.
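As a rough illustration of the point above, the selection that was presumably intended uses complementary cfg conditions. The sketch below uses a hypothetical module named oom_impl as a stand-in for the oom_report and oom_panic crates (neither is assumed to exist), so that it builds without extra dependencies. If both definitions were gated on debug_assertions, as in the snippet above, a debug build would contain two items with the same name and the compiler would reject it, which is the kind of early failure static registration gives you.

```rust
// Hypothetical stand-ins for the `oom_report` and `oom_panic` crates mentioned above.
// Exactly one of these modules exists in any given build.
#[cfg(debug_assertions)]
mod oom_impl {
    // Development builds: verbose reporting.
    pub const NAME: &str = "oom_report";
}

#[cfg(not(debug_assertions))]
mod oom_impl {
    // Release builds: minimal size.
    pub const NAME: &str = "oom_panic";
}

/// Returns the name of the implementation selected at compile time.
pub fn selected_oom_impl() -> &'static str {
    oom_impl::NAME
}
```

Changing the second condition back to cfg(debug_assertions) makes a debug build fail with a duplicate-definition error instead of silently picking one.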
The other reason I like #[oom] / #[panic_implementation] is that you can be sure of the behavior of the OOM / panic if you register the handler in the top crate.
extern crate oom_abort; // I'm 100% sure OOM = abort
With the hook API you have no guarantee
fn main() {
    set_alloc_error_hook(|| abort()); // OOM = abort, maybe
    dependency::function(); // this is totally free to change the OOM handler
    // ..
}
Finally, if you need the ability to override the OOM handler at runtime using hooks you can implement that on top of #[oom].
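A minimal sketch of that last point, assuming a single statically registered handler that dispatches to a runtime-settable hook. The names run_alloc_error_hook, static_oom_handler and default_hook are made up for illustration, and the #[oom] attribute itself is only mentioned in a comment so that the sketch builds on a stable toolchain.

```rust
use core::alloc::Layout;
use core::mem;
use core::ptr;
use core::sync::atomic::{AtomicPtr, Ordering};

// Currently installed hook, stored type-erased. Null means "use the default".
static HOOK: AtomicPtr<()> = AtomicPtr::new(ptr::null_mut());

// Default hook: does nothing, the handler halts right after calling it.
fn default_hook(_layout: Layout) {}

/// Hypothetical runtime API layered on top of the static handler.
pub fn set_alloc_error_hook(hook: fn(Layout)) {
    HOOK.store(hook as *mut (), Ordering::SeqCst);
}

/// Runs whichever hook is currently installed.
pub fn run_alloc_error_hook(layout: Layout) {
    let raw = HOOK.load(Ordering::SeqCst);
    let hook: fn(Layout) = if raw.is_null() {
        default_hook
    } else {
        // SAFETY: non-null values are only stored by `set_alloc_error_hook`,
        // which always stores a valid `fn(Layout)`.
        unsafe { mem::transmute::<*mut (), fn(Layout)>(raw) }
    };
    hook(layout);
}

/// The single statically registered handler. In a no_std crate this is the
/// function that would carry the proposed #[oom] attribute (assumption).
pub fn static_oom_handler(layout: Layout) -> ! {
    run_alloc_error_hook(layout);
    // Whatever the hook did, the handler itself never returns.
    loop {
        core::hint::spin_loop();
    }
}
```

The top-level crate still decides what finally happens (here, spinning forever), while dependencies can only influence the part that has been deliberately exposed through the hook.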
@japaric I find this convincing regarding static vs. dynamic, thanks.
For example, intrinsics::abort produces a linker error on MSP430 because there's no abort instruction in the MSP430 instruction set.
Oh, I was wondering if something like that ever happened.
How does one typically deal with unrecoverable situations on MSP430? An infinite loop?
No default OOM handler is good.
I think we can have a default, with a Sufficiently Advanced Compiler. (Grep for has_global_allocator for code that does something similar.)
But you’re saying we should not, and force that question on every no_std user. This would be one more barrier before being able to get to Hello World.
https://github.com/rust-lang/rust/pull/51607#issuecomment-401893293
FWIW an attribute like #[oom] I also think would be a great idea (and I think I'm also convinced that it may be worth it over this dynamic strategy), and it could be implemented pretty similar to #[panic_implementation]
Right. I think the remaining question is: should there be a default? If not, what should be a "typical" implementation that we’d recommend in docs? Should we add a stable wrapper for core::intrinsics::abort? (In what module?)
The docs for core::intrinsics::abort claim:
The stabilized version of this intrinsic is std::process::abort
But besides availability, the two functions do not have equivalent behavior: https://github.com/servo/servo/pull/16899.
How does one typically deal with unrecoverable situations on MSP430? An infinite loop?
@pftbest and @cr1901 would be more qualified to answer that question. I'm not a MSP430 developer myself.
On Cortex-M the abort instruction triggers the HardFault handler. While developing I configure panic and HardFault to log info about the program, or just to trigger a breakpoint if binary size is a problem. In release mode, I usually set panic to abort and make the HardFault handler disable interrupts, shut down the system (e.g. turn off motors), signal the failure (e.g. turn on a red LED) and go into an infinite loop but this is very application specific. Also, in some applications reaching an unrecoverable situation means that something is (very) wrong with your software and that it should not be deployed until it's fixed.
But you’re saying we should not, and force that question on every no_std user. This would be one more barrier before being able to get to Hello World.
should there be a default?
I think there should not be a default.
If not, what should be a "typical" implementation that we’d recommend in docs?
The #![no_std] application - target space is too broad to recommend anything that will behave the same everywhere. Consider intrinsics::abort:
Should we add a stable wrapper for core::intrinsics::abort? (In what module?)
Yes. As core::arch::$ARCH::abort maybe? But only on architectures whose instruction sets define an abort / trap instruction.
The docs for core::intrinsics::abort claim:
Those docs should be fixed. iirc process::abort tries to do some cleanup (of the Rust runtime?) before aborting the process. intrinsics::abort is an LLVM intrinsic that maps to the abort instruction of the target instruction set, so the two are not equivalent.
How does one typically deal with unrecoverable situations on MSP430? An infinite loop?
Infinite loop is how I typically deal w/ it. @pftbest can override me if he knows something better though, as it's been a while since I delved into msp430 internals :).
https://github.com/rust-lang/rust/pull/52191 has landed with an attribute for a statically-dispatched function, and no default.
mm/krust/libkrust.a(alloc-062fb091d60c735a.alloc.67kypoku-cgu.11.rcgu.o): In function `alloc::alloc::handle_alloc_error':
alloc.67kypoku-cgu.11:(.text._ZN5alloc5alloc18handle_alloc_error17h59bd4dd5f11cdd3fE+0x2): undefined reference to `rust_oom'
I'm getting the above in one of my projects. I have defined the handle_alloc_error function as in the OP. The code compiles just fine, but the linker cannot find the function.
However, it seems to work if I add extern to the function definition.
@mark-i-m I assume you mean a #[alloc_error_handler] function, in a #![no_std] crate. Could you provide a set of steps/files to reproduce the issue?
@SimonSapin Sorry, yes, that's what I meant. Let me try to distill my code down to a minimal example.
Hmm... I'm having a very hard time reproducing this minimally. The build environment I am in is rather convoluted and involves linking against some compiled C code.
I don’t know if unwinding without std is supported at all. Consider either adding std to your dependency graph or compiling with panic = "abort".
What's the status of stabilization of this feature? We're using alloc in developing a kernel, and it's currently one of the few features keeping us on nightly.
What is the library team's involvement here? #[alloc_error_handler] seems like a pure T-Lang matter as it, as far as I understand, does not involve any changes to the standard library's public API. The function set_alloc_error_hook seems like a different matter but this issue just tracks the attribute...
I disagree. There’s an ad-hoc attribute because we don’t have a more general mechanism for “call a function based on its declaration without depending on a crate that contains its definition, and require exactly one definition somewhere in the crate graph, whose signature is type-checked.” There’s precedent for this kind of attribute with #[global_allocator], and this one is no different as far as the language is concerned.
But the role of this handler and the handle_alloc_error function that it supports are entirely library matters.
Ad-hoc or not... anything and everything that cannot be done in the stable language and needs compiler support is a T-Lang matter (unless it is about implementation defined behavior or optimizations within the confines of the specification, in which case it's a T-Compiler matter). This includes attributes, intrinsics, and lang items. When a general mechanism for libraries is provided it becomes a T-libs matter. In this case you are actually performing custom type checking involving both attributes and lang items.
Yes, #[global_allocator] and in particular https://github.com/rust-lang/rfcs/pull/2325 happened. In my view those are clearly examples of where the language team should have been involved but was not. They are "precedents" I don't want to repeat.
So just to circle back, what needs to happen to stabilize #![feature(alloc_error_handler)]?
Maybe let's not stabilize it? And do https://github.com/rust-lang/rfcs/pull/2492 instead. (Oh i just noticed that's nominated, yay!)
Can someone explain how this API fits with GlobalAlloc?
AFAICT, it is not part of the GlobalAlloc trait. Allocators can return ptr::null_mut() if an allocation cannot be satisfied, and the GlobalAlloc::alloc documentation calls out that this isn't necessarily due to OOM (https://doc.rust-lang.org/1.29.1/std/alloc/trait.GlobalAlloc.html#errors).
In C, errno can be used to query whether malloc errored, and why. In Rust there doesn't seem to be an alternative.
So I would understand that allocators could provide this API, to allow users to query the type of error, or to abort if the type of error was OOM.
Yet this API appears to be independent from the allocator, and any code can override it. So I'm confused about what the purpose of this API is, and how a user implementing it for any GlobalAlloc would even be able to tell whether the allocator ran out of memory or not.
Maybe this API has nothing to do with OOM, or error handling in general, and it is supposed to just be called by code that wants to terminate the process if any allocation error happens ?
If so I find the name a bit misleading, and I still don't know how an implementer of this API would be able to do anything better than printing "An allocation error happened for this Layout", without any specific knowledge of the allocator being used.
For example, on Linux, one could provide a better and more standard error message by calling explain_malloc(size) (https://linux.die.net/man/3/explain_malloc), but for handle_alloc_error to be able to call that, it would need to know that the allocator being used is the system's malloc, and not, e.g., jemalloc.
@gnzlbg I’m having a hard time understanding the first of your last two messages. This attribute has nothing to do with querying information from the allocator.
When an allocation fails (which in GlobalAlloc::alloc is represented by returning null), APIs like Vec::push that don’t want to propagate that failure to their caller can instead call std::alloc::handle_alloc_error(layout: Layout) -> !.
When libstd is linked, handle_alloc_error prints a message to stderr then aborts the process. In a #![no_std] environment however, we can’t assume there is such a thing as a process or a standard output. Therefore we require the program to provide (through this attribute) a function that does not return, which handle_alloc_error will call.
This is similar to the #[panic_handler] attribute, which provides a function that panic!() (eventually) calls.
Maybe […] it is supposed to just be called by code that wants to terminate the process if any allocation error happens ?
Yes.
If so I find the name a bit misleading,
Although the attribute is unstable and we can rename it, it relates to the name of std::alloc::handle_alloc_error which is stable.
and I still don't know how an implementer of this API would be able to do anything better than printing "An allocation error happened for this Layout", without any specific knowledge of the allocator being used.
That’s exactly what libstd does. This attribute is all about replacing that when libstd is not used.
TIL about explain_malloc. It looks like it’s not in libc but part of a separate library. Does it only work with glibc?
Regarding information about why an allocation failed, maybe we can add APIs for that (maybe through https://github.com/rust-lang/wg-allocators/issues/23) but the #[alloc_error_handler] attribute is unrelated.
Thanks for the explanation.
I'm still trying to fill in the blanks about how std::alloc::handle_alloc_error is supposed to print the allocation error message without knowing anything about the allocator. As in, if we were to go from Box<T>/Vec<T> to Box<T, A>/Vec<T, A>, how is handle_alloc_error supposed to know which A was used for the allocation ?
Or is the intent to only use this API for the global allocator and introduce a different API for other allocators ?
It looks like it’s not in libc but part of a separate library. Does it only work with glibc?
The system one works with the system allocator, but each memory allocator has its own APIs for this.
So for example, with jemalloc, you probably want to override malloc_message, which jemalloc will use to dump the reason for an error to a stream, and on error, you probably want to open this stream, and print its contents to the user. With TCMalloc you might / might not want to print the status of the heap on error.
Other allocators would have their own ways to communicate this information. E.g. APIs like posix_memalign return an error code that one can then use to print an error string (e.g. alignment is not a power of two; alignment is not a multiple of sizeof(void*), alignment request too high, etc.). On error, a global_allocator wrapping that in Rust can either panic, or return a null pointer. But when Vec gets the error and wants to print it, it would be nice for it to be able to do so. AFAICT the allocator only communicates with Vec via a raw pointer, so the allocator would need to store the error in some local context, e.g., errno-style, and handle_alloc_error would need to query that. But for doing so, it needs to have some kind of API to the allocator that was used for the failed allocation.
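To make the errno-style idea above more concrete, here is a rough sketch of an allocator wrapper that records why it returned null, plus a query function that an error handler could call. Everything here is hypothetical (RecordingAlloc, AllocFailure, last_alloc_failure, the max_size limit); it is not an existing API. It also uses one process-wide slot for simplicity, whereas C's errno is per-thread.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical failure reasons an allocator might want to report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocFailure {
    /// The request exceeded the wrapper's configured limit.
    TooLarge,
    /// The underlying allocator returned null.
    OutOfMemory,
}

// Process-wide "last failure" slot: 0 = none, 1 = TooLarge, 2 = OutOfMemory.
static LAST_FAILURE: AtomicU8 = AtomicU8::new(0);

/// Wrapper around the system allocator that records why it returned null.
pub struct RecordingAlloc {
    pub max_size: usize,
}

unsafe impl GlobalAlloc for RecordingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if layout.size() > self.max_size {
            LAST_FAILURE.store(1, Ordering::SeqCst);
            return std::ptr::null_mut();
        }
        let ptr = unsafe { System.alloc(layout) };
        if ptr.is_null() {
            LAST_FAILURE.store(2, Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

/// What an error handler could query after seeing a null pointer.
pub fn last_alloc_failure() -> Option<AllocFailure> {
    match LAST_FAILURE.load(Ordering::SeqCst) {
        1 => Some(AllocFailure::TooLarge),
        2 => Some(AllocFailure::OutOfMemory),
        _ => None,
    }
}
```

The catch raised in the comment above still applies: the handler has to know which allocator to ask, which is easy for a single global allocator and unclear for per-container allocators.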
The message that libstd prints is "memory allocation of {} bytes failed", layout.size() regardless of the allocator type. It doesn’t print more information than that. I am not aware of any plan or request to make it print more than that (before yours today, if we take this as a feature request).
I think I misunderstood what the API was for. I thought this was an API for printing why, as in, the reason, an allocation failed.
Instead, it appears to be for printing within libstd an error message saying that an allocation attempt for some layout failed, and that's it. Since the API is public and stable, downstream crates can use it to print the same error message that libstd does.
What I'm not sure I understand is why does the implementation of the API need to be a lang item. Can't the libstd version just panic!("memory allocation of {} bytes failed", layout.size()), and delay what happens on panic to the panic handler ?
What's the motivation for allowing extra customization here ? E.g. it appears to me that most people will want to either abort, unwind, or loop forever, which is what the panic handler on the target probably also wants to do. Or is there a use case where the implementations need to be different?
I’m not sure if you mean handle_alloc_error() or #[alloc_error_handler] when you say “the” API.
libstd’s implementation of handle_alloc_error (which you can think of as being provided through #[alloc_error_handler] by libstd, though it isn’t literally) aborts the process even if panic! would unwind the thread.
#[alloc_error_handler] can only be used when libstd is not used. It exists because libcore “doesn’t know” how to abort the process. (We don’t want libcore or liballoc to assume there is a process at all.)
In my case, I'm using alloc_error_handler to trigger the OOM killer in my OS kernel. The ability to customize the behavior of an OOM is important in no_std settings.
@mark-i-m
The ability to customize the behavior of an OOM is important in no_std settings.
How are you querying that the error is an OOM ? Or which allocator it originated from ?
@SimonSapin
I’m not sure if you mean handle_alloc_error()
Yes, that's what I meant.
It exists because libcore “doesn’t know” how to abort the process.
Sounds to me that this problem could be solved with an #[abort_handler] instead. Although @mark-i-m has other interesting uses for the alloc_error_handler.
@gnzlbg
How are you querying that the error is an OOM ? Or which allocator it originated from ?
In my system, the global allocator is the only place where an alloc error can be triggered in kernel mode, so the handler always knows it is an OOM (failures due to fragmentation are functionally the same, so they should also trigger the OOM killer/compaction daemon/swapping). handle_alloc_error just calls into the memory manager, which has access to the allocator anyway.
I haven't really thought about more complex situations, and I don't know much about per-container allocators, so take this with a grain of salt... I also recognize that my use case is pretty niche...
@mark-i-m, based on what you just said, I got to thinking about why we want the alloc_error handler to be -> !? If we're talking about a kernel, theoretically it should be possible for the kernel to either "expand the heap" (by freeing up memory used for buffers and caches and whatnot), or kill away some of its "processes" or "tasks" or whatever we call them and then retry the allocation. Then the application should be able to continue to work correctly.
This reminds me somewhat of the Memfault handler for the ARM processors.
@axos88 A typical usage is NonNull::new(alloc(layout)).unwrap_or_else(|| handle_alloc_error(layout)). We want something that doesn’t return, to express “I don’t want to deal with this case”.
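The typical usage quoted above can be spelled out as a small helper. This is only a sketch using the stable std::alloc functions; alloc_or_abort and roundtrip are names invented for the example.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Allocates `layout` or diverges through `handle_alloc_error`.
/// Callers never see a null pointer, so they have no failure path to handle.
pub fn alloc_or_abort(layout: Layout) -> NonNull<u8> {
    assert!(layout.size() > 0, "`alloc` requires a non-zero-sized layout");
    // SAFETY: the layout has a non-zero size, as checked above.
    let raw = unsafe { alloc(layout) };
    NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout))
}

/// Small round trip showing the helper in use.
pub fn roundtrip(value: u64) -> u64 {
    let layout = Layout::new::<u64>();
    let ptr = alloc_or_abort(layout).cast::<u64>();
    // SAFETY: `ptr` is valid and properly aligned for a `u64`, it is written
    // before being read, and it is freed with the layout it was allocated with.
    unsafe {
        ptr.as_ptr().write(value);
        let read = ptr.as_ptr().read();
        dealloc(ptr.as_ptr().cast::<u8>(), layout);
        read
    }
}
```

Because handle_alloc_error returns !, the closure type-checks as producing a NonNull<u8> even though it never actually produces one.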
Any mechanism to “try harder” to allocate memory would be a better fit for being in the allocator itself. That is, in an impl of the GlobalAlloc trait (or Alloc trait), possibly by wrapping another allocator.
By the time handle_alloc_error is called, the program has already decided to abort. It’s too late to declare we’ve found some memory after all.
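One way to read the "wrap another allocator" suggestion is sketched below: a GlobalAlloc wrapper that, when the inner allocator fails, runs a reclaim callback and retries once before giving up. RetryAlloc, its reclaim field and the FlakyAlloc test double are all hypothetical names for this example, and what reclaiming means (dropping caches, killing tasks) is left to the caller.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps an inner allocator; on failure it calls `reclaim` and retries once.
pub struct RetryAlloc<A> {
    pub inner: A,
    /// Hypothetical callback that tries to free memory elsewhere.
    pub reclaim: fn(),
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for RetryAlloc<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let first = unsafe { self.inner.alloc(layout) };
        if !first.is_null() {
            return first;
        }
        (self.reclaim)();
        // Second and last attempt. A null result now reaches the caller,
        // which may then end up in `handle_alloc_error`.
        unsafe { self.inner.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) }
    }
}

/// Test double: fails `failures_left` times, then defers to the system allocator.
pub struct FlakyAlloc {
    pub failures_left: AtomicUsize,
}

unsafe impl GlobalAlloc for FlakyAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Decrement the counter if it is still positive; `Ok` means "fail this call".
        let should_fail = self
            .failures_left
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
            .is_ok();
        if should_fail {
            std::ptr::null_mut()
        } else {
            unsafe { System.alloc(layout) }
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}
```

With this shape the retry policy lives in the allocator layer, and handle_alloc_error keeps its meaning of "the program has given up on this allocation".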
Any mechanism to “try harder” to allocate memory would be a better fit for being in the allocator itself.
I disagree. Staying at the example of the kernel, this would mean the allocator would be responsible for instructing the kernel to try to trigger the - let's call it - OOM killer to free some space up for the allocation, whereas IMHO the allocator should only be responsible to tell the kernel that hey, I don't have enough space for this operation, and then the kernel would decide whether to free up space, or crash. In the end it ends up in the same program, but not in the same impl.
By the time handle_alloc_error is called, the program has already decided to abort. It’s too late to declare we’ve found some memory after all.
Then this is not the best name for it, because we're not handling it, we're just trying to accept our fate in a graceful manner.
Agreed that #[alloc_error_handler] is not a great name, something like #[abort_implementation] (mirroring the existing #[panic_implementation]) would be closer to the truth although the core::alloc::Layout parameter is specific to allocations. However this attribute’s name is related to the name of the std::alloc::handle_alloc_error function, which is stable. (Though adding an alias and deprecating the old name is a possibility.)
because we're not handling it
It's a matter of perspective. The error already happened. It's unrecoverable. We handle it by doing something and aborting. It's not "handle_to_maybe_recover", after all.
In any event a non-! would be completely different semantics and not fit the current use-case at all. Not saying this is good or bad, but just a massive redesign.
Basically there are a gazillion interesting things one can do with errors and one global hook can never hope to cover them all. I would expect this to be dead code in a kernel or other program which really cares about error cases. Instead they will be dealt with like all other errors with Result and explicit application code.
I agree this would be a change in the use case. However if we allow a return from this hook, it would expand the use-case rather than shift it. If the user wants to use it as it is today, that's great, don't return from it. If the hook returns, retry the allocation. If the user did not correct the issue that led to the allocation failure, it will fail again, and the hook will be entered again.
As I see it the ability to return from this hook is an extension to the current use-case without any obvious drawbacks.
This is not expanding, this is entirely different functionality that should be proposed separately.
And, remember that the #[alloc_error_handler] attribute can only be used when libstd is not linked. When it is linked, it takes care of providing the implementation of std::process::abort.
Yep, in the case of a kernel, it happens that never returning can be circumvented because it just means the OS can run instead. It makes much less sense for a common application.
So, is there anything that alloc_error_handler can normally do other than panic, given the limitation of never being allowed to return?
Can't things that need to allocate without handling errors just panic on failure, and then we can just skip the entire alloc_error_handler concept entirely?
libstd doesn’t panic on allocation failure, it aborts the process. In no_std there may not be a process to abort.
Can't things that need to allocate without handling errors just panic on failure
@Lokathor with -C panic=unwind, a panic! allocates memory, which is something that a process might not want to do if a memory allocation just failed. The alloc_error_handler lets you change the behavior from abort to panic! if you want to.
So, is there anything that alloc_error_handler can normally do other than panic, given the limitation of never being allowed to return?
It can panic!, abort, ud2, loop forever, exit(success) if it wants to, log something and do any of these other things, it can also just call main() again and abort once main finishes, fork the process, ...
The only thing the handler cannot do is return a value, but beyond that, there are many things it can do.
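For illustration, a few of the options listed above written as ordinary functions with the handler's shape, fn(Layout) -> !. This is a sketch that builds against std on a stable toolchain; in a #![no_std] crate one such function would carry the unstable #[alloc_error_handler] attribute and could not use eprintln! or std::process::abort. The function names are invented for the example.

```rust
use std::alloc::Layout;

/// Option 1: report, then abort (roughly what the thread says libstd does).
pub fn abort_on_alloc_error(layout: Layout) -> ! {
    eprintln!("memory allocation of {} bytes failed", layout.size());
    std::process::abort()
}

/// Option 2: turn the failure into a panic.
/// As noted in the thread, unwinding may itself need to allocate.
pub fn panic_on_alloc_error(layout: Layout) -> ! {
    panic!("memory allocation of {} bytes failed", layout.size())
}

/// Option 3: halt by looping forever, as mentioned for targets without an
/// abort instruction.
pub fn halt_on_alloc_error(_layout: Layout) -> ! {
    loop {
        std::hint::spin_loop();
    }
}
```

All three have the same signature, so choosing between them is purely a matter of which one the top-level crate registers.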
I see.
Well is there any ETA on when we could see progress on this moving forward? What are the actual blockers here?
I'd like to be able to build no_std binaries for Windows (all the OSes really), but it turns out that you can't reasonably do that on Stable because you need to use this thing if you want the alloc crate. Otherwise a person would have to basically fork the entire alloc crate just to get around this one snag.
I'd like to be able to build no_std binaries for Windows (all the OSes really), but it turns out that you can't reasonably do that on Stable because you need to use this thing if you want the alloc crate.
Which application do you have in mind that supports the alloc crate but not libstd on Windows and other operating systems?
Windows applications that do not rely on the C runtime and do not link to it.
This attribute is mandatory when using the alloc crate without the std crate. It is used like this:
Implementation PR: https://github.com/rust-lang/rust/pull/52191
Blocking issues
Original issue:
In a no_std program or staticlib, linking to the alloc crate may cause this error:
This is fixed by providing the oom lang item, which is normally provided by the std crate (where it calls a dynamically-settable hook https://github.com/rust-lang/rust/issues/51245, then aborts). This is called by alloc::alloc::handle_alloc_error (which is called by Vec and others on memory allocation failure).
However, defining a lang item is an unstable feature.
Possible solutions include:
Add and stabilize a dedicated attribute similar to the #[panic_implementation] attribute:
The downside is that this is one more mandatory hoop to jump through for no_std programs that would have been fine with a default hook that aborts.
Move std’s dynamically-settable hook into alloc: https://github.com/rust-lang/rust/pull/51607. The downside is some mandatory space overhead.