Open Gankra opened 7 years ago
Note that in previous topics I stated that libunwind doesn't handle OOM well, but this appears to be outdated or incorrect information. They preallocate their memory before an unwind is even requested. In the limit they might run out of memory and give up but it's not like we've lost anything for trying.
I figured I'd comment on this with respect to https://github.com/rust-lang/rust/issues/42808 and the outcome there along with the @rust-lang/libs team discussion. It was discovered in https://github.com/rust-lang/rust/issues/42808 that the integration of the Alloc
trait caused a number of code size related regressions which when diagnosed were almost exclusively related to unwind tables and landing pads. This investigation then caused https://github.com/rust-lang/rust/pull/44049 as the only two stable implementations of allocators don't panic.
Note, though, that this is not necessarily directly related to OOM but just allocator implementations themselves. Should we forbid, in general, panics from allocators? Should we only allow the oom method to panic?
Unsure!
I personally think it is reasonable to forbid the allocator itself from panicking and instead have the oom
method be responsible for handling allocation failures and panicking or aborting as necessary. I also think it would be useful if oom
was told the size of the allocation that failed so it could heuristically determine whether it is an actual memory exhaustion event or just someone accidentally specifying a silly large number.
@retep998 fn oom
does receive the AllocErr
, which for AllocErr::Exhausted
will carry the input Layout
that was requested. Does that not suffice for what you describe?
(I personally wanted to try to stuff the Layout
onto both AllocErr
variants, but I worried about bloating the size of the error type after it was no longer an associated item for the allocator...)
@pnkfelix Yep, that would do exactly what I want. It's my fault for not looking at the existing API and seeing that we already had that 😛
RFC https://github.com/rust-lang/rfcs/pull/2116 was accepted, I’ve repurposed this feature request into a tracking issue. See more in this thread’s original message.
We now have std::alloc::set_alloc_error_hook
(tracked at https://github.com/rust-lang/rust/issues/51245) which is document as not allowed to unwind. If that restriction could be lifted, this hook could be the mechanism to implement this feature (by having the program register at start-up a hook that panics).
Minor question: if someone sets oom=panic
and panic=abort
, should that function identically to oom=abort
, or should it be a compilation error with the rationale that such a configuration is likely a mistake on the user's part?
I expect it would be mostly identical, though the error message that’s printed might not be exactly the same. I don’t know if it’s worth forbidding explicitly even if it’s not particularly useful. An observable effect (though again not particularly useful since std::alloc::set_alloc_error_hook
also exists) would be that a hook set through std::panic::set_hook
is run on OOM.
Making allocation errors default to panic has undergone FCP successfully: https://github.com/rust-lang/rust/issues/66741
I wonder if there's a point to this cargo-level feature anymore?
https://github.com/rust-lang/rust/issues/66741 is only relevant in programs where libstd
is not used at all. This issue is about the behavior of libstd
.
I had sent my ideas to https://github.com/rust-lang/wg-allocators/issues/62 ; but as it looks like there already is a tracking issue, let's move the comments here.
Assume a rust server running multiple things on behalf of clients. It uses regular Rust code, that is not written to care specifically about this particular use case.
The server wants to limit the memory consumption clients could force upon it. It does so by having each client's allocations be in an arena allocator, and then wants to abort just that client's connection in case of “OOM.”
Unfortunately, this means that OOM must not abort the whole process, and AFAIU that is what the current behavior is, even with set_alloc_error_hook -- or at least so do the docs read like.
I'm not sure oom=panic
as a global setting could work properly in all cases, as potentially panicking requires memory allocation? But I think that already allowing set_alloc_error_hook
to panic would allow the user to swap the error hook at the same time as the allocator, and in case of OOM revert to the global allocator and then, once it's possible to allocate again, perform the panic.
Does that make sense?
It's probably the case that (a) the amount of memory needed to alloc panic data is fairly low and (b) you'd hit oom from asking for some large amount that can't be offered. In this situation, you'd probably be able to allocate the panic data.
However, if you fail to allocate while panicing you'll just get a panic, and a panic during panic is an abort. So at least you're not trapped in a loop ;P
But also: you can't carelessly change the global allocator while allocated things are present in the world or they'll get potentially freed on the wrong allocator, which is obviously bad.
Basically the entire scenario you want can't be done with rust's current conventions regarding memory allocation. You'd need specialized local-allocator using code, or waiting for alloc-wg to make the entire ecosystem more alloc aware over time.
Just for clarification: I was assuming the existence of a global allocator that is able to use different memory pools for allocation depending on some thread-local. Entering a task's context would set the thread-local to the task's memory pool, and either exiting it or hitting an OOM during it would reset it to the global memory pool. It knows which memory pool to free from without the memory pool being necessarily the same as the one that was configured during allocation, thanks to knowing the address of the memory block to free.
By doing this, I think that what I want to do can be achieved with the current conventions regarding memory allocations, but iff set_alloc_error_hook
can be set not to abort, but to panic instead :)
Oh, and I forgot to answer your initial points: unfortunately, a client able to make the server abort is a DoS vulnerability, so… that's not a good thing :( And I can totally see the client being able to fill the pool by making the server allocate one byte at a time -- the server could try to prevent that by keeping its own counter of how much memory each task has used, but it requires potentially large changes in every library used by the server, while the solution based on allocators would work without requiring cooperation from every piece of the code (including those I don't control) :)
It knows which memory pool to free from without the memory pool being necessarily the same as the one that was configured during allocation, thanks to knowing the address of the memory block to free.
Yes, this aspect is critical to the soundness of this kind of scheme.
iff
set_alloc_error_hook
can be set not to abort, but to panic instead :)
Yes, see https://github.com/rust-lang/rust/issues/43596#issuecomment-400659080. I don’t know why this restriction exists, though.
For why this restriction exists, I don't know why either, but (for other people who may be reading this, I know you've already read it) the three messages starting from https://github.com/rust-lang/rust/issues/51245#issuecomment-393503805 explain that it's indeed here :)
My first guess would be: panicking might need to allocate itself on some platforms… maybe? And potentially this allocation would not be handled by rust, and thus not be covered by rust's allocation handling, and then Things would happen.
Now, for use cases like the one I describe, where there still is memory available for the program and it's just an arena that got full, allowing the replacement would make sense. But maybe it'd require set_alloc_error_hook
to be unsafe
?
panicking might need to allocate itself on some platforms… maybe?
It does, on all platforms.
First if you use panic!
with a formatting string and additional arguments, the formatted string is allocated:
Then that string is boxed to make panic payload pointer:
If instead you use panic!
with a single argument, that value is the payload which is also boxed:
panic!
with zero argument has a &'static str
payload that goes in the same code path as with single argument.
potentially this allocation would not be handled by rust, and thus not be covered by rust's allocation handling, and then Things would happen.
No, these allocations do go through Rust’s global allocator.
If they fail and call a alloc error hook that panics, this is another case of panicking while panicking that can already happen with Drop
impls. This causes the process to abort. This happening may be undesirable, but it’s not Undefined Behavior.
But maybe it'd require
set_alloc_error_hook
to beunsafe
?
Why?
For why this restriction exists, I don't know why either, but (for other people who may be reading this, I know you've already read it) the three messages starting from #51245 (comment) explain that it's indeed here :)
Because we mark global allocator functions as nounwind
to allow the compiler to optimize better.
@Amanieu do you mean functions like std::alloc::alloc
? Those return null on failure though. This is about unwinding from std::alloc::handle_alloc_error
, which is already in an error path so optimizing it might be less important?
But maybe it'd require set_alloc_error_hook to be unsafe?
Why?
This was written under the assumption that possibly some platforms would have unwinding happen outside of the scope of what rustc controls, which might end up doing wrong things if in OOM conditions. It's very possible that this assumption is wrong, though -- I would even hope it is, as it would mean that this unsafe
would not be deserved :)
It's a little hard to tell from the comments both here and on https://github.com/rust-lang/rust/issues/51245 what is happening and/or needs to happen to make progress on this. My company has a use case very similar to the one described (don't want OOM on one task to tear down the whole process), and we might be able to contribute to moving this forward, but would need some guidance on how to help do that. Could someone point me in the right direction?
Looks like panic
king on OOM will never correctly handle all OOM cases, because panic
itself can OOM. What about throwing an object that does not require allocation? Instead of
panic!("oom")
we can do
resume_unwind(Box::new(AllocError)); // AllocError is ZST
This has the added benefit that the catcher can distinguish this from other panics (which usually are unrecoverable programming errors).
@alexcrichton @pnkfelix Would the Rust team be interested in reviewing a PR that does this? There are a few other places in libstd that do not allow catching an OOM error - I'm looking into these and so far all of them are fixable.
resume_unwind
doesn't exist for #![no_std]
. In addition it doesn't actually print a panic message. For that you need panic_any
which doesn't exist for #![no_std]
either. #![no_std]
only allows panicking with an &str
or core::fmt::Arguments
. It doesn't support arbitrary panic payloads.
resume_unwind doesn't exist for #![no_std]
I take it that oom=panic
is for users with std
? See rust-lang/rfcs#2116:
- For users with unwinding, an oom=panic configuration is added to make global allocators panic on oom.
- For users without unwinding, a try_reserve() -> Result<(), CollectionAllocErr> method is added.
Global allocators and unwinding are std
-only. If someday no_std
gains unwinding support, we'll probably have resume_unwind
for no_std
as well.
Global allocators are available in no_std+alloc scenarios, while resume_unwind
is std
only I think. Disallowing oom=panic
in no_std+alloc would be fine though - those users would likely want to use fallible APIs instead of panicking on oom.
if there isn't a way to deny that dependencies ever use infallible allocation then that's basically an unworkable situation.
Practically speaking, -C oom=panic
would be implemented by having the compiler-generated main
function call set_alloc_error_hook
to change the hook to one that panics instead of aborting. The changes needed to allow alloc_error_handler
to unwind are in #88098 and #88700.
no_std
programs should instead define a custom alloc_error_handler
which can unwind or abort as needed.
The issue I'm raising is oom=panic
will never be oom-safe as long as panicking allocates. An allocation error handler simply should not allocate. To solve this, I think oom=unwind
is a good replacement: the actual reason we wanted oom=panic
is to unwind.
Bikeshed: Should this be renamed to alloc-error=unwind
after 2b789bd0570983e82533f9ed30c80312ac334694 changed everything oom to alloc_error?
To solve this, I think oom=unwind is a good replacement: the actual reason we wanted oom=panic is to unwind.
That still requires allocating the exception object I think. Also catch_unwind
which actually catches the panic returns a boxed panic payload.
The issue I'm raising is oom=panic will never be oom-safe as long as panicking allocates.
I believe this problem can be mitigated by using a bit of memory reserved just for allocations necessary during panicking. Maybe it could even be an eagerly initialized thread local variable.
let's also remember that one allocation failure doesn't mean eternal allocation failure. If you try to allocate a billion byte array and fail, maybe it's still possible to allocate 20 bytes for an error to describe what went wrong.
catch_unwind which actually catches the panic returns a boxed panic payload.
A boxed ZST doesn't require allocation.
That still requires allocating the exception object I think.
I believe this problem can be mitigated by using a bit of memory reserved just for allocations necessary during panicking.
I agree. However, there's a distinction between reserving an exception object and reserving a panic payload. Exception objects are an implementation detail and it's easy to make them use reserved memory. Panic payload is already exposed via the resume_unwind
API which allows the user to catch the payload. The user can simply drop the payload and now you've lost your reserved memory :(
let's also remember that one allocation failure doesn't mean eternal allocation failure. If you try to allocate a billion byte array and fail, maybe it's still possible to allocate 20 bytes for an error to describe what went wrong.
True. Still, is it really a good idea to rely on "small objects" being available? Most allocators don't request memory in increments of 20 bytes - they might request, say, a 64K block each time. Allocating 20 bytes can still become a 64K memory request and fail.
Panic payload is already exposed via the
resume_unwind
API which allows the user to catch the payload. The user can simply drop the payload and now you've lost your reserved memory
If the AllocError
is a ZST and if Box
of a ZST were implemented such that the inner pointer is NULL
and allocators are required to implement freeing a NULL
pointer as a no-op then there would be no issue, right?
More importantly, the oom=panic feature will create memory safety hazards in code that currently doesn't have memory-safety hazards. In general I think we'll have to assume by default that crates are not compatible with oom=panic unless they explicitly claim otherwise, and I think the build system should enforce this. I expanded on this at https://github.com/rust-lang/rfcs/pull/2116#issuecomment-1006144661.
if
Box
of a ZST were implemented such that the inner pointer isNULL
I'm not sure what you mean here, but it is currently guaranteed that Box<T>
is a non-null pointer, no matter the T
. This allows Option<Box<T>>
to be represented as a single pointer. From https://doc.rust-lang.org/stable/std/boxed/index.html#memory-layout:
For zero-sized values, the
Box
pointer still has to be valid for reads and writes and sufficiently aligned.
And the docs on pointer validity says:
- A null pointer is never valid, not even for accesses of size zero.
Box
doesn't actually allocate for zero-sized values - it creates a non-null dangling pointer:
fn main() {
let val = Box::new(());
println!("{:?}", Box::into_raw(val));
}
prints 0x1
A while back, I experimented with getting a complete zero-alloc panic to work (I used a custom GlobalAlloc
that enforced that no allocations happened after a certain point). I believe I got it working for ZST payloads - I can try to dig up the code.
Tagging as needs-summary; it'd help to know the current state of this so we can understand what it would take to stabilize this.
It's not implemented yet. The implementation PR is #88098 and is waiting for review.
@Amanieu what's the current status of this tracking issue? It looks like you implemented the feature in https://github.com/rust-lang/rust/pull/88098, is this just waiting on an eventual push for stabilization?
I think this just needs to be stabilized. We might need to audit the standard library to check that we stay sound in the face of unwinding from allocations. Also I'm not sure what the state of the documentation is.
What are the chances of stabilizing this soon? I've written workaround for it #84612 a year ago, and I wonder if I should push for the workaround still.
I don't think std is ready for oom=panic yet. The unwinding path is not allocation-free, so likely currently oom=panic only works when a large allocation fail (but the allocator can still satisfy a few small ones).
that still seems like a strict improvement over the current situation.
@nbdd0121 As long as this behavior is safe (I assume it causes abort like a panic during a panic), I don't think this issue is a blocker to stabilization, because this deficiency doesn't change the public interface of oom=panic
. It's a quality-of-implementation issue, but even aborting sometimes is better than aborting every time.
Currently we panic with error message, and that will require allocation, and I consider that a publicly visible behaviour.
If the error message is fixed, we should be able to pre-allocate the necessary object (though it might require some hacky code in the panic internals).
Even for fixed error message we need to allocate a Box<dyn Any>
for it, and since &str
is not a ZST, there are actual heap allocation required. We couldn't preallocate it because we need to allocate one of them for each thread (not all threads are created by std APIs).
C++ doesn't have this issue because it uses special __cxa_allocate_exception
and __cxa_free_exception
instead of normal heap allocation, which we can't do because our API is just a plain Box
.
We could make the panic payload a ZST, and that'll avoid allocation problem on unwind path entirely.
All other allocations that we currently have can be avoided, I think.
We’re constrained to Box<dyn Any>
because it’s part of std::thread::Result
, right?
There are 2 levels of allocation when panicking, first a Box<String>
or Box<&str>
must be allocated to hold the panic message. Then (with dwarf unwinding) an exception objects needs to be allocated on the heap, which holds the Box<Any>
.
The exception object doesn't need to be allocated on the heap. It can be #[thread_local]
.
Update: This now the tracking issue for a mechanism to make OOM / memory allocation errors panic (and so by default unwind the thread) instead of aborting the process. This feature was accepted in RFC https://github.com/rust-lang/rfcs/pull/2116.
It was separated from https://github.com/rust-lang/rust/issues/48043, which tracks another feature of the same RFC, since the two features will likely be stabilized at different times.
Blockers
This should be blocked until we can audit all places in the code that allocate to ensure they safely handle unwinding on OOM.
Steps:
std::alloc::set_allocation_error_hook
(tracked at https://github.com/rust-lang/rust/issues/51245) could potentially be this mechanism, but that hook is currently document as not allowed to unwind.Original feature request:
Several users who are invested in the "thread is a task" model that Rust was originally designed around have expressed a desire to unwind in the case of OOM. Specifically each task has large and unpredictable memory usage, and tearing down the first task that encounters OOM is considered a reasonable strategy.
Aborting on OOM is considered unacceptable because it would be expensive to discard all the work that the other tasks have managed to do.
We already (accidentally?) specified that all allocators can panic on OOM, including the global one, with the introduction of generic allocators on collections. So in theory no one "should" be relying on the default
oom=abort
semantics. This knob would only affect the default global allocator. Presumably there would just be a function somewhere in the stdlib which is cfg'd to be a panic or abort based on this flag?This is not a replacement for proper fallible allocation handling routines requested in #29802 because most of the potential users of that API build with
panic=abort
. Also the requesters of this API are unwilling to go through the effort to ensure all their dependents are using fallible allocations everywhere.I am dubious on this API but I'm filling an issue so that it's written down and everyone who wants it has a place to go and assert that it is in fact good and desirable.