rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.84k stars 12.51k forks source link

Tracking issue for `try_reserve`: RFC 2116 fallible collection allocation #48043

Open aturon opened 6 years ago

aturon commented 6 years ago

This is a tracking issue for the try_reserve part of the RFC "fallible collection allocation" (rust-lang/rfcs#2116).

Steps:

API:

impl /* each of String, Vec<T>, VecDeque<T> */ {
    pub fn try_reserve(&mut self, additional: usize) -> Result<(), TryReserveError> {…}
    pub fn try_reserve_exact(&mut self, additional: usize) -> Result<(), TryReserveError> {…}
}

impl /* each of HashMap<K, V> and HashSet<T> */ {
    pub fn try_reserve(&mut self, additional: usize) -> Result<(), TryReserveError> {…}
}

/// The error type for `try_reserve` methods.
#[derive(Clone, PartialEq, Eq, Debug)]
pub enum TryReserveError { // in std::collections
    /// Error due to the computed capacity exceeding the collection's maximum
    /// (usually `isize::MAX` bytes).
    CapacityOverflow,

    /// The memory allocator returned an error
    AllocError {
        /// The layout of allocation request that failed
        layout: Layout,

        #[doc(hidden)]
        #[unstable(feature = "container_error_extra", issue = "0", reason = "\
            Enable exposing the allocator’s custom error value \
            if an associated type is added in the future: \
            https://github.com/rust-lang/wg-allocators/issues/23")]
        non_exhaustive: (),
    },
}

impl From<LayoutErr> for TryReserveError {
    fn from(_: LayoutErr) -> Self {
        TryReserveError::CapacityOverflow
    }
}
snf commented 6 years ago

Hi, this feature will require oom and try_reserve to be implemented. I'm needing try_reserve and I'm willing to work (slowly) on it if noone else does. I see that the previous commit by @Gankro is here and seems to match the RFC description. Is there anything else I should consider before rebasing that PR?

snf commented 6 years ago

I can use some mentorship on what's the best way to implement oom=panic/abort . Should I emit a global variable through rustc depending on this case?

I was taking inspiration from https://github.com/rust-lang/rust/pull/32900 but there might be an easier way to emit a boolean global variable.

aturon commented 6 years ago

cc @Gankro, can you help mentor?

SimonSapin commented 6 years ago

try_reserve methods are implemented in Nightly, but the accepted RFC also includes the ability to make allocation failure cause a panic rather than abort and that part is not implemented. Unfortunately the RFC does not discuss how this would be implemented, other than being controlled by a flag in Cargo.toml. It’s not clear to me how this can work.

glandium commented 6 years ago

Unfortunately the RFC does not discuss how this would be implemented, other than being controlled by a flag in Cargo.toml. It’s not clear to me how this can work.

We've been talking about panic::set_hook-like hooking for the new oom lang item in #49668, this could be how this works.

SimonSapin commented 6 years ago

@alexcrichton, how does https://github.com/rust-lang/rust/pull/51041 fit with the goal of implementing some way to opt into oom=panic?

alexcrichton commented 6 years ago

@SimonSapin we'd decide that the regression is acceptable, and we'd revert the change.

SimonSapin commented 6 years ago

You use a conditional tense, but oom=panic is already in an accepted RFC (though the implementation strategy is not quite settled).

alexcrichton commented 6 years ago

Ok sure, we can figure it out in the implementation details? AFAIK there's no need to revert the change as it has basically no benefit today. It's a trivial change to revert as well, it's one line in the standard library

3n-mb commented 6 years ago

This example in playground asks for 1024*1024*1024*1000 bytes on a server, via v.try_reserve(len)? without giving an error.

Is this RFC implementation is not at the point, at which errors in such cases will start to show up? Or, is this a gross over-commit by underlying OS, and this state of affairs will stay like this? Can try_reserve in principle reserve only realistic things?

SimonSapin commented 6 years ago

When an error is returned is entirely dependent of the underlying allocator. As far as I know Linux usually overcommits, Windows doesn’t.

https://play.rust-lang.org/?gist=aa668f38117983d1951b58307df04880&version=nightly allocates 120 PB but fails at 130 PB, it is exhausting 48 bits of address space.

Even on Linux though, this API can be useful for 32-bit programs when exhausting the address space is much more realistic.

SimonSapin commented 6 years ago

https://github.com/rust-lang/rust/issues/43596 was the pre-RFC feature request for oom=panic. I’ve repurposed it as the tracking issue for that feature, since it and try_reserved are likely to be stabilized at different times. This leaves this issue as tracking only try_reserve.

snf commented 6 years ago

https://github.com/rust-lang/rust/pull/52420 will change CollectionAllocErr to CollectionAllocErr<AllocError>

SimonSapin commented 6 years ago

Will, or would. I’ve comment there that such a change in public API design merits being discussed separately from a large PR.

Ericson2314 commented 6 years ago

Yeah definitely. Even if it weren't for the 2018 delaying considering these things, I wouldn't expect it to be merged quick. I just make the PR to prove the feasibility and make my ideas clearer to / interactive for others.

Ignoring the benefits of allocator polymorphism itself (that would be https://github.com/rust-lang/rust/issues/42774), I see 4 big benefits:

I thus advocating going with error polymorphism and abandoning try_reverve (I rather reuse try_ for the Result<(), E> copies of every method we'll need for backwards compat). (But, my PR keeps the existing meaning of try_reserve so as to allow a transition period where people can try both approaches.)

SimonSapin commented 5 years ago

Firefox needs try_reserve badly enough that they forked Vec and HashMap over it. It’d be nice to stabilize.

Some of the discussion above is about oom=panic whose tracking has moved to https://github.com/rust-lang/rust/issues/43596. This leaves two unresolved question:

Ericson2314 commented 5 years ago

@SimonSapin Yes if things were done my way Firefox wouldn't be try_reserving or reserving at all, but just fallibly allocate the memory they need when they need it. That's much simpler and more robust.

(There's also the more minor quibble that under the simplest version of my plan, the Self types are different and try_reseve wouldn't be callable for the regular infallible collections. But that can be changed at the cost of more associated type magic.)

The libs team paused associated allocators to prioritize the 2018 edition. That's fine with me, but then I think it only fair that Firefox wait too. I promise as soon as the gates are open I'll rush to revive @glandium's and my PRs.

SimonSapin commented 5 years ago

Right, try_insert for example would be preferable to try_reserve + insert, but it doesn’t exist yet. You know as well as I do that large changes or additions are harder to get through the process, https://github.com/rust-lang/rfcs/pull/2116 was designed deliberately to be minimal.

I think it only fair that Firefox wait too

It’s not about who gets candy first. RFC 2116 was submitted 14 months ago and accepted 8 months ago. The try_reserve part has been implemented 7 months ago, stabilizing it now can be a matter of flipping a switch. Allocator-generic collections however are still in the design phase with a number of open questions, they are still gonna take significant work from multiple people.

That said, there is no real urgency for try_reserve. Firefox using a forked HashMap is unfortunate in my opinion, but it works. Stabilizing try_reserve only seems to me like easy incremental improvement.

Ericson2314 commented 5 years ago

Well it sounds like we are in something closer to agreement than usual. :)

Allocator-generic collections however are still in the design phase with a number of open questions

The code has been written and if I recall correctly works (there was one ancillary issue with trying to make default impls work well, but it is not fundamental and I believe I found a workaround beyond just requiring more boilerplate). It just needs to be read and debated.

alexcrichton commented 5 years ago

@SimonSapin it's been awhile since I took a look at this, but is the idea that (in the long run) every fallible method on a collection returns CollectionAllocErr<T> and the T is specialized per-method to be the "most appropriate"? In the meantime, though, we're only thinking of the try_reserve methods which would effectively return CollectionAllocErr<()> and you're thinking that we can add <T = ()> to the enum definition later?

One possible alternative to this is to avoid defining this type at all I think? For example unlike as we may have originally though AllocErr wasn't stabilized with global allocators. To that end it increases the surface area of what's to be stabilized to stabilize this feature, and it also means we're not necessarily sold on having AllocErr yet. These methods could, perhaps, return a bool for whether they succeeded or not. (this sort of depends if we think it's crucial to switch on why reservation failed, allocation and/or capacity)

I think I'd personally lean towards what you're thinking @SimonSapin by having CollectionAllocErr with a defaulted type parameter later on, if necessary. I might also advocate, as well, having either a non-exhaustive enum or an opaque struct with boolean accessors.

SimonSapin commented 5 years ago

Oh, I forgot that AllocErr was not stable yet. I’m fine with changing CollectionAllocErr to an opaque type for now. (A struct with a private field, to leave us more options later.) Even zero-size, though I’d still prefer Result<(), _> over bool even if they are semantically equivalent.

At the moment I don’t really have an opinion on whether CollectionAllocErr should have a type parameter. I was only responding to https://github.com/rust-lang/rust/issues/48043#issuecomment-409600561 by saying we’ll still be able to do that later if we want. (With the RFC process hat on, to make sure possible objections are not ignored.)

This sort of depends if we think it's crucial to switch on why reservation failed, allocation and/or capacity

It looks like at least some of the code paths in Firefox track the size (and alignment) of the allocation request that failed, so the current enum might be insufficient anyway.

Alright, on further thought it may not make sense to pursue this just yet, until we figure out the rest of the story for allocation traits and error types.

Ericson2314 commented 5 years ago

@alexcrichton So in https://github.com/rust-lang/rust/pull/52420/files I did make a CollectionAllocErr, but that's not really the important part, the associated error type (which would be the parameter) is.

Honestly, I'm not even sure CollectionAllocErr is worth it. It creates two axis with the associated type: do we hide allocation errors, and do we hide size errors? I can't imagine a case where one should cause an panic/abort and the other shouldn't. I rather have just AllocErr as is, or have AllocErr have both variants (In the memory hierarchy we can imagine that at any layer some collection/allocator gets too big, or it cannot get memory from it's backing allocator).

All this another reason, and one I initially forgot (but now remember thinking of when I rebased my PR over try_reserve was added), that stabilizing try_reserve now will tie our hands later.

snf commented 5 years ago

I think this is a feature that we need semi-urgently. There are projects out there that require try_reserve and this is not something that can be quickly replaced by a custom crate. I like the idea of CollectionAllocErr with a private field but I'd like to know if anyone can think of a limitation before proceeding in implementing it.

SimonSapin commented 5 years ago

As far as I can tell, the error type is the only remaining blocker. How about something like this?

pub struct ContainerAllocError {
    layout_of_failed_allocation: Option<Layout>,
}

impl ContainerAllocError {
    pub fn layout_of_failed_allocation(&self) -> Option<Layout> { self.layout_of_failed_allocation }
}
SimonSapin commented 5 years ago

Alternative definition:

pub enum ContainerAllocError {
    CapacityOverflow,
    AllocErr {
        layout: Layout,
        #[unstable(feature = "container_alloc_error_extra")] error: AllocErr,
    },
}

Having a public field for a zero-size type is a bit awkward, but the key idea is that it can stay unstable while the rest of the enum is stabilized (together with try_reserve methods that return it).

When https://github.com/rust-lang/wg-allocators/issues/23 is resolved, based on the outcome we either remove that field:

pub enum ContainerAllocError {
    CapacityOverflow,
    AllocErr { layout: Layout },
}

… or make its type be a new type parameter of the enum:


pub enum ContainerAllocError<E = AllocErr> {
    CapacityOverflow,
    AllocErr {
        layout: Layout,
        error: E,
    },
}

impl SomeContainer<T, A: Alloc> {
    pub fn try_reserve(n: usize) -> Result<(), ContainerAllocError<A::Err>> {…}
}
Ericson2314 commented 5 years ago

@SimonSapin and I disagree on whether it is superior, but my fallibility-polymorphism in https://github.com/rust-lang/rust/pull/60703 is almost ready (see how https://github.com/rust-lang/rust/pull/58457 was just rebased, I just rebased).

Ericson2314 commented 5 years ago

Oh, thanks @SimonSapin for mentioning my issue in the second comment I did not see that the first time sorry. For the record, #60703 goes with the pub enum ContainerAllocError<E = AllocErr> approach, which is more useful because ContainerAllocError<!> correctly ensures only CapacityOverflow is possible since all all allocation failures will panic/abort.

SimonSapin commented 5 years ago

(At a risk of re-hashing arguments already made elsewhere, my view is that when panic/abort on allocation error is desired there is no need for try_reserve over reserve. So ContainerAllocError<!> is not a useful construct IMO. But the proposed unstable field to force non-exhaustive matching leaves the door open either way, so this doen’t need to be decided in this thread.)

SimonSapin commented 5 years ago

PR for the above: https://github.com/rust-lang/rust/pull/61780

Ericson2314 commented 5 years ago

The PR takes into account my concerns and so looks fine to me. Layout is a good idea.

@SimonSapin

(At a risk of re-hashing arguments already made elsewhere,

I think that debate should happen in this issue? After all with my plan for fallibility/allocator polymorphism doesn't really point to try_reserve as written in the RFC being a good idea, even if it is technically possible to support it.

My view is that when panic/abort on allocation error is desired there is no need for try_reserve over reserve.

My argument hinges on one key observation: the code using reserve/try_reserve may not want to decide whether to panic/abort or not on allocation failure, but rather leave that decision to to the user of that code.

In general, we could have arbitrary amounts of collection and and other library code that leaves that decision to only the final executable. There's no good way to write that today without exposing try_ and panicking versions of every method---maybe the stdlib collections have to do that anyways for backwards compat, but lets not put that burden on every other library. You can just expose the try versions and rely on users panicking themselves, but that's very annoying and tedious, and users are likely to make mistakes like panicking on too much if other non-allocation errors are possible. Put shortly, we should never tell library authors to write 2 copies of every function, or anyone to unwrap.

With the polymorphic alloc error, and a safe Result<T, !>::void_unwrap function, library authors can write just one function, and users can do something that, while a bit tedious, is more importantly completely safe. And it lives open the door to even less fiction with e.g. a Result<T, !> <=> T coercion.

SimonSapin commented 5 years ago

https://github.com/rust-lang/rust/pull/61780 has merged. I believe it resolves the remaining issues. I’ve updated the API signatures in the issue description.

Is there anything else to tweak here before stabilization? CC @Manishearth for the hashglobe crate, @rillian for mp4parse_fallible

@rfcbot fcp merge

rfcbot commented 5 years ago

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

dtolnay commented 5 years ago

What's the deal with the possible custom error value in TryReserveError::AllocError?

https://github.com/rust-lang/rust/blob/bdfd698f37184da42254a03ed466ab1f90e6fb6c/src/liballoc/collections/mod.rs#L59-L64

Does that involve later possibly adding a type parameter with a default value to TryReserveError? If so, that would break code that constructs CapacityOverflow because default type parameters only come into play in type position not expression position.

let _ = TryReserveError::CapacityOverflow;
error[E0282]: type annotations needed for `TryReserveError<E>`
  --> src/main.rs:22:13
   |
22 |     let _ = TryReserveError::CapacityOverflow;
   |         -   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot infer type for `E`
   |         |
   |         consider giving this pattern the explicit type `TryReserveError<E>`, where the type parameter `E` is specified
SimonSapin commented 5 years ago

Does that involve later possibly adding a type parameter with a default value to TryReserveError?

Correct.

If so, that would break code that constructs CapacityOverflow because default type parameters only come into play in type position not expression position.

Ugh, that’s a good point. Back to the drawing board I guess.

@rfcbot fcp cancel

rfcbot commented 5 years ago

@SimonSapin proposal cancelled.

TimDiekmann commented 4 years ago

Another concern: If we decide to forbid zero sized allocations (https://github.com/rust-lang/wg-allocators/issues/16) the allocator would probably take a NonZeroLayout or similar. In order to not block this issue even more, I propose to mark layout in AllocError as unstable, too, until the WG found a consens.

SimonSapin commented 4 years ago

At this point I think there are three options here:

  1. Keep blocking stabilization of this feature (which has been implemented for 19 months now) for an unbounded amount of time until the Allocators WG decides whether allocators should have an associated error type https://github.com/rust-lang/wg-allocators/issues/23. Note that WG’s repository has had very little activity in the last 5 months, after a brief surge when in was created.

  2. Or take a bet that standard library collections will eventually be made allocator-generic with an associated error type, and stabilize try_reserve with a generic payload field in TryReserveError::AllocError that defaults to ().

    The risk is that if the bet turns out wrong, we’ll have a useless field and type parameter that is always (), in a public type.

  3. Or take a bet that standard library collections will not be made allocator-generic (that feature might end up on parallel collection types, possibly outside of the standard library), or that they will but with a concrete zero-size error type, and stabilize try_reserve and TryReserveError without a payload field.

    The risk is that if the bet turns out wrong, we’ll have try_reserve methods that drop the error value it receives from the allocator.

My preference is for 2 or 3, but I don’t have a strong opinion between them. @rust-lang/libs what do you think?

TimDiekmann commented 4 years ago
Note that WG’s repository has had very little activity in the last 5 months, after a brief surge when in was created.

It is true that after the repository was opened, a few months was silent. Recently, however, there was an increase in activity again and there is a WIP PR that is currently blocking the WG from pushing anything upstream.

So I would have suggested to make the errortype non-exhaustive. The non-exhaustiveness can still be removed later.

alexcrichton commented 4 years ago

I'm personally pretty wary and against stabilizing something just because progress isn't happening on it. Stabilization takes real work and has a cost. Simply because an API exists doesn't mean we should stabilize it, even if it's been sitting for months. I think it's pretty bad design to get things into the standard library, then point out it's sat with no feedback for months and propose stabilization.

Fallible collections are hard in Rust. There's really no way we can escape that. At this point it still seems clear that the design work that needs to be done has not been done. It also seems like no one really knows how to get the design work done.

I agree this would be a great feature to have, and yeah this has sat around forever as unstable. That being said the length of time something has sat is not at all a justification for stabilization in my opinion.

SimonSapin commented 4 years ago

I think it's pretty bad design to get things into the standard library, then point out it's sat with no feedback for months and propose stabilization.

My understanding was that “Should try_reserve’s error type carry an error value that generic allocators may return” was the only remaining question that needed resolving. I wrote https://github.com/rust-lang/rust/issues/48043#issuecomment-543952770 on that basis.

But it sounds like you have concerns that are larger than that?

alexcrichton commented 4 years ago

It sounds like your first bullet in that comment has quite a lot of exasperation flowing through it about how slow this process is, and it also sounds like you're drawing attention to emphasize your case about how everyone else should be exasperated about it as well. I understand this process has been slow, and I think it's pretty reasonable that you're exasperated.

I don't think, however, that the exasperation is a justification for stabilization one way or another. We still do not have information about how allocators in the standard library will work out (per collection). That work just simply hasn't been done. You're proposing that we bet one way or another on this API. I think betting is fine, but I want to be very clear that exasperation at the slowness of the process is not a reason to stabilize something. It's can be a reason for stabilization because the functionality is useful enough, but it's generally unproductive in my opinion to vent such exasperation while simultaneously proposing stabilization.

I do not personally have an opinion about how this API should turn out, I don't really want to wade into the design issues of custom allocators on collections personally. I again just want to point out that the exasperation isn't particularly helpful and should never be a reason for stabilization.

gnzlbg commented 4 years ago

Now that some more work has happened on the design of allocators, I think I would prefer to make all methods of a particular Vec that allocate panic on OOM (instead of aborting), by using something like a PanicOnOOM<MyAlloc> allocator adapter, e.g., like:Vec<T, PanicOnOOMAlloc<MyAlloc>>.

That is, instead of returning a null pointer on OOM from the alloc methods, such an allocator would just panic, allowing the user of Vec to catch the panics. This would make not only a handful of allocating methods fallible, but all of them. And well, if you only want "fallible" allocation for some sections of the code, you can just go from a Vec<T, MyAlloc> to a Vec<T, PanicOnOOM<MyAlloc>> for that particular section and back, and then you are guaranteed that no matter how that code gets refactored (e.g. somebody adds a call to Vec::push somewhere instead of Vec::try_push, you'll get a panic).

This means that Vec::try_push and Vec::try_reserve become maybe a little bit less useful, since if unwinding is available, you can just use Vec::push and Vec::reserve (or extend, or anything really). However, for -C panic=abort, unwinding doesn't work, so this solution won't work there. To support this use case, Vec::try_{push,reserve} are very much necessary, since we need to manually propagate a Result value up the stack to allow recovery.

This approach also has some other implications, e.g., about how the API of the Alloc trait must specify what's valid for an allocator to do on OOM, but I guess that can be simply: if the alloc functions return a non-null pointer, then allocation succeeded, else, allocation failed (but that does not imply that the allocation function returns normally at all: it can abort, panic, execute some callback, etc.). This allows all sort of wrappers that let you "hook" your own "on_alloc_error" callbacks, on a per allocator handle basis.

SimonSapin commented 4 years ago

The RFC that proposed try_reserve also separately proposed an opt-in for panic instead of abort, so that unwinding could be an additional option to recover from OOM:

Under that proposal it’s handle_alloc_error that would change behavior. GlobalAlloc::alloc would still be expected (or encouraged) to return null on errors. It also does not requires allocator-generic containers.

malbarbo commented 4 years ago

@SimonSapin any update?

Ericson2314 commented 4 years ago

We now have a renewed effort at https://github.com/rust-lang/rust/pull/72314 so we can experiment with try_* methods across the board. I wish (albeit from the sidelines) that such a PR had landed before so we could already be experimenting with those things, but even though that's gone slower than I hope I still think it's prudent to wait and not stabilize this.

SimonSapin commented 4 years ago

@malbarbo I’m not sure what you’re expecting, this is the thread where further discussion about try_reserve should happen. As you can read above, the current status is that stabilization is blocked on deciding what extra data if any TryReserveError::AllocErr should carry, which ties into whether generic allocators should have an associated error types. And more generally there is not good consensus to stabilize before generic allocators overall are more settled.

@Ericson2314 https://github.com/rust-lang/rust/pull/72314 is "Stability annotations on generic parameters", which would help making containers generic on the allocator type, but how is it related to try_* methods? For example Vec can have try_reserve without an extra type parameter.

TimDiekmann commented 4 years ago

ties into whether generic allocators should have an associated error types.

I don't think this is going to be fully resolved until we get some allocators on collections upstream and this is currently blocked by #72314.

kornelski commented 3 years ago

77118 has been merged. Can stabilization of this proceed?

TimDiekmann commented 3 years ago

I don't really see, how #77118 is related to this? Where to you want to add an unstable generic annotation?

kornelski commented 3 years ago

Previous comment mentioned it was blocked on #72314, and #72314 has been replaced by #77118.