rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

containers should provide some way to not panic on failed allocations #29802

Closed froydnj closed 6 years ago

froydnj commented 8 years ago

I was working on a Rust image library, which has code like:

  vec![0xff; self.num_channels * self.width as usize * self.height as usize]

This code should really be checking for overflow on the multiplications. But doing so only eliminates one class of problems with this code: it's still reasonable for a maliciously crafted image to have large self.width and self.height values whose product doesn't overflow usize and yet the amount of memory can't be allocated. (I discovered this through an image test suite that has images with...large widths and heights that ought to return errors, but panic'd in Rust.)

Looking through the documentation, I didn't see any way of avoiding this panic-on-allocation failure, either at vector creation, or when trying to append elements to a vector.
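The overflow half of this is straightforward to guard in user code. A minimal sketch using `usize::checked_mul` (the function name and dimension values here are hypothetical, purely for illustration):

```rust
// Compute the byte length of an image buffer, rejecting dimension
// combinations whose product overflows usize.
fn image_buffer_len(channels: usize, width: usize, height: usize) -> Option<usize> {
    channels.checked_mul(width)?.checked_mul(height)
}

fn main() {
    // Ordinary dimensions multiply fine.
    assert_eq!(image_buffer_len(4, 100, 100), Some(40_000));
    // A maliciously large image overflows the multiplication and is rejected.
    assert_eq!(image_buffer_len(4, usize::MAX, 2), None);
}
```

As the report notes, this only handles overflow; a product that fits in `usize` can still be too large to actually allocate.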

apasel422 commented 8 years ago

I think this is pending the custom allocator work.

steveklabnik commented 8 years ago

Yes, in general, the standard library assumes that you can't really handle OOM. Using libcore and custom allocators, you could implement whatever behavior you'd like, but you'd still need to re-implement this kind of thing on top of it.

The standard library is going to operate this way for the foreseeable future, as it would require changing the signatures of stable things (like vec!) in a backwards-incompatible way. So I'm going to give this one a close.

froydnj commented 8 years ago

So I should have been clearer about this: I don't think that all operations would need to return bool or Option or whatever. Firefox's data structures (which I'm most familiar with for something like this) typically have an "infallible" (i.e. panic on allocation failure) and "fallible" (return error value on allocation failure) variant of all the appropriate operations. Simply adding fallible operations, while increasing the API surface quite a bit, should also mean that this work could be done in a backwards-compatible way, as the current (infallible) API wouldn't have to change. So you'd have vec! and vec_fallible!, for instance, Vec::with_capacity and Vec::with_capacity_fallible, and so forth.

To my mind, it'd be better for the standard library to have this sort of thing, rather than folks having to write their own crate for VecFallible<T> or whatever and needing to propagate that through dependencies.

Does that seem reasonable, @steveklabnik ? I'm afraid I'm ignorant of the custom allocator bits cited herein, so I can't really tell if that's a way to modify the allocation behavior of Vec<T> or something else.
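For reference, this is roughly the shape std eventually adopted via `Vec::try_reserve` (stabilized much later, in Rust 1.57). A sketch of a fallible push built on it; `try_push` is a hypothetical free-function name, not an existing API:

```rust
use std::collections::TryReserveError;

// Fallible push: reserve space first, handing the rejected element
// back to the caller on allocation failure.
fn try_push<T>(v: &mut Vec<T>, item: T) -> Result<(), (T, TryReserveError)> {
    if let Err(e) = v.try_reserve(1) {
        return Err((item, e));
    }
    v.push(item); // cannot reallocate: capacity was just reserved
    Ok(())
}

fn main() {
    let mut v = Vec::new();
    assert!(try_push(&mut v, 1).is_ok());
    assert!(try_push(&mut v, 2).is_ok());
    assert_eq!(v, [1, 2]);
}
```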

apasel422 commented 8 years ago

Independent of the custom allocators, i.e. Vec<T, A> where A: Allocate, I guess Vec would still have to expose fallible versions of these operations in order to avoid A's default out-of-memory behavior. This is possible to implement right now (RawVec manually calls oom, for example), but probably needs an RFC.

Gankra commented 8 years ago

Yeah, this is a tricky problem. I'd really rather not bloat up the collections APIs with _fallible variants if we can avoid it. However, custom allocators will do nothing for this, as apasel notes.

To be exhaustive, we'd need to provide fallible versions of:

Vec:

Maps:

Unless you believe that only a subset is "truly" necessary?

Gankra commented 8 years ago

Waking up, and this is getting more in cache. Part of this problem is the fuzzy nature of OOM on modern platforms (at least, Linux). It's my understanding that you can happily allocate some egregious amount of memory from a bad image, only to get smacked down by the OOM killer when you really start trying to use the memory, and there's no way for us to deal with that.

Is there any problem with just having an incredibly reasonable policy like "no images bigger than 100MB"?

apasel422 commented 8 years ago

I assume this wouldn't be backwards-compatible, but it'd be nice to avoid the proliferation of methods with something like:

#![feature(default_type_parameter_fallback)]

pub trait AllocResult<T = (), E = ()> {
    fn ok(value: T) -> Self;
    fn err(err: E) -> Self;
}

impl<T, E> AllocResult<T, E> for T {
    fn ok(value: T) -> T { value }
    fn err(_err: E) -> T { oom(); }
}

impl<T, E> AllocResult<T, E> for Result<T, E> {
    fn ok(value: T) -> Self { Ok(value) }
    fn err(err: E) -> Self { Err(err) }
}

impl<T> Vec<T> {
    pub fn with_capacity<R: AllocResult<Self> = Self>(cap: usize) -> R {
        let ptr = alloc::heap::allocate(...);
        if ptr.is_null() { R::err(()) } else { R::ok(Vec { ... }) }
    }

    pub fn push<R: AllocResult = ()>(&mut self, item: T) -> R {
        if self.needs_reallocate() {
            let ptr = alloc::heap::reallocate(...);
            if ptr.is_null() { return R::err(()); }
            ...
        }
        ...
        R::ok(())
    }
}

#[test]
fn test() {
    let vec: Vec<i32> = Vec::with_capacity(5); // aborts on OOM

    let mut vec: Vec<i32> = match Vec::with_capacity(5) { // returns `Err` on OOM
        Ok(vec) => vec,
        Err(_) => panic!("OOM"),
    };

    vec.push(1); // aborts on OOM

    match vec.push(1) { // returns `Err` on OOM
        Ok(_) => {}
        Err(_) => panic!("OOM"),
    }
}

But this is clearly less discoverable and comprehensible for users.

froydnj commented 8 years ago

Waking up, and this is getting more in cache. Part of this problem is the fuzzy nature of OOM on modern platforms (at least, Linux). It's my understanding that you can happily allocate some egregious amount of memory from a bad image, only to get smacked down by the OOM killer when you really start trying to use the memory, and there's no way for us to deal with that.

That's a reasonable point, but on Windows the memory allocator really can fail, and Rust should be able to deal with that relatively gracefully. I don't know about OS X; I'm guessing it's more like Windows than Linux in this regard?

Is there any problem with just having an incredibly reasonable policy like "no images bigger than 100MB"?

A 100MB image is just 5k x 5k x 32bpp (raw pixel data), which is pretty big (roughly 3x 4k monitors), but doesn't sound like anything too out of the ordinary.

Gankra commented 8 years ago

I'm having trouble finding details TBH. It would appear that FreeBSD supports overcommit, and by extension OSX/iOS seem to as well, at least to a limited extent. It certainly seems that they will do implicit deduplication and then copy-on-write transparently. Since the COW can presumably fail to find any physical memory (otherwise why bother? cache?), that opens up code to the same implicit failure state. iOS at least will send your application a warning that the OOM killer is eyeing it, giving it a chance to make amends (but this may be futile if other memory-starved applications are eating all the memory you're giving up).

sfackler commented 8 years ago

Good old SIGDANGER

oli-obk commented 8 years ago

IIRC there was some discussion about allowing custom allocators someday. In combination with specialization, could an allocator + impl<T> Vec<T, NonPanicAlloc> {} be used to create an entirely new set of functions (returning Result) while blanking out the original functions?

jdm commented 8 years ago

cc me

Gankra commented 8 years ago

One could indeed expose different ops/signatures for custom allocators. In fact I believe it could all be hacked on top of associated types, where type AllocResult = () for GlobalAlloc or whatever. However it seems like this wouldn't be desirable for the OP? As in, they specifically don't want to have to thread fallibility throughout their types. Only in specific locations do they care. Is that correct?

oli-obk commented 8 years ago

Wouldn't it be more general to leave everything as it is and offer a way to locally "catch" oom? Similar to catch_panic?

apasel422 commented 8 years ago

CC @pnkfelix, who has been working on allocator stuff.

froydnj commented 8 years ago

As in, they specifically don't want to have to thread fallibility throughout their types. Only in specific locations do they care. Is that correct?

That's correct.

There's also a readability argument to be made: it's easier to tell whether something can fail if you read:

  if !v.append_fallible(...) { ... }

or whether something is written incorrectly:

  // Forgot to check the return value!
  v.append_fallible(...);

vs.

  // Is v an infallible vector or did I forget to check the return value?
  v.append(...);

You could address this with #[must_use] on VecFallible<T>'s methods. Firefox went the route of separate types for its primary Vec-alike and found it difficult to share code between types and still provide the annotation. It's possible that doing the sharing and using the annotation would be easier in Rust?

Having two different types for vectors can also make for some awkward code: why do I care about the fallibility of allocating operations if the code that I'm writing doesn't actually care about doing allocation on the vector? You could cleverly address this problem with traits or templates, I suppose.

In any event, Firefox is currently in the process of ripping that type distinction out and using separate methods instead. We think it's much more valuable to see the fallibility of the operation at the call site, rather than having it reside on the type of something, since the type may not be visible/obvious at the call site.

oli-obk commented 8 years ago

We think it's much more valuable to see the fallibility of the operation at the call site, rather than having it reside on the type of something, since the type may not be visible/obvious at the call site.

Yes, it should be obvious at the call site and not create type clutter. But duplicating all the functions is just the same problem: instead of having two types, you end up with loads of functions that almost do the same thing.

With panics it's basically the same: the standard library mostly has functions that return a Result or an Option that the user can unwrap or handle, instead of panicking functions. On the other hand, there's catch_panic, to turn a panicking function into one returning a Result.

A solution similar to catch_panic would look like

if let Ok(()) = catch_oom(|| v.push(42)) {
    // do something
}

If you forget to use the return value, the normal must_use lint would catch that.

If such a wrapper is too much, you can always use a macro to make it less verbose: oom!(v.push(42))

bluss commented 8 years ago

Please, no catching of OOM. I think panic on OOM is untenable for the same reasons: more complicated exception-safety issues for lots of unsafe-marked code. See #26951. The Rusty way of using return types for error handling is a great model.

You can even add the fallible allocation in user code (example) using unstable features, although I guess it's debatable if rust provides a guarantee that it will work.

@froydnj #[must_use] on Result would cover this actually (it's a per type attribute, not per method).

lilith commented 8 years ago

@bluss, could you elaborate how a failed 1GB allocation (say, for a large photograph) would prevent unwinding later? My understanding is that there are many myths around overcommit, and that the system can actually tell prior to memory access whether an allocation is likely possible.

see also https://internals.rust-lang.org/t/could-we-support-unwinding-from-oom-at-least-for-collections/3673/21

bluss commented 8 years ago

There will be cases due to overcommit where the application receives a "successful" allocation, and it crashes when trying to use it later. However, that doesn't seem to be the problem, because we can't do much about that.

@nathanaeljones I don't understand the scenario, I don't see why a failed allocation would prevent unwinding later. I may be missing something.

The only Rust code that should have to worry about unsoundness in the presence of unwinding is unsafe code, and there's a concern that if we introduce new panics in functions that are used in unsafe code (for example Vec::with_capacity), then we create new soundness issues. As it is presently, the code can rely on the allocation either succeeding or the program aborting, so there is no need to handle inconsistent states for that critical section of the code (whatever it may be).

It does seem unlikely, and I can't find any problematic code with a quick search, but introducing new panics in stable API is a hazard if there is code that relied on it to never panic.

nnethercote commented 8 years ago

I was just wondering about this issue and @froydnj pointed me at this thread.

It is a common occurrence in Firefox that we get crash reports that are triggered by OOMs caused by very large allocation requests (e.g. 100 MB or more). Our standard response to these is to make the allocation fallible. Firefox's data structures support this fairly well so it's usually a straightforward fix.

Although aborting on OOM is a reasonable default, I'm worried that Rust doesn't have a graceful way of recovering from a 100 MB+ allocation request failing.
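For the record, a fallible large allocation can be expressed today with `Vec::try_reserve_exact` (stabilized in Rust 1.57, long after this discussion); a sketch, with `alloc_pixels` as a hypothetical helper name:

```rust
use std::collections::TryReserveError;

// Fallibly allocate a zero-padded pixel buffer: a huge request returns
// Err instead of aborting the process.
fn alloc_pixels(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?;
    buf.resize(len, 0xff);
    Ok(buf)
}

fn main() {
    // A small request succeeds; a 100 MB+ request on an exhausted address
    // space would come back as Err for the caller to handle.
    let buf = alloc_pixels(16).expect("small allocation should succeed");
    assert_eq!(buf.len(), 16);
}
```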

nnethercote commented 8 years ago

As for Linux overcommit: at least for Firefox, to a first approximation, Windows is the only platform that matters. So I wouldn't focus too much on overcommit.

nnethercote commented 7 years ago

FYI: in Firefox we are planning to implement FallibleVec and FallibleHashMap classes. This will likely involve copy-pasting the Vec and HashMap code and then adding the missing fallible operations. Which is awful, but necessary. (Strings might be necessary, too.)

We regularly change allocation sites in Firefox from infallible (which is the default) to fallible in response to crash reports that show OOM failures due to large allocation sizes. These are typically on Win32 where OOM due to address space exhaustion is common when doing large allocations. See https://bugzilla.mozilla.org/show_bug.cgi?id=1329171 for an example. We need to be able to do the same thing in Rust code.

Gankra commented 7 years ago

Hey all, this has become high priority for the Firefox devs (which includes me) as they integrate more major Rust components into Gecko, so I'm going to start drafting up a proposal for providing some version of fallible collection methods to avoid fragmenting the ecosystem over this matter.

Some quick thoughts:

This is impossible, overcommit!

Don't care. This can make software more reliable when it works. Sometimes it won't. Oh well.

Types vs Methods

I firmly believe that a type-level distinction would be bad to have by default. Fallibility is fairly fluid in practice, and is nice to be able to integrate incrementally. Also if we provide fallible methods it's not a big deal for someone to newtype std collections to only expose the fallible APIs.
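A newtype along those lines can be sketched on top of std's later `try_reserve` API (Rust 1.57); `FallibleVec` here is a hypothetical illustration of the wrapping, not a proposed type:

```rust
use std::collections::TryReserveError;

// A wrapper that exposes only fallible growth operations, delegating
// storage to a plain Vec.
pub struct FallibleVec<T>(Vec<T>);

impl<T> FallibleVec<T> {
    pub fn new() -> Self {
        FallibleVec(Vec::new())
    }

    // Reserve first, so the subsequent push cannot trigger an
    // infallible reallocation.
    pub fn try_push(&mut self, item: T) -> Result<(), TryReserveError> {
        self.0.try_reserve(1)?;
        self.0.push(item);
        Ok(())
    }

    pub fn as_slice(&self) -> &[T] {
        &self.0
    }
}

fn main() {
    let mut v = FallibleVec::new();
    assert!(v.try_push(10).is_ok());
    assert!(v.try_push(20).is_ok());
    assert_eq!(v.as_slice(), &[10, 20]);
}
```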

API Surface

There are three tacks we can take here:

I'm going to push for minimal for now, with an intent to propose maximal when we have experience with these APIs under our belts.

The API

try_* methods will return a Result, with an Error format based on the new Allocator APIs. I need to review that RFC and the format, which largely went down while I was absent from the community last year.

I expect methods like try_push will need to return the pushed element on Err; I don't want to have to propose a callback-based or Entry-style API for this.

Capacity overflow panics -- might make them a state in the Err? TBD!

nnethercote commented 7 years ago

@gankro: thank you for the write-up. A few notes.

joshlf commented 7 years ago

@Gankro

try_* methods will return a Result, with an Error format based on the new Allocator APIs.

You might want to consider just an Option. The errors used in the Alloc trait (AllocErr) can express one of two error conditions: out of memory (AllocErr::Exhausted) and an unsupported allocation (AllocErr::Unsupported). I strongly believe that the latter should be considered evidence of a bug, not a runtime error (as in, you provided an allocator that cannot support the allocations the collection needs to make). Thus, the only error we care about is OOM, and so an Option suffices - Some on success, None on OOM.

That said, there may be forwards-compatibility issues that using a Result handles better, and I'm not experienced enough here to comment, so if that's the reasoning, then I have no concerns. My only comment is to say that we definitely shouldn't have collections returning errors for unsupported allocations - that should just be considered a bug.

oli-obk commented 7 years ago

I have a fourth API option:

Now anyone can choose their preferred method of treating alloc errors, without cluttering the API.

I firmly believe that a type-level distinction would be bad to have by default. Fallibility is fairly fluid in practice, and is nice to be able to integrate incrementally. Also if we provide fallible methods it's not a big deal for someone to newtype std collections to only expose the fallible APIs.

And then we provide a way to convert between fallible and infallible containers while keeping the actual allocator? I'm not really sure how fluid fallibility is.

Gankra commented 7 years ago

@oli-obk this adds a type-level distinction, which I said I'm not interested in providing.

Also, this involves massive language changes, which I'm not interested in pursuing.

daurnimator commented 7 years ago

@oli-obk that's the best proposal I've seen yet; it's how I wish things had been designed in the first place.

oli-obk commented 7 years ago

adds a type-level distinction, which I said I'm not interested in providing.

On the contrary: cluttering the API with functions that do almost the same thing is not an option, imo. I have a hard enough time explaining tradeoffs in Rust to my students. Telling them about

use nonpanicking functions returning Result everywhere, except with containers

will not improve the situation.

I think providing an as_fallible method which converts from &Vec<T, FallibleAlloc<Alloc>> to &Vec<T, Alloc> provides the runtime distinction you want.

Gankra commented 7 years ago

I'm strongly considering the as_fallible approach right now, although I still don't think we want to make Vec generic, because it requires significant language changes that likely won't ever happen.

nagisa commented 7 years ago

We could consider adapting the placement-in API to incorporate the notion of fallibility.

whitequark commented 7 years ago

We could consider adapting the placement-in API to incorporate the notion of fallibility.

How would that work?

nagisa commented 7 years ago

Make Placer::make_place() return a Result. vec.place_back() <- EXPR would then desugar to early return on failure, not unlike ?. I’m sure there could be many other options, I haven’t thought about implications of any of the approaches much. catch then can be used to handle the failure locally.

See also brain dump on IRC.

whitequark commented 7 years ago

Vec::reserve() can fail.

aturon commented 7 years ago

@Gankro I'm really glad to have you taking this on!

A few thoughts:

whitequark commented 7 years ago

@aturon et al

We've discussed this with @Gankro on IRC, but for the purpose of visibility I'll go on record here: I believe @Gankro's proposal will handle the majority (if not virtually all) embedded use cases, which progressively fall back to a more complex error handling strategy based on domain requirements:

This also handles the most obvious hosted use case, libraries that cannot afford to bring down their host process on OOM, e.g. those handling user-controlled content such as zip or XML.

Might lead to Intense Macroification of libcollections ().

Why not just replace the implementation of x with try_x().ok_or(alloc::oom)?

jethrogb commented 7 years ago

I think we can do it at the type level while still not using Result everywhere in the infallible case and being backwards-compatible. You just need ATC and improved ergonomics of default type parameters:

trait Allocator {
    type Result<T>;
}

impl Allocator for InfallibleAllocator {
    type Result<T> = T;
}

impl Allocator for FallibleAllocator {
    type Result<T> = Result<T, AllocatorError>;
}

impl<T, A: Allocator = InfallibleAllocator> Vec<T, A> {
    fn push(v: T) -> A::Result<()>;
}

Might lead to Intense Macroification of libcollections ().

Why not just replace the implementation of x with try_x().ok_or(alloc::oom)?

I think that's the point, you'd want a macro to generate all those infallible wrappers

Gankra commented 7 years ago

We don't OOM on capacity overflow today, so that would be a change to the public API. If we feed the reason for failure through then it might be fine (we can match and choose the approach). There's also the matter of whether the fallible code optimizes to the same thing as the infallible code (more branches + a payload to manipulate).

Ericson2314 commented 7 years ago

I've always been a fan of the type-based approach with associated return error types. I think it's easier to implement out the door, and I think if a client wants to mix and match fallibility, they should explicitly unwrap for clarity. A sneaky missing ? is far too easy to miss.

The existing methods would require an infallible allocator, but the default would be the global one with an infallible wrapper so no compat is broken.

Gankra commented 7 years ago

I've always been a fan of the type-based approach with associated return types. I think it's easier to implement out the door

This is objectively incorrect, as it's literally impossible to implement today, even on nightly.

Edit: to be clear, such a design is blocked on generic associated types, which might be implemented and on stable in late 2018.

Ericson2314 commented 7 years ago

@Gankro you mean something like https://github.com/rust-lang/rust/issues/29802#issuecomment-317114078 ?

Yeah I don't want to do that. I want to add an associated error type to the allocator trait, make it ! in the infallibility wrapper, and then "unwrap" Result<T, !> in the infallible methods.

Gankra commented 7 years ago

Oh ok, well that one just won't ever be implemented, so it's even less possible to implement today?

Ericson2314 commented 7 years ago

Huh? Let's not confuse not wanting to do something with the impossibility of doing something. This is the main benefit of the associated error type, so it's good to consider them together.

Gankra commented 7 years ago

Sorry do you mean having (on Vec<T, Alloc>):

fn push(&mut self) -> Result<(), Alloc::Err> {
  if cap == len { self.alloc.reserve(1)? }
  ...
}

or

fn push(&mut self) {
  if cap == len { self.alloc.reserve(1).unwrap() }
  ...
}

fn try_push(&mut self) -> Result<(), Alloc::Err> {
  if cap == len { self.alloc.reserve(1)? }
  ...
}

Ericson2314 commented 7 years ago

Ah, sorry if I was unclear before. Here's a more worked out example:

//!
//! in alloc or collections, based on how that shakes out
//!

struct PanicOom<T>(T);

impl<A: Alloc> Alloc for PanicOom<A> {
   type Error = !;

   // much repetition of delegating wrappers like:
   fn something(&mut self) {
     self.0.something().unwrap_or_else(|_| self.oom())
   }
}

// In the future, define and use a:
// trait InfallibleAlloc = Alloc<Error = !>;

//!
//! in collections
//!

impl<T, A: Alloc> Vec<T, A> {
  fn try_push(&mut self) -> Result<(), A::Error> {
    if cap == len { self.alloc.reserve(1)? }
    ...
  }
}

// in the future, use `InfallibleAlloc` for a more readable bound
impl<T, A: Alloc<Error=!>> Vec<T, A> {
  fn push(&mut self) {
    self.try_push().void_unwrap()
  }
}

//!
//! In std
//!

// stop gap until we have aliases with params with default args
type HashMap<K, V> = collections::HashMap<K, V, PanicOom<GlobalAlloc>>;

The idea being one either uses only vec.push() or vec.try_push(). If they mainly use vec.try_push(), they can add some manual expect, explaining why some critical allocation cannot fail. If they mainly use push, it would be silly to out-of-the-blue handle an allocation failure when the code at large is not robust against it.

Gankra commented 7 years ago

This is an interesting idea, but it seems like a lot of work for a lint that none of the users I've interviewed actually seem interested in.

@Ericson2314 do you intend to use this API? What for? (feel free to contact me on irc/email/twitter if you prefer)

lilith commented 7 years ago

If we're talking about a lint for unhandled allocation failures (non-fallible allocations), I and many others are very interested. Codec libraries (and most other libs) are usually required to gracefully handle any alloc failures.


Gankra commented 7 years ago

Sorry to be clear: there is a strong desire for such a lint, but this is a lot of work for a weak version of it.

I could have sworn I linked this thread here, but I don't see it, so here's where I discuss various options and what different users were asking for: https://internals.rust-lang.org/t/notes-interviews-on-fallible-collection-allocations/5619/8

In it I describe hooking into the availability lint system that is intended to solve similar problems like "don't use floats". This would be more robust in that it would also verify you don't call methods which e.g. create their own Vec and infallibly allocate.

Ericson2314 commented 7 years ago

Well, I'm glad you find it interesting :). This is something I've murmured about for a while, but I guess I made the mistake of never actually writing it down.

Truth be told, my day job is Haskell, so I personally will not have the time to be a serious consumer of the allocation APIs any time soon. I hope to find time to use it in my tiny toy OS, but that's it for now.

My general principle is while I'm sure serious industrial users can get away without any sort of lint, it's a nice thing to have that helps everyone not shoot themselves in the foot and, more importantly, help (no-std, for now) library authors not shoot each other in the foot, so the ecosystem can better trust itself.

@aturon mentions the portability lint, but I think that's a rather heavyweight solution for dealing with a pure-Rust interface (as opposed to however allocation is implemented). In particular, no one has proposed a modular way just yet for using it with user-defined cfg tokens. Now, cargo features would be the natural solution, and I suppose we could add an "oom-ignore" default feature for alloc/collections, but I'd rather put the feature cfg just once on OomPanic than on every single faux-infallible method. That seems more maintainable. Ah oops, we'd need per-method cfgs either way, but it would be easy to forget one without this, whereas the cfg on OomPanic will force us to not forget to also cfg the method (in conjunction with the portability lint).

[edited a bunch, a hard habit to kick.]