rust-lang / wg-allocators

Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!
http://bit.ly/hello-wg-allocators
209 stars 9 forks source link

Remove `_zeroed` variants and pass a flag instead #44

Closed TimDiekmann closed 4 years ago

TimDiekmann commented 4 years ago

Currently, for almost all methods (except dealloc and shrink(_in_place)) in AllocRef, there is a _zeroed variant, which blows up the trait.

I propose to remove those variants and add a zeroed: bool to the signatures. E.g.:

fn alloc(&self, layout: Layout, zeroed: bool) -> Result<...>;
fn realloc(&self, ptr: NonNull<u8>, layout: Layout, new_size: usize, zeroed: bool) -> Result<...>;
Amanieu commented 4 years ago

I would prefer an enum over a bool: http://blakesmith.me/2019/05/07/rust-patterns-enums-instead-of-booleans.html

Otherwise this seems like a good idea.

Do we want to do something similar for grow/grow_in_place and shrink/shrink_in_place?

TimDiekmann commented 4 years ago

I'm aware of that post, but I wasn't able to find a good enum name so far.

Maybe something like

enum ??? {
    Zeroed,
    Unspecified
}

Yes for grow/grow_in_place, for shrink/shrink_in_place it doesn't make sense, as no new memory is returned.

Amanieu commented 4 years ago

I mean having an InPlace flag instead of having two separate methods.

TimDiekmann commented 4 years ago

Oh, yeah, that's also a good idea. So basically like grow and shrink in https://github.com/rust-lang/rust/pull/69824? :smile: https://github.com/rust-lang/rust/blob/32c0b1761d0cbef051408c6c28f78302935d8188/src/liballoc/raw_vec.rs#L465

Amanieu commented 4 years ago

Tentative names:

enum AllocContents {
    Zero,
    Uninitialized, // With a big warning that reading this is UB if you haven't initialized it
}

enum AllocConstraints {
    None,
    InPlace,
}

Note that I am less sure about the InPlace thing, so we might want to defer on that for now.

Lokathor commented 4 years ago

I'd call it NewMemory, because a realloc doesn't also zeroed the previous data.

TimDiekmann commented 4 years ago

As this will be in core::alloc::, I'd omit the Alloc prefix. I just came up with enum Init { Zero, Uninitialized/Unspecified }. enum Constraint sounds like a bitmask with the abillity to add more variants later.

Lokathor commented 4 years ago

Names should generally make sense on their own, without knowing the module path to the item.

TimDiekmann commented 4 years ago

Can we agree on this?

/// Passed to an allocating method to determine how newly requested memory is 
/// initialized.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
enum AllocInit {
    /// Depending on the used allocator, newly allocated memory is not 
    /// necessarily initialized. Reading uninitialized memory is 
    /// [undefined behavior]!
    ///
    /// [undefined behavior]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html
    Unspecified,
    /// Newly allocated memory returned by an allocator is guaranteed to be 
    /// zeroed.
    Zero,
}

/// Passed to an allocator to determine how newly requested memory is 
/// allocated when growing or shrinking an existing memory block.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
enum AllocPlacement {
    /// The location is not specified. The allocator is allowed to make a 
    /// byte-by-byte copy of the memory.
    Unspecified,
    /// When allocating or releasing memory, the contents of the memory must not 
    /// be copied.
    InPlace,
}
Wodann commented 4 years ago

Sounds good to me. Nice work on the RawVec PR too 👏🏻

TimDiekmann commented 4 years ago

Thank you!

Lokathor commented 4 years ago

I feel, dubious about those definitions.

Consider this revision:

/// A desired initial state for allocated memory.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
enum AllocInit {
    /// The new memory is not initialized to any particular value.
    /// 
    /// Remember that reading uninitialized memory is Undefined Behavior.
    Uninitialized,
    /// The new memory is guaranteed to be zeroed.
    Zeroed,
}

/// A reallocation constraint.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
enum ReAllocPlacement {
    /// The new address of the memory can be any heap location.
    ///
    /// If the allocation _does_ move, it's the responsibility of the allocator
    /// to also move the data from the previous location to the new location.
    Unspecified,
    /// The address of the new memory must not change.
    ///
    /// If the allocation would have to be moved to a new location to fit, the
    /// reallocation request should instead fail.
    InPlace,
}
TimDiekmann commented 4 years ago

Unspecified is a poor name because that's not used anywhere else in Rust. Just say Uninitialized.

Fair point. Additionally, AllocInit::Uninitialized feels fluently, and it's possible to use core::alloc::{AllocInit::*, ReAllocPlacement::*};, which might comes in handy when using the API

Zero should be changed to Zeroed, so that the tag names are both adjectives.

:+1:

AllocPlacement only makes sense with realloc, so we can call it ReAllocPlacement (bikeshed the capitalization there).

I'd use ReallocPlacement as other allocation APIs also uses the term realloc, not re_alloc.

The comments on the placement variants are kinda fuzzy and could probably be improved.

Thanks, sounds better!

The new address of the memory can be any heap location.

Is that true? Is this necessarily a heap location? I can imagine, that this could also be a global buffer or something, which has been allocated on the stack.

Lokathor commented 4 years ago

Good point on the end there....

Well, so as far as I know, LLVM has the concept of the call stack memory, and then it also has the concept that memory can come "from places that aren't the call stack memory", and that's all it thinks about. And memory "not in the call stack memory space" can come from all sorts of places that you might want an allocator for, and if you've got an OS under you then ultimately you're asking the OS to tell the memory management unit to do the right thing. And if you don't have an OS under you then you're doing whatever the device's data sheets say you can do because you are the OS and you have that phenomenal cosmic power. And we need to allow for all these possibilities.

So... What we probably want to do is to say "The new address of the memory can be any valid location." and then the words "valid location" are a link to a section on the top level of some module that describes Rust's rules (or lack of rules) for what is even a legal allocator.

Such a section would be similar to how the core::ptr module has a section that describes the pointer validity rules, and how the core::sync::atomic and std::thread modules describe that Rust has some sort of system for threads and atomics and you should go read the C++ specs to learn about it because right now we just do "what llvm does" and that's "what C++ does".

Lokathor commented 4 years ago

The main rules would be things that "make sense anyway", like you can't start allocating from a location just past the end of the stack and then call a function because when you push stuff onto the stack it starts writing into your allocation pool and the program gets very sad.

TimDiekmann commented 4 years ago

Adding ReallocPlacement isn't probably a that good decision I initially thought. Lying out the current methods:

fn grow(&mut self, ptr: NonNull<T>, layout: Layout, new_size: usize, init: AllocInit) -> Result<(NonNull<T>, usize), AllocErr>;
fn grow_in_place(&mut self, ptr: NonNull<T>, layout: Layout, new_size: usize, init: AllocInit) -> Result<usize, AllocErr>;
fn shrink(&mut self, ptr: NonNull<T>, layout: Layout, new_size: usize) -> Result<(NonNull<T>, usize), AllocErr>;
fn shrink_in_place(&mut self, ptr: NonNull<T>, layout: Layout, new_size: usize) -> Result<usize, AllocErr>;

grow and grow_in_place / shrink and shrink_in_place has different return values. _in_place means, that the pointer does not change, so returning a pointer is ... pointless, literally. Returning an excess in shrink_in_place results in a trivial default implementation of

fn shrink_in_place(&mut self, ptr: NonNull<T>, layout: Layout, new_size: usize) -> Result<usize, AllocErr> {
    Ok((ptr, layout.size())
}

which cannot fail.

So I'd come to this conclusion:

Amanieu commented 4 years ago

We just agreed in #42 that we can't remove the Result from shrink since success implies that a different (smaller) Layout can now be used for dealloc.

TimDiekmann commented 4 years ago

I was just about to say that :smile: The only solution to this would be to remove the excess API from shrink(_in_place). This would also fix a soundness hole in the current RawVec::into_box implementation.