rust-lang / wg-allocators

Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!
http://bit.ly/hello-wg-allocators
203 stars 9 forks source link

Proposal: Storage-backed Allocator #93

Open DecoyFish opened 2 years ago

DecoyFish commented 2 years ago

Here's another take on enabling a storage-backed Allocator trait with minimal changes to the current implementation. This is inspired by @matthieu-m's awesome storage-poc (see also #79).

Locally, I've prototyped a rough Box and Vec implementation along with several Allocator impls to gain a base level of assurance. However, I'm somewhat new to both Rust and this feature so I might have overlooked key issues. Please also let me know if I have or if this (or a similar) proposal has already been considered and rejected/deferred.

Objectives

Issues

Currently it faces the same limitation as @matthieu-m's storage-poc: the Box implementation is not coercible because a type can't effectively implement CoerceUnsized while containing <T as Pointee>::Metadata. This issue is being discussed in rust-lang's #81513.

While not blocking, it would reduce the required constraints in data structure implementations if Pointee::Metadata required Metadata<Self>.

Proposal

// There are a few key changes:
//   - Associated type Buffer.
//   - Because Self::Buffer has no direct concept of its own layout, each (re-)allocation also
//     returns a Layout. Callers can leverage this to learn how much extra space (if any) was
//     allocated.
//   - Because Self::Buffer is uniquely-owned, it consume it on (de-)reallocation.  Relatedly, it
//     is also returned in the Err on reallocation failure.
//
// SAFETY concerns for this trait should match those of the current trait.
pub trait Allocator {
    type Buffer: Buffer;

    fn allocate(&self, layout: Layout) -> Result<(Self::Buffer, Layout), AllocError>;

    unsafe fn deallocate(&self, buffer: Self::Buffer, layout: Layout);

    unsafe fn grow(
        &self,
        buffer: Self::Buffer,
        old_layout: Layout,
        new_layout: Layout,
    ) -> Result<(Self::Buffer, Layout), Self::Buffer>;

    unsafe fn shrink(
        &self,
        buffer: Self::Buffer,
        old_layout: Layout,
        new_layout: Layout,
    ) -> Result<(Self::Buffer, Layout), Self::Buffer>;

    // OMIT: other methods like *_zeroed, as_ref...
}

// From initial prototyping (and some godbolt usage), this appears to be inlined quite successfully
// - in keeping with the goal of zero cost abstractions.  Let me know if anyone foresees issues
// with this, though.
pub trait Buffer {
    // SAFETY: Proven<T, M>::layout() must 'fit' the buffer (see Allocator trait for 'memory fit').
    unsafe fn as_mut<T: ?Sized, M: Metadata<T>>(&mut self, metadata: Proven<T, M>) -> &mut T;

    // SAFETY: Proven<T, M>::layout() must 'fit' the buffer (see Allocator trait for 'memory fit').
    unsafe fn as_ref<T: ?Sized, M: Metadata<T>>(&self, metadata: Proven<T, M>) -> &T;

    // Used in Allocator implementations - particularly for copying from one buffer to another
    // during reallocation in an intermediate/fallback Allocator.
    //
    // SAFETY: The pointer is only valid during F's invocation.
    unsafe fn with_ptr<F: FnOnce(NonNull<u8>)>(&mut self, layout: Layout, apply: F);
}

// This trait is strictly additive and can be added later (or not at all).  It is used to build
// things like ThinArc in which the Metadata is stored at the start of the allocation.  Using this
// trait we can (1) constrain allocators to have ThinBuffers and (2) read the sized "head"/metadata
// out of the buffer without knowing the layout of the buffer.
//
// These methods are not included in Buffer because not all Buffer implementations will support
// them.  Consider a union-based Buffer (like the one used in SmallVec's union feature).  In order
// to know which union variant is valid, we must first know the capacity (metadata) of the Buffer.
// That said, most Buffer implementations will likely be capable of implementing this trait.
pub trait ThinBuffer: Buffer {
    unsafe fn as_mut<T>(&mut self) -> &mut T;

    unsafe fn as_ref<T>(&self) -> &T;
}

mod metadata {
// Provides layout and metadata information for a given type.  Note the following:
//  - Might not contain a valid layout (this can happen if size overflows on slice metadata).
//  - Might be a type other than T::Metadata (e.g., a Vec with max 256 elements might store
//    allocation's size/capacity in a u8).
pub trait Metadata<T: ?Sized + Pointee>: Copy {
    fn try_layout(self) -> Result<Layout, LayoutError>;

    fn metadata(self) -> T::Metadata;
}

// Wraps Metadata<T> which have a known valid Layout.  Consequently, subsequent layout checks can
// be unchecked.  The key is that this type can only be constructed via conversion from valid-
// Layout metadata sources.
pub struct Proven<T: ?Sized, M: Copy>(M, PhantomData<fn(T) -> T>);

impl<T: ?Sized, M: Metadata<T>> Proven<T, M> {
    pub fn fatten(self, ptr: NonNull<()>) -> NonNull<T> {
        NonNull::from_raw_parts(ptr, self.0.metadata())
    }

    pub fn layout(self) -> Layout {
        unsafe { self.0.try_layout().unwrap_unchecked() }
    }
}

// OMIT: From impls for Proven to convert from &<T as Pointee> and Layout where
//       T: Sized + Pointee<Metadata = usize>
}

PS: I am not good at naming. My dog's fairly lucky that he didn't end up named "Dog". Please let me know if you have any suggestions regarding better names.

CAD97 commented 2 years ago

Having not watched the repo is embarrassing...

Allocating a Buffer and having it be able to dereference itself is convenient. But while this works well for a pointer or a stack buffer, what it doesn't work for is using e.g. usize as a buffer handle; this requires asking the storage to resolve handles. Consider an allocator backed by Vec<maxalign_t>; allocation which requires the backing store to grow means moving the existing allocations.

My current draft of the a storage API: https://cad97.github.io/storages-api/storage_api/index.html

Ultimately I've landed on fully making the API typeless and just handing out &[MaybeUninit<u8>]. The API being complex was the biggest concern when I brought up the API for cursory discussion in the rust-lang Zulip; removing the types serves to remove a lot of incidental complexity from the core API goal: managing memory.

And I think it's still useful to have a separate trait for allocation than storage. Allocation is still a fundamental building block, and much easier to implement than a storage. Additionally, it means dynamic allocator support can be AllocStorage<dyn Allocator> rather than dyn Storage.

But perhaps I should look into if Allocator can be a refinement of Storage where Handle is self-dereferencable... With specialization and default impl it might even be as simple to implement.

matthieu-m commented 2 years ago

@DecoyFish

Locally, I've prototyped a rough Box and Vec implementation along with several Allocator impls to gain a base level of assurance.

Both of those have contiguous single allocations, I would encourage you to also try to prototype something like a linked-list to ensure that your allocation/buffer API:

(I expect your API would work well, but there may always be a tiny detail...)


@CAD97

Ultimately I've landed on fully making the API typeless and just handing out &[MaybeUninit<u8>].

And this was a great step forward, as it massively reduces the number of traits, removes the need for GATs, etc...

And I think it's still useful to have a separate trait for allocation than storage. Allocation is still a fundamental building block, and much easier to implement than a storage

This I am unclear on. Since Storage is more flexible, it seems like you'd want any collection to be expressed in terms of Storage for maximum flexibility for the user.

What would be useful if a user could implement an Allocator and pass it as a Storage, though requiring a small adapter like AllocStorage<A> is not a huge hurdle.

CAD97 commented 2 years ago

@matthieu-m

And I think it's still useful to have a separate trait for allocation than storage.

This I am unclear on.

The key thing which convinces me that this is the fact that allocations are treated as pure. There's two places we can place the magic transformation from normal AM semantics to laundering: on the #[global_allocator] border, or on the Allocator border (c.f. https://github.com/rust-lang/wg-allocators/issues/101).

A really nice property of Storage to me is that it's a pure library concept. It would significantly diminish it if the API was made magical and could omit calls, whereas that's actually desirable over the allocator boundary.

Having two separate traits allows us to separate the two.

...

Though I have another interesting design that might be convincing me otherwise...

👀👀👀 ```rust use { crate::assert_unsafe_precondition, core::{ alloc::{AllocError, Layout}, hash::Hash, mem::MaybeUninit, ptr::{copy_nonoverlapping, NonNull}, }, }; pub type Memory = [MaybeUninit]; /// Types which can be used to manage memory handles. /// /// The behavior of this trait is refined by traits [`PinningStorage`], /// [`MultipleStorage`], and [`SharedStorage`]. pub unsafe trait Storage { /// The handle which is used to access the stored memory. type Handle: Copy + Hash + Eq; /// Allocate a memory handle in this storage. /// /// The handled memory is not initialized. /// Any previously existing handles into this storage are invalidated. fn allocate_mut(&mut self, layout: Layout) -> Result; /// Deallocate an object handle in this storage. /// /// The handled memory is not required to be valid in any way. /// All handles into this storage are invalidated. /// /// # Safety /// /// - The handle must be valid in this storage. /// - The layout must be the same as used to allocate the handle. unsafe fn deallocate_mut(&mut self, handle: Self::Handle, layout: Layout); /// Resolve a memory handle in this storage to a reference. /// /// # Safety /// /// - The handle must be valid in this storage. /// - The layout must be the same as used to allocate the handle. unsafe fn resolve_ref(&self, handle: Self::Handle, layout: Layout) -> &Memory; /// Resolve a memory handle in this storage to a mutable reference. /// /// # Safety /// /// - The handle must have been created by this storage, /// and must not have been invalidated. /// - The layout must be the same as used to allocate the handle. unsafe fn resolve_mut(&mut self, handle: Self::Handle, layout: Layout) -> &mut Memory; /// Grow a memory handle to a larger size. /// /// If this function succeeds, then the old handle is invalidated and the /// handled memory has been moved into the new handle. The new length is /// uninitialized. /// /// If this function fails, then the old handle is not invalidated and /// still contains the memory in its state before calling this function. /// /// # Safety /// /// - The handle must have been created by this storage, and must not have /// been invalidated. /// - `old_layout` must be the same as used to allocate the handle. /// - `new_layout.size() >= old_layout.size()`. /// /// Note that `new_layout.align()` is not required to be the same as /// `old_layout.align()` unsafe fn grow_mut( &mut self, handle: Self::Handle, old_layout: Layout, new_layout: Layout, ) -> Result; /// Shrink a memory handle to a smaller size. /// /// If this function succeeds, then the old handle is invalidated and the /// prefix of the handled memory has been moved into the new handle. /// /// If this function fails, then the old handle is not invalidated and /// still contains the memory in its state before calling this function. /// /// # Safety /// /// - The handle must have been created by this storage, and must not have /// been invalidated. /// - `old_layout` must be the same as used to allocate the handle. /// - `new_layout.size() <= old_layout.size()`. /// /// Note that `new_layout.align()` is not required to be the same as /// `old_layout.align()` unsafe fn shrink_mut( &mut self, handle: Self::Handle, old_layout: Layout, new_layout: Layout, ) -> Result; } /// A storage that serves as a uniqueness barrier. /// /// Notably, this means that this storage can go `&Storage -> &mut Memory`, and /// thus it is possible to mutate the stored memory behind a shared storage /// reference, and to mutably resolve multiple handles separately without /// invalidating previously resolved handles. /// /// [`resolve`]: Storage::resolve /// [`resolve_mut`]: Storage::resolve_mut pub unsafe trait SharedStorage: Storage { /// Allocate a memory handle in this storage. /// /// The handled memory is not initialized. fn allocate(&self, layout: Layout) -> Result; /// Deallocate an object handle in this storage. /// /// The handled memory is not required to be valid in any way. The handle is /// invalidated. /// /// # Safety /// /// - The handle must have been created by this storage, and must not have /// been invalidated. /// - The layout must be the same as used to allocate the handle. unsafe fn deallocate(&self, handle: Self::Handle, layout: Layout); /// Resolve a memory handle in this storage to a pointer. /// /// # Safety /// /// - The handle must have been created by this storage, and must not have /// been invalidated. /// - The layout must be the same as used to allocate the handle. /// /// The pointer's validity is tied to the lifetime of the `&self` used to /// resolve it, as if this function returned a reference `&[mut] Memory`. unsafe fn resolve(&self, handle: Self::Handle, layout: Layout) -> NonNull; /// Grow a memory handle to a larger size. /// /// If this function succeeds, then the old handle is invalidated and the /// handled memory has been moved into the new handle. The new length is /// uninitialized. /// /// If this function fails, then the old handle is not invalidated and /// still contains the memory in its state before calling this function. /// /// # Safety /// /// - The handle must have been created by this storage, and must not have /// been invalidated. /// - `old_layout` must be the same as used to allocate the handle. /// - `new_layout.size() >= old_layout.size()`. /// /// Note that `new_layout.align()` is not required to be the same as /// `old_layout.align()` unsafe fn grow( &self, handle: Self::Handle, old_layout: Layout, new_layout: Layout, ) -> Result { assert_unsafe_precondition!(new_layout.size() >= old_layout.size()); let old_handle = handle; let new_handle: Self::Handle = self.allocate(new_layout)?; let old_ptr = self.resolve(old_handle, old_layout); let new_ptr = self.resolve(new_handle, new_layout); copy_nonoverlapping( old_ptr.as_mut_ptr(), new_ptr.as_mut_ptr(), old_layout.size(), ); self.deallocate(old_handle, old_layout); Ok(new_handle) } /// Grow a memory handle to a larger size. /// /// If this function succeeds, then the old handle is invalidated and the /// handled memory has been moved into the new handle. The new length is /// uninitialized. /// /// If this function fails, then the old handle is not invalidated and /// still contains the memory in its state before calling this function. /// /// # Safety /// /// - The handle must have been created by this storage, and must not have /// been invalidated. /// - `old_layout` must be the same as used to allocate the handle. /// - `new_layout.size() >= old_layout.size()`. /// /// Note that `new_layout.align()` is not required to be the same as /// `old_layout.align()` unsafe fn shrink( &self, handle: Self::Handle, old_layout: Layout, new_layout: Layout, ) -> Result { assert_unsafe_precondition!(new_layout.size() <= old_layout.size()); let old_handle = handle; let new_handle: Self::Handle = self.allocate(new_layout)?; let old_ptr = self.resolve(old_handle, old_layout); let new_ptr = self.resolve(new_handle, new_layout); copy_nonoverlapping( old_ptr.as_mut_ptr(), new_ptr.as_mut_ptr(), new_layout.size(), ); self.deallocate(old_handle, old_layout); Ok(new_handle) } } pub unsafe trait Allocator: SharedStorage + Storage> {} default unsafe impl Storage for S where S: SharedStorage, { fn allocate_mut(&mut self, layout: Layout) -> Result { self.allocate(layout) } unsafe fn deallocate_mut(&mut self, handle: Self::Handle, layout: Layout) { self.deallocate(handle, layout) } unsafe fn resolve_ref(&self, handle: Self::Handle, layout: Layout) -> &Memory { self.resolve(handle, layout).as_ref() } unsafe fn resolve_mut(&mut self, handle: Self::Handle, layout: Layout) -> &mut Memory { self.resolve(handle, layout).as_mut() } unsafe fn grow_mut( &mut self, handle: Self::Handle, old_layout: Layout, new_layout: Layout, ) -> Result { self.grow(handle, old_layout, new_layout) } unsafe fn shrink_mut( &mut self, handle: Self::Handle, old_layout: Layout, new_layout: Layout, ) -> Result { self.shrink_mut(handle, old_layout, new_layout) } } unsafe impl Storage for S where S: Allocator, { type Handle = NonNull<()>; } default unsafe impl SharedStorage for A where A: Allocator, { unsafe fn resolve(&self, handle: Self::Handle, layout: Layout) -> NonNull { NonNull::from_raw_parts(handle, layout.size()) } } ```

But I think I have a conclusion from this experiment:

  • At least propose the SharedStorage design as an alternative to the Multiple/SharedMutability design
    • The theory being that MultipleStorages not providing &self allocation or SharedMutability is rare in practice
    • The downside being duplicating the API between the &mut and & versions
  • Allocator is a well defined refinement of Storage, even ignoring pinning
    • One where the handle can deref itself
  • Dang it I'm failing at my task of making the API simple
programmerjake commented 1 year ago

I mentioned this in the latest ArrayVec RFC: https://github.com/rust-lang/rfcs/pull/3316#issuecomment-1247587175