Split `Allocator` trait

zakarumych commented 1 year ago

Currently there's only Allocator trait that provides both allocations and deallocations. And Box, Vec and other types has A: Allocator generic parameter.

However there are allocators with no-op deallocations and thus do not require collections and smart-pointers to keep any allocator state to deallocate. It would reduce size and improve performance somewhat significantly if Box<T, A> would be able to use ZST A parameter if no state is required for deallocation and allocation is not needed.

I propose the following solution:

Split Allocator trait into two - Deallocator and Allocator. They can be defined as following.

unsafe trait Deallocator {
fn deallocate(&self, ptr: NonNull<u8>, layout: Layout);
}

unsafe trait Allocator: Deallocator {
/* all other methods */
}

Define that deallocator deallocator: D created using <D as From<A>>::from(alloc) may deallocate memory allocated by alloc, any of its copy and equivalent allocators.
Leave only A: Deallocator bound on collections and smart-pointers and all their impl blocks where allocation is not performed.
Implement From<Box<T, A>> for Box<T, D> where D: From<A> and similar impls for other types with allocator type. This impl may conflict with others. The alternative is to add a method.

After this is done then allocators with possible no-op deallocation (like bumpalo::Bump or blink_alloc::BlinkAlloc) may define ZST deallocator type that does nothing on deallocation and only provides a lifetime to ensure that allocator is not reset.

On the bumpalo as example

struct Bumped<'a> {
  _marker: PhantomData<&'a Bump>,
}

unsafe impl<'a> Deallocator for Bumped<'a> {
  fn deallocate(&self, _ptr: NonNull<u8>, _layout: Layout) {}
}

impl<'a> From<&'a Bump> for Bumped<'a> {
  fn from(_bump: &'a Bump) -> Self {
    Bumped { _marker: PhantomData }
  }
}

// Usage

fn foo<'a>(bump: &'a Bump) {
  let box: Box<u32, &'a Bump> = Box::new_in(42, bump);
  let box: Box<u32, Bumped<'a>> = box.into();

  assert_eq!(size_of_val(&box), size_of::<usize>());

  // Following doesn't compile as cloning `Box` requires `A: Allocator`
  // let box2: Box<u32, _> = box.clone();
  // If such ability is required - do not downgrade allocator to deallocator.
}

zakarumych commented 1 year ago

The main motivation here is performance. Smart-pointers rarely require allocation after creation and containers are sometimes frozen. Making them have smaller footprint would reduce register-pressure and improve performance of generated code.

Not only bumpalo-like allocators may benefit from this change. Some stateful allocators may recover everything they need directly from pointer to memory block. Or at least require less state to perform deallocation.

Amanieu commented 1 year ago

See #9 for a long discussion precisely this issue.

My position is that:

In the case of Bump this isn't actually needed since you can just use Box::leak to get a &'a mut which is tied to the lifetime of the allocator.
This doesn't help Vec, HashMap, etc which are much more common use cases for custom allocators.

In the end I don't think the use cases justify the additional API complexity.

zakarumych commented 1 year ago

Box::<T>::leak will cause <T as Drop>::drop to be not called. That's kinda bad for anything with useful Drop impl.

And there are allocators that need to perform actual deserialization but restore state from pointer.

I agree that proposed change is not that useful for Vec and HashMap. When Vec is not longer need allocations it can be converted to Box<[T]> and again, with this change that box could be smaller.

I can't agree that custom allocators are used mostly with Vec and HashMap's but not Boxes. Bump-allocators - maybe

And note that bumpalo provides its own Box without allocator state and people will hesitate to move to std's Box without this change since if they pass those Boxes around they will see how performance degrades.

Amanieu commented 1 year ago

I still think this introduces a huge amount of API complexity. I'd recommend reading the full discussion in #9, it has examples like Box::clone which can only work if Box has a complete allocator. What then happens if you convert a Vec<T, A> to a Box<[T], A>? Is the resulting box clonable? Is a separate API needed to "downgrade" a Box<T, A> to a Box<T, D>? If an API is needed anyways, could we just keep theAllocator` trait the same and use an allocator that panics when trying to allocate?

zakarumych commented 1 year ago

What then happens if you convert a Vec<T, A> to a Box<[T], A>? Is the resulting box clonable?

Box is clonable if A: Allocator.

just keep the Allocator trait the same and use an allocator that panics when trying to allocate?

Making always panicking Allocator will increase WTF factor a lot. Collection types do not support conversion of allocator type without deconstruction. And some collection are not deconstructible.

I don't think Deallocator trait would increase complexity. Unless you know that you need it - just use Allocator. Authors of collection types may keep A: Allocator bound everywhere as they do right now.

Deallocator trait can be added without even changing Allocator trait like this:

unsafe trait Deallocator {
  unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout);
}

impl<A> Deallocator for A
where
  A: Allocator,
{
  #[inline(always)]
  unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
    Allocator::deallocate(self, ptr, layout);
  }
}

Collection types may start using Deallocator where it makes sense. And allocator types may start support downgrading Allocator to Deallocator. Without breaking anyone's code

morrisonlevi commented 10 months ago

I don't think leaking is acceptable for arenas like Bumpalo, but I'm not sure the premise of the motivation is correct:

The main motivation here is performance. Smart-pointers rarely require allocation after creation and containers are sometimes frozen. Making them have smaller footprint would reduce register-pressure and improve performance of generated code.

I think based on the definitions, if the allocator is zero-sized just like Global, then you don't get a wider Box. Is this not true? If it isn't true, that's... kind of crazy that the committee/wg thinks that's okay.

But, I do understand that for some allocators, you need state for the allocation but not for the deallocation, so for these types, splitting them would be ideal so that things which only need dealloc can be smaller. I think that's maybe what they were actually implying, just under specified.

rust-lang / wg-allocators

Split `Allocator` trait #112