Idea: Make the pointer a generic associated type

maniwani commented 1 year ago

There are lots of applications for custom allocation strategies.

Video games will employ custom allocators in many places to improve performance. One class of strategies used by games that is currently inhibited by std are ones that involve relative pointers.

Think about a classic video game emulator. It might allocate one big chunk of memory upfront to serve as the "system RAM" (e.g. Gameboy Advance had 384 KiB of RAM). The emulator can then sub-allocate from that chunk for collections like Vec<T>. If the pointer held by a Vec<T> was the index of a byte within the chunk rather than its absolute address, the emulator could memcpy the entire chunk to cheaply create a perfect snapshot/savestate. The move/copy would not invalidate the pointer.

Similar tools can already be found in the Rust ecosystem.

The rkyv crate is built from the same principle. Its serializer replaces pointers with "self-relative" pointers (and replaces structs with stable #[repr(C)] variants) so that you can later load the serialized blob from the filesystem and read from it directly without having to deserialize it first (which is very expensive), similar to a memory-mapped file.

We could do similar things with general allocation, but the problem is the Allocator trait is locked to NonNull, so it can't be used to return non-standard pointers. Likewise, all containers in std are hardcoded to hold a NonNull<T>/*mut T.

If possible, I'd like to see something to this effect.

pub unsafe trait Allocator {
    type Pointer<T: ?Sized>;

    fn convert_to_non_null_ptr(&self, ptr: Self::Pointer<u8>) -> NonNull<u8>;

    fn allocate(&self, layout: Layout) -> Result<Self::Pointer<[u8]>, AllocError>;
    unsafe fn deallocate(&self, ptr: Self::Pointer<u8>, layout: Layout);

    fn allocate_zeroed(
        &self,
        layout: Layout
    ) -> Result<Self::Pointer<[u8]>, AllocError> { ... }
    unsafe fn grow(
        &self,
        ptr: Self::Pointer<u8>,
        old_layout: Layout,
        new_layout: Layout
    ) -> Result<Self::Pointer<[u8]>, AllocError> { ... }
    unsafe fn grow_zeroed(
        &self,
        ptr: Self::Pointer<u8>,
        old_layout: Layout,
        new_layout: Layout
    ) -> Result<Self::Pointer<[u8]>, AllocError> { ... }
    unsafe fn shrink(
        &self,
        ptr: Self::Pointer<u8>,
        old_layout: Layout,
        new_layout: Layout
    ) -> Result<Self::Pointer<[u8]>, AllocError> { ... }
    fn by_ref(&self) -> &Self
    where
        Self: Sized,
    { ... }
}

i.e.

unsafe impl Allocator for Global {
    type Pointer<T: ?Sized> = NonNull<T>;

    fn convert_to_non_null_ptr(&self, ptr: Self::Pointer<u8>) -> NonNull<u8> {
        ptr
    }

    ...
}

Then the collections from std could also be made generic and store A::Pointer.

So a few questions:

Now that GATs are stable, are there any remaining issues (e.g. with provenance/Miri) that wouldn't allow doing this?
How much does this overlap with #93?

zakarumych commented 1 year ago

Typical relative pointers store address offset relative to the pointer's address. This allows cheap calculation of real pointer, but disallows moving the pointer.

Therefore it is not reasonable to return such pointer from allocator. Move invalidates it.

Instead you can use allocator to fetch real pointer and then construct relative pointer, knowing relative pointer's address in advance.

maniwani commented 1 year ago

Typical relative pointers store address offset relative to the pointer's address. This allows cheap calculation of real pointer, but disallows moving the pointer.

A move will only invalidate the pointer if its offset is relative to where the pointer itself is stored. But that isn't the only type of relative pointer that exists.

I can write an arena allocator that sub-allocates from a contiguous chunk of memory (acquired from the global allocator), where allocate returns the index of a byte in the chunk as its pointer (i.e. the pointer is relative to the chunk). That relative pointer is not invalidated by moving it. The entire chunk can be moved without invalidating it.

Therefore it is not reasonable to return such pointer from allocator. Move invalidates it.

That issue does not apply to all kinds of relative pointer, but you've brought up a good point about move semantics. We can address that by requiring the associated Pointer type to implement another, unsafe trait. That trait will be a promise that moving the pointers around does not invalidate them.

Instead you can use allocator to fetch real pointer and then construct relative pointer, knowing relative pointer's address in advance.

This doesn't accomplish anything.

I want to be able to store a Box<T> in my arena and move that whole arena around without invalidating its pointer, but that's impossible unless I either (1) write my own fork of Box<T> or (2) std::boxed::Box<T> is changed to hold an A::Pointer.

I've already written that arena allocator, but I can't use it how I want with std containers. They're incompatible because they've been hardcoded to store a *mut T, and that doesn't seem necessary anymore.

What would prevent us from doing the following?

Put every property *mut T has to satisfy under an unsafe trait Pointer contract.
Add an associated Pointer type to the Allocator trait, required to implement the Pointer trait.
Change types in std to store <A as Allocator>::Pointer<T> instead of *mut T.

dead-claudia commented 1 year ago

A move will only invalidate the pointer if its offset is relative to where the pointer itself is stored. But that isn't the only type of relative pointer that exists.

I can write an arena allocator that sub-allocates from a contiguous chunk of memory (acquired from the global allocator), where allocate returns the index of a byte in the chunk as its pointer (i.e. the pointer is relative to the chunk). That relative pointer is not invalidated by moving it. The entire chunk can be moved without invalidating it.

It's unsafe to read or write during that move without pinning the pointer, however. Not sure you can properly model this via the allocator trait as-designed.

I could see this combined with a fn pin(&self, ptr: Self::Pointer, layout: Layout) -> *mut u8 method and related unpin method, that for most allocators would be no-ops.

maniwani commented 1 year ago

It's unsafe to read or write during that move without pinning the pointer, however.

What I meant to say was that the byte index (the offset, what the MyArena::Pointer<T> would be) wouldn't change if you just move the arena around. But true, if something was holding a *mut T pointing to the old memory, a move would invalidate it.

(edit: This is why allocator-agnostic containers would need to store <A as Allocator>::Pointer<T> instead of *mut T, so that their internal operations always go through A::convert_to_non_null_ptr, which would be a no-op for most allocators.)

Not sure you can properly model this via the allocator trait as-designed.

Isn't it fine as long as actions like "moving the arena" require unsafe? (Would there need to be a way to enforce that or is it enough that Allocator is an unsafe trait to implement?)

dead-claudia commented 1 year ago

It's unsafe to read or write during that move without pinning the pointer, however.

What I meant to say was that the byte index (the offset, what the MyArena::Pointer<T> would be) wouldn't change if you just move the arena around. But true, if something was holding a *mut T pointing to the old memory, a move would invalidate it.

Even a *const T could get invalidated by the move, since it occurs transparent to the program.

(edit: This is why allocator-agnostic containers would need to store <A as Allocator>::Pointer<T> instead of *mut T, so that their internal operations always go through A::convert_to_non_null_ptr, which would be a no-op for most allocators.)

Not sufficient. You have to notify the allocator it's safe to move again as well. Unfortunately, this ends up functioning more like a mutex lock than a mere reference box, so it'd need some more implicit stuff.

Not sure you can properly model this via the allocator trait as-designed.

Isn't it fine as long as actions like "moving the arena" require unsafe? (Would there need to be a way to enforce that or is it enough that Allocator is an unsafe trait to implement?)

In theory, yes, due to unsafe basically meaning "you're on your own with regards to memory safety". In practice, race conditions would be very easy to run into without explicit locking or some other form of move obstruction.

maniwani commented 1 year ago

*const T

I've just been using *mut T as a catch-all for a traditional pointer. *const T or *mut T doesn't make a difference when the focus is on the address.

Not sufficient. You have to notify the allocator it's safe to move again as well. Unfortunately, this ends up functioning more like a mutex lock than a mere reference box, so it'd need some more implicit stuff.

In theory, yes, due to unsafe basically meaning "you're on your own with regards to memory safety". In practice, race conditions would be very easy to run into without explicit locking or some other form of move obstruction.

What are the other implicit invariants?

If it's valid with unsafe and the responsibility is on the caller performing the move, is there a problem?

I've written an allocator like I've described, and I know I could use it in specific scenarios[^1]. My issue is I can't allocate collections from std with it, because all relevant struct and trait definitions are tied to a specific family of pointer types (*const T, *mut T, NonNull<T>).

From what I see, my allocator would satisfy the safety invariants of Allocator if only Allocator expanded its definition of "pointer", so that's what I'd like to see happen. I'm OK with being "on my own with regards to memory safety" if I could even do so, but right now an allocator that doesn't return a NonNull<u8> can't be an Allocator.

[^1]: Correct me if I'm misunderstanding your mention of race conditions, but I don't need thread-safety in my use case, and thread-safety isn't required by the Allocator trait.

dead-claudia commented 1 year ago

*const T

I've just been using *mut T as a catch-all for a traditional pointer. *const T or *mut T doesn't make a difference when the focus is on the address.

Not sufficient. You have to notify the allocator it's safe to move again as well. Unfortunately, this ends up functioning more like a mutex lock than a mere reference box, so it'd need some more implicit stuff.

In theory, yes, due to unsafe basically meaning "you're on your own with regards to memory safety". In practice, race conditions would be very easy to run into without explicit locking or some other form of move obstruction.

What are the other implicit invariants?

If it's valid with unsafe and the responsibility is on the caller performing the move, is there a problem?

I never said it failed to satisfy the proposed modified Allocator safety contract here. I just said it wouldn't be a practical implementation, due to other (related) risk.

And the move for such relatively-positioned memory is executed not by the caller, but by the allocator.

rust-lang / wg-allocators

Idea: Make the pointer a generic associated type #113