`Box<T, A: BuildAlloc>`

scottjmaddox commented 5 years ago

The current push to associate an allocator to boxes (among other types) is targeting something like Box<T, A: Alloc>. This limits the design space of custom allocators, though, because it forces one of two approaches for accessing the Alloc:

Use a single static global state (or pointer or reference to the state).
Embed a pointer or reference to the state inside every box.

There are many kinds of allocators that these limitations rule out. For example, perhaps you want to have multiple independent "Arena" or "Region" allocators that each allocate from a dedicated slab of memory. Since you want to have multiple independent allocators, you cannot (efficiently) use approach 1. So you're forced to use approach 2, which is extremely inefficient, requiring a whole extra pointer to be stored in each Box.

One approach for avoiding this limitation would be something like the following:

pub struct Box<T: ?Sized, A: BuildAlloc>(Unique<T>, A);

pub trait BuildAlloc {
    type Alloc: Alloc;

    /// Build an `Alloc` given a `ptr` pointer to memory with the given
    /// `layout` that was allocated by the associated allocator.
    ///
    /// # Safety
    ///
    /// This function is unsafe because undefined behavior can result
    /// if the caller does not ensure all of the following:
    ///
    /// * `ptr` must denote a block of memory currently allocated via
    ///   this allocator,
    ///
    /// * `layout` must *fit* that block of memory,
    ///
    /// * In addition to fitting the block of memory `layout`, the
    ///   alignment of the `layout` must match the alignment used
    ///   to allocate that block of memory.
    unsafe fn build_alloc(&mut self, ptr: NonNull<u8>, layout: Layout) -> Self::Alloc;
}

Note that BuildAlloc provides the method build_alloc, which accepts &mut self and a pointer and layout for memory that was allocated by the associated allocator, and returns an Alloc implementation. This gives many options for locating the allocator's state and generating an Alloc. Approach 1 (global state) is of course still possible, and so is approach 2 (embed a pointer in every box). But this also makes approach 3 possible:

Calculate a pointer to the allocator state from a pointer to heap memory allocated by that allocator.

This makes many more clever and efficient kinds of allocators possible, such as an allocator that allocates all 8-byte allocations from aligned 1 MiB blocks, and stashes a pointer to its own state at the beginning of each of those 1 MiB blocks. Such an allocator can implement BuildAlloc to check the size of the allocated memory and if it is 8-bytes then mask off the lowest 10 bits from ptr to get a (1 MiB aligned) pointer to the allocator's state, from which it can construct an Alloc.

To clarify, BuildAlloc makes it possible to calculate a pointer to the allocator's state from the Box itself:

impl<T: ?Sized, A: BuildAlloc>  Box<T, A> {
    fn get_alloc(&self) -> A::Alloc {
        self.1.build_alloc(
            NonNull::from(self.0).cast::<u8>(),
            Layout::for_value(self.as_ref())
        )
    }
}

Since the Box provides a pointer to memory allocated by the allocator, the allocator can use its control of the pointers it returns and the memory around those pointers to efficiently locate its own state and generate an Alloc. This makes many more kinds of clever and efficient allocators possible.

Relevant comments from PR #58457: Associate an allocator to boxes:

https://github.com/rust-lang/rust/pull/58457#issuecomment-466681204
https://github.com/rust-lang/rust/pull/58457#issuecomment-466739043
https://github.com/rust-lang/rust/pull/58457#issuecomment-475089881
https://github.com/rust-lang/rust/pull/58457#issuecomment-482816662
- this comment has the most complete usage example, although it defines AllocFrom (old name of BuildAlloc) to have a method for each container type, when really we just need a single method that accepts a pointer to heap memory

Edited 2019-05-04: Rename AllocFrom to BuildAlloc for consistency with BuildHasher. Edited 2019-05-07: Adjust build_alloc signature to match dealloc.

TimDiekmann commented 5 years ago

So if I understand correctly, AllocFrom is related to Alloc just like BuildHasher is related to Hasher?

scottjmaddox commented 5 years ago

So if I understand correctly, AllocFrom is related to Alloc just like BuildHasher is related to Hasher?

Almost exactly. The method signature is different, though, because the whole point is to give the allocator access to a pointer to its own heap memory from which to generate its Alloc implementation.

Calling it BuildAlloc for consistency would be perfectly reasonable, though.

glandium commented 5 years ago

I do have this very use-case, actually. And it didn't strike me that it could be solved this way, and I must say I like it. But there are details to figure out. Like... how do you get the AllocFrom value you attach to the Box from the Alloc.alloc? Because Box::new_in would have to take an Alloc as input, and the AllocFrom would need to be created somehow.

(Note that last iteration in https://github.com/rust-lang/rust/pull/58457 doesn't have bounds on the Box type itself at all. They're all at the implementation level. Which actually gives some additional flexibility)

SimonSapin commented 5 years ago

Aren’t bounds on the type definition required to allow impl Drop to have the same bounds?

TimDiekmann commented 5 years ago

Calling it BuildAlloc for consistency would be perfectly reasonable, though.

This is exactly what I had in mind. Do you want to change the issue and the title to use BuildAlloc instead of AllocFrom?

scottjmaddox commented 5 years ago

how do you get the AllocFrom value you attach to the Box from the Alloc.alloc? Because Box::new_in would have to take an Alloc as input, and the AllocFrom would need to be created somehow.

Yes, the type signature for new_in would need to be something like this:

impl<T: ?Sized, A: BuildAlloc>  Box<T, A> {
    pub fn new_in(x: T, a: A::Alloc) -> Box<T, A>;
}

And, as you point out, you need some way to create a BuildAlloc from an Alloc. So a new method on Alloc will need to be added, something like get_build_alloc, and an associated type, type BuildAlloc: BuildAlloc.

I wasn't sure if such cyclic associated types are allowed, but it looks like they are.

Edit: fix new_in type sig (remove &self, add x: T).

glandium commented 5 years ago

Aren’t bounds on the type definition required to allow impl Drop to have the same bounds?

There's an interesting problem there... Drop currently does nothing. It's actually implemented by the compiler, which then goes on generating calls to box_free on its own.

glandium commented 5 years ago

So a new method on Alloc will need to be added, something like get_build_alloc

Are there use cases where different allocations could have a different BuildAlloc that would point back to the same Alloc? Those would be left out with this approach.

scottjmaddox commented 5 years ago

Are there use cases where different allocations could have a different BuildAlloc that would point back to the same Alloc? Those would be left out with this approach.

None that I can think of. That doesn't mean they don't exist, though. Allocators traditionally use a single global free function that accepts a pointer to the memory to be freed and calculates a pointer to the allocator from that (or from global static state). So all of the history of allocator development has assumed the equivalent of a single BuildAlloc per allocator.

Ultimately, you can put whatever logic you want in BuildAlloc. So it's more a question of can what you want to do be done more efficiently with multiple BuildAllocs per Alloc. And the answer is probably "maybe". But if that's the only way to support BuildAlloc with the current type system, then it's still a huge improvement over the limitations inherent to Box<T, A: Alloc>. Don't let perfect be the enemy of good, and whatnot.

TimDiekmann commented 5 years ago

How could BuildAlloc play together with #9?

gnzlbg commented 5 years ago

Since you want to have multiple independent allocators, you cannot (efficiently) use approach 1.

I'm not sure why this is the case. If you put whatever you need in a static, or in a thread local, you can just have a ZST as an allocator, whose methods access it.

scott-maddox commented 5 years ago

Since you want to have multiple independent allocators, you cannot (efficiently) use approach 1.

I'm not sure why this is the case. If you put whatever you need in a static, or in a thread local, you can just have a ZST as an allocator, whose methods access it.

But they're not really multiple independent allocators then, are they? They're one allocator that pretends to be multiple allocators. And there's an overhead associated with that. If the allocators are truly separate, then you could, for example, safely make them !Send and !Sync and do away with all thread synchronization overhead (and avoid thread local storage overhead).

gnzlbg commented 5 years ago

But they're not really multiple independent allocators then, are they?

This is what I meant:

struct A; // my allocator type
fn foo() {
    static A0: A = A; // allocator 0
    static A1: A = A; // allocator 1
    #[derive(alloc_handle(A0))] struct AH0; // ZST handle to 0
    // for exposition only, the derive expands to this:
    impl Alloc for AH0 {
        fn alloc(...) -> ... { A0.alloc(...) }
        // ...
    }
    #[derive(alloc_handle(A1))] struct AH1; // ZST handle to 1
    let b0: Box<T, AH0>; // box using A0
    let b0: Box<T, AH0>; // another box using A0
    let b1: Box<T, AH1>;  // box using A1
    // ^^ all boxes are pointer sized cause the handle is a ZST
}

Whether you can have multiple handles to the same allocator or not, depends on your allocator impl. Whether you want to have one or multiple allocators, with different handles, is kind of up to you.

If you can put your allocators in statics, or thread locals, then you can have zero-sized handles to them. If you cannot, then your handle needs to have some state to find the allocator, but I don't see any way around that.

SimonSapin commented 5 years ago

@gnzlbg This uses different Rust types for different handles, so there’s a bounded number of such types determined at compile-time.

This is not what people mean by “multiple instances” of an allocator. For example:

fn library_entry_point(some_input: Foo) {
    let handle = ArenaAllocator::new();
    // …
    let b = Box::new_in(something, handle.clone());
    let v = Vec::new_in(handle.clone());
    // …
}

A program that uses this library could spawn a large number of threads that each have their own separate arena. At compile time you don’t how many threads there’s gonna be, so you can have a Rust type for each of them.

For v to be able to tell which of these arenas it belongs to, handle cannot be zero-size. Or, conversely, if the handle is zero-size then the allocator must have global state that is shared with all handles of the same type.

gnzlbg commented 5 years ago

@SimonSapin makes sense, I was misunderstanding which problem this solves, BuildAlloc feels useful.

All Alloc methods that need to identify an allocation take a pointer to an allocation (e.g. a NonNull<u8>) and its Layout. Why is BuildAlloc::build_alloc:

fn build_alloc<T: ?Sized>(&self, heap_ptr: *const T) -> Self::Alloc;
// nitpick: heap_ptr does not necessarily point to the heap

generic over T instead of

unsafe fn build_alloc(&self, ptr: *const u8, layout: Layout) -> Self::Alloc;

? How can the proposed API be safe ?

The OP mentions:

Such an allocator can implement BuildAlloc to check the size of the allocated memory using mem::size_of()

but Layout::size() would do the same, and just because somebody passes build_alloc a pointer to T does not mean that the allocation was performed with a Layout that matches the size_of::<T>, or even with the same allocator for that matter. Assuming that would be unsound, so this API probably also needs to be unsafe.

Counter example, Vec<T> could pass a ptr: *const T to build_alloc, but mem::size_of::<T> has likely nothing to do with the Layout that Vec<T> used to perform its allocation. For this API to be used correctly there, Vec<T> would need to construct a new type internally, that has the same size as the dynamic allocation it refers to. The only way I can imagine this could be done, is with a gigant match expression with isize::MAX match arms, each of which need to instantiate a different type. With today's toolchain this is impossible.

scottjmaddox commented 5 years ago

@gnzlbg You're right, it should be unsafe fn build_alloc(&self, ptr: NonNull<u8>, layout: Layout) -> Self::Alloc;, in order to match Dealloc. I'll update the OP.

The reason I had proposed making the API safe is because the implementer must agree to ensure it is safe. For example, the implementer must not de-reference the heap_ptr (unless it somehow knows it can do so safely). However, we need to pass the Layout separately to support all possible allocations, and thus the interface has the same potential for misuse as Dealloc and thus must be marked unsafe.

scottjmaddox commented 5 years ago

I also can't think of any reason build_alloc shouldn't take &mut self. You are guaranteed to have a &mut self during dealloc and realloc. If you are cloning a &Box, you are guaranteed to be able to clone the BuildAlloc which you could then get a &mut self to.

There might be some use cases where this could come in handy; maybe tracking the number of reallocations within the BuildAlloc?

So I'll change that as well.

TimDiekmann commented 4 years ago

@scottjmaddox You may take a look at https://github.com/TimDiekmann/alloc-wg? I have used BuildAlloc as parameter for Box and RawVec. Is this the way you had in mind? ZST allocators like Global and System implements both, the builder and the allocator.

Ekleog commented 4 years ago

I've just come across this issue. As it suggests adding a new function that makes it possible to have better control over how exactly the allocation is performed, would it be possible to take advantage of this method addition to make the method fallible, so that it's also possible to control how to behave when the allocation fails? (One example use case would be using an arena allocator for some untrusted data coming from the network, and wanting to abort just this specific client's instance if something OOMs here)

SimonSapin commented 4 years ago

Fallibility is a separate "axis" from using a provided allocator v.s. the global one. All four combinations are potentially useful.

Ekleog commented 4 years ago

They are, but only the version that takes both an allocator and is fallible is required to be implement all the other ones: it's easy to do Box::new_in(t, alloc).unwrap_oom() or Box::new_in(t, std::alloc::GLOBAL_ALLOCATOR) (assuming here a function like unwrap_oom that'd do the proper handle_alloc_error on Result<T, AllocErr>, and a global variable that allows read-only access to the global allocator).

Now, if you want to make all four versions it's great too :) what I want to point out is just that if you plan on adding a single function for the time being, then it'd probably be better if it were the fallible one. (Also, I'm not sure we want to encourage the whole allocating ecosystem to provide 4 functions for each use case by setting that example in libstd)

SimonSapin commented 4 years ago

I should have mentioned the RFC https://rust-lang.github.io/rfcs/2116-alloc-me-maybe.html and its tracking issue https://github.com/rust-lang/rust/issues/48043 for try_reserve on collections both have discussion of what the APIs should look like, for both this minimal set of methods and a future, more comprehensive fallible methods (try_push, try_extend, …)

Yes Box::try_new_in is the most basic/powerful combination that others can be defined on top of, and trying to reduce multiplicative explosion of the number of methods is a good point, but exposing it still involves making API design decisions.

TimDiekmann commented 4 years ago

I like to resolve the issues which suggest a change to AllocRef before I start implementing it for collections. I have played around with BuildAllocRef a lot some time ago and I liked the concept in general. However, for most use cases, it's probably overkill. On the other hand, the ordinary Rust user will not implement many allocators himself, so this should not be such a big problem.

To go further with this proposal, I think we need to work out an interface where each signature should be fully defined. Currently, it's a broad idea, of where to build AllocRef and where to pass BuildAllocRef. For testing purposes we either could use a branch on the alloc-wg crate or I try to keep things updated upstream.

I'll throw in some points here:

We probably need an associative type on both AllocRef and BuildAllocRef.
For collections we want the generic parameter to be A: AllocRef instead of B: BuildAllocRef, as type inference would not work otherwise (see https://github.com/TimDiekmann/alloc-wg/issues/5).
What do we store inside containers (BuildAllocRef vs AllocRef)? What to we want to pass to (e.g.) Vec::new_in? for BuildAllocRef:
- Is it efficient enough to build AllocRef every time we need it?
for AllocRef:
- How we get a BuildAllocRef from AllocRef? I think this is needed for cloning for example.

TimDiekmann commented 4 years ago

I have tested around a bit again with BuildAllocRef using more or less exactly the code from the OP. You can see the code in the buildalloc-test branch at alloc-wg. You have to clone it manually and create a doc to view it, as I haven't uploaded it somewhere yet.

Some notes on the implementation:

For alloc, there is no pointer. Even if you have a Layout available (the one you want to use for alloc), it's feeling unintuitive. The idea was, that we pass NonNull::dangling(), but a new method for creating an AllocRef may be useful. Maybe build_alloc_ref and get_alloc_ref, or build_alloc_ref and create_alloc_ref, or new_alloc_ref and build_alloc_ref?
I haven't tested around implementing BuildAllocRef for other types than ZSTs yet.
Every test passed without any changes
No modification to AllocRef was needed, which allows us to play around with it upstream easily!

Amanieu commented 4 years ago

I think we can defer BuildAlloc until later in a forward-compatible way. For now just define Box as Box<T, A: AllocRef>. Later we can extend it like this:

// All AllocRef implement BuildAllocRef
trait AllocRef: BuildAllocRef {}

// We provide a blanket implementation of BuildAllocRef for all existing AllocRef
impl<T: AllocRef> BuildAllocRef for T {
    // Just returns Self
}

// Relax the A bound to BuildAllocRef. This should be a backwards-compatible change.
struct Box<T, A: BuildAllocRef> {}

TimDiekmann commented 4 years ago

The blanket implementation is a great idea! As no modifications to AllocRef are needed as far as I have tested, this would work I guess.

Amanieu commented 3 years ago

One thing that I feel hasn't been fully addressed is how you can construct a Box<T, BuildAllocRef> where the BuildAllocRef is a ZST but the AllocRef isn't. This seems important as this is the main use case for having BuildAllocRef at all.

rust-lang / wg-allocators

`Box<T, A: BuildAlloc>` #12