Coherent design of GlobalAlloc and Alloc

gnzlbg commented 5 years ago

Right now, Alloc and GlobalAlloc are similar, but not quite, and depending on how we evolve Alloc, they could diverge even more. This is not necessarily bad, but it should happen by design, and not by accident. In particular, for GlobalAlloc, how we currently do linking imposes some constraints on, e.g., whether the trait can have generic methods, etc.

So I think it is important to have at least some level on consensus on where do we want to go with these two traits, and how do they fit with each other, such that we get a cohesive design.

I'm opening this issue to track that, and will leave my opinion as a comment.

gnzlbg commented 5 years ago

I think that GlobalAlloc's goal was to unlock something that was impossible to do without it, and that was high priority, and therefore could not wait for all issues with Alloc to be resolved. That goal was met, and due to stability, we have to support that forever. That's actually ok, GlobalAlloc only has a few methods.

Now that we making progress with Alloc, it might be worth exploring whether Alloc can super-seed GlobalAlloc or not.

For example, right now one has to implement both traits for most allocators, which is inconvenient, and the type signatures aren't quite the same. It would be better to be able to avoid that in the future, if possible, because the amount of context that one has to keep in mind and document about these traits is quite large, and having to duplicate it, or even worse, diverge in subtle ways, would be bad for users.

Ideally, at some point, one would just implement Alloc for an allocator, and that allocator would be automatically usable as a #[global_allocator]. Whether we then want to deprecate GlobalAlloc or not, can be decided then (it's not necessary and we can't remove it, so might not be worth it).

For this to work we might need to keep Alloc "thin", e.g., it should have only one job, allocating, resizing, deallocating - layout computation happens somewhere else, no/little introspection, etc. and we might need to avoid generic methods like alloc_one/alloc_array/etc. But all of this seems doable.

Even if we are OK with always having two traits, and with these traits diverging, this isn't necessarily something that we want to commit to in an MVP for Alloc (e.g. saying that Alloc is forward-compatible with having only one trait in the future is one thing less that would need to be argued in an RFC).

I don't know how much it would cost to avoid being in a situation where Alloc becomes incompatible forever with GlobalAlloc, and maybe preserving that is not worth it, but this is one aspect of the design that we should consider when thinking about how to evolve Alloc (e.g. something that might be "nice to have" but not really necessary, might go from worth it to not worth it if it means not being able to have a single allocation trait in the future).

TimDiekmann commented 5 years ago

I don't expect that there will come a point where Alloc and GlobalAlloc will be compatible. Both traits have their right to exist, as both traits are supposed to solve different problems. Unlike Alloc, GlobalAlloc requires internal mutability (which makes both traits incompatible), otherwise a global variable would be mutable. This is a limitation that doesn't make sense with Alloc.

So the only thing implementors of both traits have in common is the ability to request and release memory. They cannot and should not be merged IMO.

SimonSapin commented 5 years ago

I think we have a couple of hard constraints:

GlobalAlloc is stable, so at first approximation it cannot be changed. (It can gain defaulted items.) Namely, we want to not break code that:
- Calls its methods
- Uses it in a where clause
- Implements it
- Uses #[global_allocator]
We want impl Alloc for Global which forwards calls to #[global_allocator]. Another way to express this is, we want impl<T: GlobalAllocator> Alloc for T to be possible to write with no other bounds. Or maybe impl<T: GlobalAllocator> Alloc for &'static T? (And maybe we should even consider having literally one of these impls?)

Beyond this, while it would be nice to avoid inconsistencies between the two traits (which are otherwise similar) other goals might take priority. For example, GlobalAlloc::alloc returns *mut u8 where null indicates failure, but I think the safety of Alloc::alloc returning a Result is preferably over consistency in this case.

I don’t understand what “an MVP for Alloc” means. We only get one shot at stabilizing any given set of APIs. (Though we can stabilize different subsets at different times.) And if you count Rust Nightly as minimally-viable, then MVP happened years ago.

gnzlbg commented 5 years ago

I don’t understand what “an MVP for Alloc” means. We only get one shot at stabilizing any given set of APIs.

AFAICT we can add new trait methods, associated types, consts, etc. in a backwards-compatible way by giving them defaults.

I don't think it's worth it to block Alloc stabilization on, e.g., figuring out all possible trait methods that we could ever want (e.g. all _aligned, _zeroed, _excess combinations). I'd rather stabilize only those that have actual use cases today, and add new ones later if new use cases are discovered. We don't even have to stabilize all collections simultaneously, e.g., we can stabilize Box<T, A> before Vec<T, A> or String<A>. Also, some in the tracking issue have expressed interest in having Alloc work with segmented memory, GPUs, NVMs, etc. If we block Alloc on figuring out all of this, we'll never ship anything.

So for me an MVP is the smallest possible stabilization we could make that delivers the most value, and is forward-compatible with adding more things later. This requires knowing in which direction things should go, as opposed to having all details figured out before stabilizing anything.

We want impl Alloc for Global which forwards calls to #[global_allocator]. Another way to express this is, we want impl Alloc for T to be possible to write with no other bounds. Or maybe impl Alloc for &'static T? (And maybe we should even consider having literally one of these impls?)

I am not sure how this would work. For example, jemalloc has a specialized grow_in_place method that's not part of GlobalAlloc, so the blanket impl of Alloc for all GlobalAllocs would add a "poor" default impl of grow_in_place. AFAICT, we would need stable specialization to be able to override this. Alternatively, GlobalAlloc could have a grow_in_place method that the blanket impl always calls, but this requires GlobalAlloc to have all / most Alloc methods.

I would prefer if, instead, we either added an impl<T: Alloc> GlobalAlloc for T, so that people only implement Alloc, and that's usable as a #[global_allocator], or alternatively, if both, types that implement Alloc or GlobalAlloc, are usable as #[global_allocator].

I don't know why we would want, by design, to have two 99% identical traits, with slightly different types in the APIs but otherwise semantically identical modulo documentation bugs, or design oversights or errors.

SimonSapin commented 5 years ago

Ok, so an actual blanket impl may not be desirable. But my point remains that we need impl Alloc for Global, and that imposes a limit to the requirements we can make of impls.

gnzlbg commented 5 years ago

We have Jemalloc in jemallocator which implements GlobalAlloc and Alloc.

When a user sets it as the #[global_allocator], there is a performance and usability gap between using the allocator via alloc::Global, and using it via the global allocator handle in the jemallocator crate (#[global_allocator] pub static Foo: Jemalloc;) directly.

The problem is that the implementation of Alloc for Jemalloc overrides most (all?) Alloc methods. For example, it overrides Alloc::usable_size(Layout) to actually return the usable size by calling nallocx, as opposed to just returning Layout::size() which is what the default implementation does. However, the implementation of Alloc for alloc::Global cannot do this, because these APIs are not part of GlobalAlloc. So alloc::Global ends up with a defaulted Alloc::usable_size (and most other methods, grow/shrink_in_place, _excess, etc.), and the default implementations of these methods don't add any value.

I don't think there is a way for users to override the impl of Alloc for alloc::Global, so the only reasonable solution I see here is for GlobalAlloc to expose 1:1 all the APIs that Alloc also exposes, so that the impl of Alloc for Global can forward to them.

Are there any other solutions to this problem?

I worry that we'd end up with two traits that are 99.999% identical, except for some subtle differences, and that most users need to end up having to know both traits well and watch out for their subtle differences, because most users want to make their Allocs usable as #[global_allocator].

gnzlbg commented 5 years ago

I also worry that some default methods exposed by Alloc and that can be overriden by the implementations, cannot ever be exposed correcty via alloc::Alloc. For example, it might not be possible to add the generic Alloc methods to GlobalAlloc.

SimonSapin commented 5 years ago

Independently of GlobalAlloc, avoiding generic methods could also help make Alloc object-safe. I haven’t thought a lot through the upsides and downsides, but something like Vec<T, Arc<Mutex<dyn Alloc>>> would be possible.

Back on topic, I’m starting to see your point. I’m still undecided on whether if would be worth the transition, but I think it would be possible to change the language rules from:

#[global_allocator] is used on a static item of type G with G: GlobalAlloc

To:

#[global_allocator] is used on a static item of type G with &'static G: Alloc

…and have a blanket impl<'a, G> Alloc for &'a G where G: GlobalAlloc, for compat with existing uses.

In this world you would implement only Alloc, with whatever behavior for usable_size and other defaulted methods.

The way #[global_allocator] works today is:

It expands to a set of pub extern fn function declarations with names like __rg_realloc that call
The std crate similarly defines pub extern fn functions with names like __rdl_realloc.
std::alloc::realloc a.k.a. alloc::alloc::realloc calls __rust_realloc which in the alloc crate is an extern "Rust" { … } declaration without a definition.
For a compilation unit other than rlib, the compiler decides on the kind of allocator symbols to inject:
- None if the alloc crate is not in the dependency graph,
- Otherwise Global if a crate with #[global_allocator] is in the dependency graph.
- Otherwise Default. This requires std to be in the dependency graph.
Based on this allocator kind, the compiler injects (or not) functions like __rust_realloc that simply forward to a call with the same signature, either to __rdl_realloc or __rg_realloc.

SimonSapin commented 5 years ago

(Err, I didn’t mean to send this just yet.)

The way #[global_allocator] works involves a finite set of symbols like __rust_realloc. So you’re correct that GlobalAlloc cannot have generic methods, and neither can Alloc if we wan to change #[global_allocator] to be based on Alloc instead as discussed above.

(The current implementation of that last injection step also uses LLVM’s API directly and so the signature of all of these symbols is limited to types that are primitive to LLVM. For example, callers of __rust_realloc decompose a Layout into a pair of usizes, and __r{g,dl}_realloc recompose it before calling GlobalAlloc::realloc. But in theory this is a less “fundamental” limitation.)

gnzlbg commented 5 years ago

and so the signature of all of these symbols is limited to types that are primitive to LLVM [...] But in theory this is a less “fundamental” limitation.

IIRC this was the main reason we don't use "fancier" types (e.g. NonNull, Option, Result, etc.) in GlobalAlloc methods (that would require us to do more complex transformations in this part of the compiler). This limitation does not look "fundamental" either to me, e.g., we could just lower here something like Option<NonNull<T>> to an i8* in LLVM-IR, but all of this needs to be done "manually" inside the compiler for each GlobalAlloc method.

#[global_allocator] is used on a static item of type G with &'static G: Alloc

Here, if GlobalAlloc provides 1:1 the API of Alloc, but with "simpler" types, maybe #[global_allocator] could automatically generate an impl GlobalAlloc for G that maps the fancier types of Alloc to the types that GlobalAlloc uses. Maybe with some magic, this impl could use unstable compiler features (even in stable), so that from a stability POV we don't really have to evolve GlobalAlloc further - we just keep it as is for backwards compatibility, and those who want to implement more methods should just upgrade to implementing Alloc instead (maybe with a deprecation of GlobalAlloc at some point when Alloc reaches feature parity).

That is, the GlobalAlloc trait would become more like an implementation detail that provides a "simpler" interface for the compiler and backward compat, while the Alloc trait becomes the only trait users need to know and implement.

For this to work, we need to make sure when designing Alloc that this can always be made to work (e.g. no generic methods), and also, that there is no reason for users to have to go one level down and implement GlobalAlloc directly (e.g. because the fancier API of Alloc adds costs over the GlobalAlloc one).

SimonSapin commented 5 years ago

IIRC this was the main reason we don't use "fancier" types

No, this was unrelated to implementation details: https://github.com/rust-lang/rust/pull/51160#issuecomment-392871051

we could just lower here something like Option<NonNull<T>> to an i8* in LLVM-IR

Yes, and we already have a mechanism for such conversions here, here, and here.

We technically can carry Result<NonNull<T>, AllocErr> through #[global_allocator] if AllocErr has a known size. In particular, this would be incompatible with AllocErr being an associated type. (Unless #[global_allocator] fixes it with something like &'static G: AllocHandle<Err=std::alloc::AllocErr>)

gnzlbg commented 5 years ago

In particular, this would be incompatible with AllocErr being an associated type.

I think we could also require conversion from/to Alloc::Err and std::alloc::AllocErr for this to work, e.g.,

trait Alloc {
    type Err: From<std::alloc::AllocErr> = std::alloc::AllocErr;
}

or similar bounds, and have the compiler invoke the conversions in the shim, but this essentially means that one can only propagate as much information as alloc::AllocErr does, because when using alloc::Global the errors would always be converted twice. This would work for type Err = ! (having the desired effect), and for type Err = std::alloc::AllocErr it should be zero-cost. But this is getting too complicated for my taste.

SimonSapin commented 5 years ago

I think that wouldn’t work. If AllocErr is an associated type, <Global as AllocHandle>::AllocErr is one specific type liballoc and cannot depend on #[global_allocator] in an unrelated crate.

TimDiekmann commented 4 years ago

Do we already have a concrete proposal at this time? Taking into consideration the latest progress I guess we want

impl<T: GlobalAlloc> AllocRef for T somehow
Use AllocRef in #[global_allocator]
Get rid of GlobalAlloc

TimDiekmann commented 4 years ago

I think many concerns regarding this has been resolved as we decided not to have an associated error type and drop the usable_size API.

TimDiekmann commented 4 years ago

I discovered, that we cannot implement AllocRef on System when using an implementation like impl<T: GlobalAlloc> AllocRef for T without specialization. Maybe it is possible to apply #[global_alloc] on both, AllocRef and GlobalAlloc? And if both implementations are provided, AllocRef is used.

Lokathor commented 4 years ago

i wouldn't allow two implementations and pick a favorite if you have both.

Rather, just say that #[global_alloc] now works on one additional type of thing, and you still get exactly one global alloc declaration within a single build.

Wodann commented 4 years ago

I think the main goal is to provide backwards compatibility with a stable type (GlobalAlloc), while slowly introducing its replacement AllocRef. Like @Lokathor said, we can just provide two implementation, and add a comment to GlobalAlloc nightly docs that it will be superceded by AllocRef. That then transitions from a deprecation warning to error as we stabilize and release several more Rust versions.

TimDiekmann commented 4 years ago

As impl<A: GlobalAlloc> AllocRef for A is working even with AllocRef taking &mut self, should I push this upstream?

Amanieu commented 4 years ago

I feel we should first come to a consensus on what we want to do about #[global_allocator].

TimDiekmann commented 4 years ago

Is this related to #[global_allocator] directly? If we slowly want to move from GlobalAlloc to AllocRef and a blanket implementation is possible, I think we should provide it.

SimonSapin commented 4 years ago

I discovered, that we cannot implement AllocRef on System when using an implementation like impl<T: GlobalAlloc> AllocRef for T without specialization.

Why not?

Maybe it is possible to apply #[global_alloc] on both, AllocRef and GlobalAlloc? And if both implementations are provided, AllocRef is used.

That is not possible. #[global_allocator] is just a macro that emits Rust code, its only input is the local Rust source of the definition of a static item. It cannot emit different code based on whether a trait is implemented by a type.

Compiling with rustc +nightly -Z unstable-options --pretty=expanded | rustfmt, this:

#[global_allocator]
static ALLOCATOR: std::alloc::System = std::alloc::System;

becomes:

static ALLOCATOR: std::alloc::System = std::alloc::System;
const _: () = {
    #[rustc_std_internal_symbol]
    unsafe fn __rg_alloc(arg0: usize, arg1: usize) -> *mut u8 {
        ::core::alloc::GlobalAlloc::alloc(
            &ALLOCATOR,
            ::core::alloc::Layout::from_size_align_unchecked(arg0, arg1),
        ) as *mut u8
    }
    #[rustc_std_internal_symbol]
    unsafe fn __rg_dealloc(arg0: *mut u8, arg1: usize, arg2: usize) -> () {
        ::core::alloc::GlobalAlloc::dealloc(
            &ALLOCATOR,
            arg0 as *mut u8,
            ::core::alloc::Layout::from_size_align_unchecked(arg1, arg2),
        )
    }
    #[rustc_std_internal_symbol]
    unsafe fn __rg_realloc(arg0: *mut u8, arg1: usize, arg2: usize, arg3: usize) -> *mut u8 {
        ::core::alloc::GlobalAlloc::realloc(
            &ALLOCATOR,
            arg0 as *mut u8,
            ::core::alloc::Layout::from_size_align_unchecked(arg1, arg2),
            arg3,
        ) as *mut u8
    }
    #[rustc_std_internal_symbol]
    unsafe fn __rg_alloc_zeroed(arg0: usize, arg1: usize) -> *mut u8 {
        ::core::alloc::GlobalAlloc::alloc_zeroed(
            &ALLOCATOR,
            ::core::alloc::Layout::from_size_align_unchecked(arg0, arg1),
        ) as *mut u8
    }
};

SimonSapin commented 4 years ago

I discovered, that we cannot implement AllocRef on System when using an implementation like impl<T: GlobalAlloc> AllocRef for T without specialization.

Why not?

Ok, I tried it and got errors like cannot specialize default item `alloc`. The fix for that is not specialization but to remove unsafe impl AllocRef for System in libstd since it becomes a special case of the new, more general unsafe impl<A: GlobalAlloc> AllocRef for A in libcore.

rust-lang / wg-allocators

Coherent design of GlobalAlloc and Alloc #21