Contradicting documentation on default allocator

RalfJung commented 1 year ago

Currently the default global allocator is unspecified. Libraries, however, like cdylibs and staticlibs are guaranteed to use the System by default.

and it also states that

This type implements the GlobalAlloc trait and Rust programs by default work as if they had this definition:
use std::alloc::System;

#[global_allocator]
static A: System = System;

These two statements directly contradict each other -- one says that System is the default everywhere, one says that that is not specified. Which one should it be? :)

FWIW Miri currently considers it a bug to create an allocation with the default global allocator and then free it with System, matching the first bit of documentation, but contradicting the second.

thomcc commented 1 year ago

I think one solution here would be to decide that System is just always whatever we decide the default global allocator is (and perhaps it should be renamed to DefaultGlobal or something). The name System is already fairly misleading, given:

it is not valid to mix use of the backing system allocator with System, as this implementation may include extra work, such as to serve alignment requests greater than the alignment provided directly by the backing system allocator.

And also given that (IIUC) it's actually a vendored dlmalloc port on WebAssembly targets

RalfJung commented 1 year ago

So basically you are saying the second docs I quoted are right and the first one should be removed?

thomcc commented 1 year ago

That's my opinion, yes.

bjorn3 commented 1 year ago

I think we shouldn't assume #[global_allocator] static A: System = System; being the default in miri. If you depend on System being the global allocator you should add #[global_allocator] static A: System = System; yourself and if you don't add it, someone else is allowed to use a different global allocator, which would then cause UB for you. In that case you are at fault, not the crate that specified a different global allocator.

thomcc commented 1 year ago

Yeah, that's a pretty compelling point. While I do feel like System should be guaranteed the default global allocator (and probably renamed), it's very hard to rely on that without UB.

RalfJung commented 1 year ago

Miri runs binary crates, not lib crates, so it's not possible for someone else to set anything.

If the documentation says that Rust works as if this exists by default

#[global_allocator]
static A: System = System;

then we have to allow such code, we have no grounds for calling it UB. So @bjorn3 are you saying we should remove the 2nd documentation and keep the first?

bjorn3 commented 1 year ago

Miri runs binary crates, not lib crates, so it's not possible for someone else to set anything.

Miri is often used to test library crates.

So @bjorn3 are you saying we should remove the 2nd documentation and keep the first?

Yes.

RalfJung commented 1 year ago

Miri is often used to test library crates.

Yes -- by running a binary crate that exercises them (the test crate). What I said remains true.

Diggsey commented 1 year ago

@bjorn3 I disagree with this, because it means there's no way to "wrap" the default allocator.

I should be able to write something like this:

#[global_allocator]
static A: MyAllocationTracker = MyAllocationTracker::new(System);

And have it track my allocations without otherwise changing the behaviour of my program.

If you think System is not the right type for this (despite it being explicitly documented as the right way to do this) then we need a new DefaultGlobal type instead.

bjorn3 commented 1 year ago

And have it track my allocations without otherwise changing the behaviour of my program.

It shouldn't change the behavior to switch from the default allocator to the system allocator. At most the performance profile. If we decided that the System allocator is always the default allocator, then MyAllocationTracker::new(System) would not cause any perf regression other than inherent to MyAllocationTracker. If we decided to use a different default allocator, we did likely do so to improve performance. In that case using MyAllocationTracker::new(System) would be a perf regression, but not beyond what we would have anyway by deciding to only use System as default allocator. If we decide to only use System as default, maybe should rename System itself to DefaultGlobal and remove any relation to the actual system allocator?

chorman0773 commented 1 year ago

FTR, System is stable and cannot be renamed. If that path was taken, a new type entirely would have to be created.

Diggsey commented 1 year ago

It shouldn't change the behavior to switch from the default allocator to the system allocator.

In practice it can. We need a way to get precisely the default behaviour, not just "approximately". Whether we do that by upholding the guarantee that System is the default, or by introducing a new DefaultGlobal I don't really care,, but if we're going to do the latter, then the sooner the better.

chorman0773 commented 1 year ago

I'm unsure you could rely upon that. Making it generally unspecified would allow, for example, delegating to C++ operator new (which I contemplated doing for proc-macros back when lccc was written in C++ - it switched to just pulling in xlang_allocate via custom allocator).

On Sat, Nov 19, 2022 at 10:59 bjorn3 @.***> wrote:

And have it track my allocations without otherwise changing the behaviour of my program.

It shouldn't change the behavior to switch from the default allocator to the system allocator. At most the performance profile. If we decided that the System allocator is always the default allocator, then MyAllocationTracker::new(System) would not cause any perf regression other than inherent to MyAllocationTracker. If we decided to use a different default allocator, we did likely do so to improve performance. In that case using MyAllocationTracker::new(System) would be a perf regression, but not beyond what we would have anyway by deciding to only use System as default allocator. If we decide to only use System as default, maybe should rename System itself to DefaultGlobal and remove any relation to the actual system allocator?

— Reply to this email directly, view it on GitHub https://github.com/rust-lang/wg-allocators/issues/108#issuecomment-1320913823, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGLD2YHNOJVPG7JYKGHOR3WJD2NJANCNFSM6AAAAAASFHO4YI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

RalfJung commented 1 year ago

(despite it being explicitly documented as the right way to do this) t

It is also explicitly document as not being something you can rely on, so you really can't go off the docs here, unfortunately.

thomcc commented 1 year ago

FTR, System is stable and cannot be renamed. If that path was taken, a new type entirely would have to be created.

Ah, my bad. I thought this was covered by the same feature as Global. That's unfortunate but it could just be a change of documentation, TBH.

CAD97 commented 1 year ago

(Not saying it's necessarily a good idea, but) If we really want to change the name of a stable type, we can move the type and expose a (potentially deprecated) type alias and const from the old location.

For a unit struct like System, I believe these two items fully alias the functionality.

That said, I personally think that Global allocation (and the free functions) should not be interchangeable with whatever backs the global allocator. If we allow the AM to replace Global allocation with its own allocation, then properties provided by the concrete allocator (such as overallocation) aren't provided by Global when this happens.

RalfJung commented 1 year ago

You mean, even if the global allocator is explicitly set one cannot allocate via Global and then free via the underlying allocator?

Hm yeah that makes sense. We want to treat Global special for optimizations.

Diggsey commented 1 year ago

Just for the sake of considering all options, an alternative to this:

That said, I personally think that Global allocation (and the free functions) should not be interchangeable with whatever backs the global allocator.

Would be to say that the "special behaviour" (eg. that we can optimize away allocations) is a behaviour imparted by the #[global_allocator] attribute. WIth this approach the alloc/dealloc calls just call into the global allocator with no special behaviour, and could be interchanged, and direct use of the corresponding static variable would also benefit from these optimizations.

(To be clear this is completely tangential to my argument about having some way to access the default allocator)

bjorn3 commented 1 year ago

and direct use of the corresponding static variable would also benefit from these optimizations.

The special behavior comes from LLVM recognizing __rust_alloc and __rust_dealloc. If you directly call through the #[global_allocator] static, this won't trigger. You have to go through __rust_alloc/__rust_dealloc using std::alloc::Global.

Diggsey commented 1 year ago

@bjorn3 That's an implementation detail though. You could in theory export the methods on the #[global_allocator] as __rust_alloc and __rust_dealloc, making those special to LLVM and the global alloc/dealloc functions just normal functions.

RalfJung commented 1 year ago

That would mean the methods on a static of type T are not the methods from that type T. Seems rather strange to me?

RalfJung commented 1 year ago

In fact it's not even coherent...

#[global_allocator]
static S: MyAlloc = MyAlloc::new();

fn mk_alloc(s: &MyAlloc) {
  s.allocate(...);
}

fn main() {
  mk_alloc(&S);
}

mk_alloc obviously jumps to the user-written method on MyAlloc without any __rust_alloc magic. And hence when S is passed to it, we do call the global_allocator static, but it's not possible to route that call through __rust_alloc.

CAD97 commented 1 year ago

on allocation elision

Replaceable allocation is a tricky subject. The way to justify allocation elision in the face of an in-language allocator definition (ignoring the other hard parts around provenance et al) is typically something along the lines of the two part: - Calls to `Global as Allocator` can be serviced either by the `#[global_allocator]` provider or directly by the AM. Such replacement of the allocation provider does not need to satisfy any observable properties of the provided allocator implementation; it just needs to satisfy the `Allocator` interface. - Spurious (but sound per the `Allocator` interface) calls to the provided `#[global_allocator]` may occur, and need not have any relation to an allocation performed in the source. These two rules allow the optimizer to remove and insert allocations, respectively. It is expected that the optimizer has reasonable Quality Of Implementation and will not insert extra allocation for no reason. It would be a nice property to provide that any executed allocation actually is an allocation performed in the source, but this makes code motion more difficult.[^new] [^new]: E.g. it's a desirable optimization to be able to pull a `Box` with static size created and dropped in a loop to one created once outside of the loop and reused. Without allowing a spurious allocation, this becomes more difficult, as the optimization needs to account for the loop happening zero times. It may also be necessary to explicitly allow shifting allocator calls around respective to other code, but I believe this can be justified by combining the two rules to elide the written allocation and insert a new allocation where desired. This is another reason why providing a guarantee of all performed allocations corresponding to an allocation in the source is difficult. On top of this, it's not necessarily sufficient to make just `Global` and `System` removable (though that matches today's implementation); it is desirable that `Box>` would only record allocations which actually happen, and the tracing effect could be elided along with the actual allocation. Or a separate approach, it should ideally never change behavior to switch from `Box` to `Box`, where `MyAlloc` is the global allocator. This is AIUI a feature that C++ supports — `operator new` is always replaceable/elidable, even if it's a type-associated `operator new`. This means if you e.g. have some `LoggedNew` base class which defines an `operator new` which logs, `new`ing subclasses is still a fully removable operation (per the specification, anyway). In current Rust, you can't replicate this behavior. Instead, you have to either make your `LoggedAlloc` the `#[global_allocator]` and log every allocation event, or you can use it as `Box` and prevent full elision of any code using logged allocation (even if it would be elided with the global allocator) because the effect of logging is being treated as non-elidable. I don't have a good solution to offer here. `Global` and `System` providing the removable semantics but leaving other `Allocator` implementations as just plain in-AM code is simple, but leaves desirable allocation opportunity on the floor.[^prov] On the other hand, attaching the special removable semantics to `impl Allocator` can be very surprising (as it's allowing removal of calls' side effects) and makes testing such an interface difficult.[^prov2] [^prov]: It's also unclear how provenance is to be dealt with for normal `Allocator` implementations. [^prov2]: The same provenance questions exist for this choice as well, except now they're probably more opaque due to the insertion of the launder through AM semantics. If CHERI has a solution for removable allocation defined within CHERI-C, looking at it for inspiration is likely to be useful.

I think best resolution to the documentation issue is something along the lines of

Admit that the wasm targets are really scuffed. Some point in the (far) future, define a wasm32-canon target or similar using whatever the eventual stabilization of the wasm component model specifies.
Commit fully to the default allocator being System.
Define System as a minimalistic (but unspecified) wrapper around the host's allocation functions
- wasm-unknown-unknown's std providing a vendored allocator for System is a historical accident resulting from the practical compromises made to ship the Wasm MVP, similar to everything else in std which is stubbed out.
- The far-future wasm32-canon's System will utilize the portability lint to not be defined, unless the component model defines a standard provider for realloc. (Given this would cross component boundaries, this seems highly unlikely.) When targeting wasm32-canon, a #[global_allocator] must be defined.
Make it clearer how the default #[global_allocator] is injected into the root standalone crate types (i.e. bin, staticlib, cdylib) if not provided by library code.
If it is the case #[global_allocator] cannot be used with staticlib/cdylib, document that as such. (Them thus using the default System allocator can still be stated explicitly, but as a result from that restriction, not a fact of its own.)
Punt on the further properties/guarantees of #[global_allocator]; tracked in the wg-alloc repo.

chorman0773 commented 1 year ago

Make it clearer how the default #[global_allocator] is injected into the root standalone crate types (i.e. bin, staticlib, cdylib) if not provided by library code.

I would like for there to additional rules arround linking multiple standalone crates together. In particular, if you link a staticlib or cdylb to a bin or cdylib it should be unspecified whether global_allocator calls use the one provided in the downstream standalone crate (if any) or the allocator provided (additionaly, IFNDR if a staticlib explicitly defines a global_allocator and is linked into a bin that also explicitly defines a global_allocator) and provide the reverse rule for binary crates. This is independent of whether or not #[global_allocator] is allowed at all in staticlib/cdylib crates (I have no preference here, though I would lean against allowing it for staticlibs specifically).

As an alternative, saying that it must use the top-level binary's #[global_allocator] if at all specified is perfectly fine for me (as this is the function of the current implementation), though I suspect that rustc would have some issues with it.

I have need of this rule for lccc in order to be able to allow linking to staticlib crates from rust final link targets, or to link together multiple staticlib crates into a single final link target written in any language, as well as to permit overriding default allocator when dynamically linking std (except maybe on windows). The former is an absolute requirement for my use cases (standard system libraries written in rust) and the latter a QoL improvement since I want to dynamically link std by default on capable targets. This is because of how the allocator is implemented (a single weak symbol in a COMDAT group, referenced by liballoc and defined in libstd+#[global_allocator], the symbol is std::alloc::__global_allocator mangled).

Diggsey commented 1 year ago

@CAD97 I suppose one option would be for alloc crate (or core and move the trait) to expose its own generic implementation of the Allocator trait which (at least pretends to) implement the magic elision behaviour, eg.

struct OpaqueAllocator<T>(pub T);

impl<T: Allocator> Allocator for OpaqueAllocator<T> { ... }

Then it can simply be part of the contract of this type that it may not call the underlying allocator, or may call it multiple times, allowing the compiler to optimize allocations through this type. The System allocator can already be wrapped in this type, since it's not observable anyway.

CAD97 commented 1 year ago

@Diggsey core is already the crate that defines the Allocator and GlobalAlloc traits. The alloc trait adds the assumption of a #[global_allocator] and defines the safe types that use that global allocator. In the future with a stable Allocator trait, the alloc crate would be just System/Global and a bunch of type aliases to the collections defined in core which add a default parameter to provide A = Global.

Having a wrapper std::alloc::Removable<A> does seem like a reasonable way to model allocation elision with minimal issues, though it's still somewhat unfortunate to add more wrappers around what's very hot code, and without generic sharing, this leads to the easy way of doing things getting you monomorphized everywhere (because it's generic) rather than using a single instantiation. (You can still recover that property, but it requires defining two impl Allocator; one for the actual allocation work, and one just to wrap Removable<MyAllocator>.)

This would work for removing allocations, but the other code motion (adding, reordering) aren't possible to describe just with a wrapper selecting if it's going to call your impl or not. Perhaps we don't need these, but it should be noted that

@chorman0773 staticlib is defined to be a fully bundled static library with no dynamic dependencies (weak or otherwise) on other Rust code. If you want to link together multiple staticlib crates, the correct crate type is rlib or a custom lccc-staticlib, not staticlib. (Given we have dylib/cdylib, perhaps it would've been better if this crate type were cstaticlib.) Encapsulating whatever implementation strategy for #[global_allocator] inside the staticlib output should not be difficult; all you (should) need to do is not export its symbol, such that it only has internal linkage and is fully resolved within the staticlib.

There's open discussion around a staticlib-nobundle or stable (subset of) rlib such that its possible to link together static Rust libraries without duplicating symbols (ignoring monomorphization).

As for dynamic linking, similarly, cdylib is defined to be a bundle with no dependencies on other Rust code. If you want to do dynamic linking of std, the correct crate type for that is dylib (or a custom lccc-dylib), not cdylib.

Even if lccc wants to define dylib/rlib to be the same as cdylib/staticlib, there's still the bundling difference. And that bundling difference applies to all globals within the bundled build, whether statics or #[global_allocator]; multiple bundles linked together must have fully distinct worlds.

chorman0773 commented 1 year ago

staticlib is defined to be a fully bundled static library with no dynamic dependencies (weak or otherwise) on other Rust code. If you want to link together multiple staticlib crates, the correct crate type is rlib or a custom lccc-staticlib, not staticlib. (Given we have dylib/cdylib, perhaps it would've been better if this crate type were cstaticlib.)

The issue is that what I want to do is compile a system library (such as libcrypto) and then have that available to link into, including from rust code, via the standard interface expected of any libcrypto implementation (such as the one from openssl, libressl, etc.). The fact that if I do this I cannot link that version of the library to a rust crate means I do not comply with the standard definition of the library. Same for cdylib, I want to be able to produce system libraries using this type.

Encapsulating whatever implementation strategy for #[global_allocator] inside the staticlib output should not be difficult; all you (should) need to do is not export its symbol, such that it only has internal linkage and is fully resolved within the staticlib.

This is non-functional wrt. staticly linked libraries: There is no such thing as a symbol that can be linked by all objects of an archive and no other object in the same link step, at least not in ELF, COFF, or o65 objects (or OMF, a.out, xo65, and a number of others when using GNU or BSD archive format). Defining it as internal would limit it to being available only in the object file that defines it, and defining it as external or weak makes it available to all objects in the entire ld invocation. Even with 1 cgu, the allocator will be depended upon by upstream crates, so the definition must be external.

The issue with cdylib is that to support linking against a dynamically linked std, all providers of the alocator symbol are defined with default visibility as external. Additionally, to support MacOS and static linking of std, the symbol is defined as weak in a linkonce COMDAT group. The default allocator is just the std::alloc::__global_alloc definition inherited from libstd.

And that bundling difference applies to all globals within the bundled build, whether statics or #[global_allocator]; multiple bundles linked together must have fully distinct worlds.

Again, this is impossible on existing file formats using the standard archive for static libraries.

CAD97 commented 1 year ago

If the standard interface expected of any libcrypto implementation only has it importing symbols from libc, then you simply cannot match that with a Rust crate without entirely encapsulating the use of Rust such that no Rust symbols are imported or exported. You can't just add extra imported symbols (even if they're weak symbols) and claim that it provides the exact same interface.

You're making your own compiler. Make your own crate type which does what you need it to. This is far from the locale to argue about lccc trying to force existing compiler options to do things they were not designed to do.

chorman0773 commented 1 year ago

If the standard interface expected of any libcrypto implementation only has it importing symbols from libc, then you simply cannot match that with a Rust crate without entirely encapsulating the use of Rust such that no Rust symbols are imported or exported.

The standard interface is that it is usable. It's already assumed (the openssl-sys crate) that you can import your system libcrypto and libssl in rust.

You're making your own compiler. Make your own crate type which does what you need it to.

My assumption was that the cdylib and staticlib options, that is the options that are designed to produce system library types, produced files usable as system libraries. I was also implicitly assuming that I can write my rust code against the standard options expected of the rust compiler, and not have to limit a generally useful crate (lc-crypto) to being "lccc only", that seems highly counterproductive to the whole enterprise. I would further expect in general to be able to link any system library into either a rust program or a program that also links in other rust code, or to be able to combine system libraries, many of which are written in rust.

This is far from the locale to argue about lccc trying to force existing compiler options to do things they were not designed to do.

No - what I'm doing here is trying to ensure I have the proper guarantees to actually implement the language wrt. the global_allocator which is up for discussion here, while working arround a problem that prevents me from writing low-level system code in a systems programming language. And that I'm not given encapsilation requirements that are impossible to implement on existing file formats.

RalfJung commented 1 year ago

@chorman0773 @CAD97 you are veering widely off-topic for this issue; whether or not the allocator is shared when linking together crate type X is not what this thread is about.

Could someone with moderation power please mark these posts as off-topic?

RalfJung commented 1 year ago

This would work for removing allocations, but the other code motion (adding, reordering) aren't possible to describe just with a wrapper selecting if it's going to call your impl or not. Perhaps we don't need these, but it should be noted that

It could be a wrapper that applies the full set of allocation magic though -- that would be a language primitive, sure, but instead of saying that the magic elision/introduction of allocations is tied to the global allocator, we could tie it to that wrapper. That doesn't seem any harder to specify? And it would make the same magic available to custom allocators.

Pragmatically speaking, we'd add the appropriate LLVM attributes to calls of the functions on this magic wrapper.

thomcc commented 1 year ago

I quite like the suggestion of having Global be effectively be equivalent to SpookyAllocatorBikeshedMagic<System> -- specifying the existence of the SpookyAllocatorBikeshedMagic wrapper allows us to explain where that behavior will/won't occur in a clean and predictable way, rather than the status quo where the lack of it actually makes specifying the semantics around allocators hard in many cases.

bjorn3 commented 1 year ago

The far-future wasm32-canon's System will utilize the portability lint to not be defined, unless the component model defines a standard provider for realloc. (Given this would cross component boundaries, this seems highly unlikely.) When targeting wasm32-canon, a #[global_allocator] must be defined.

As I understand it the canonical abi of the component model requires exposing methods that allocate within the component itself. As such any component would either need #[global_allocator] or rust would need to provide a default allocator.

bjorn3 commented 1 year ago

Make it clearer how the default #[global_allocator] is injected into the root standalone crate types (i.e. bin, staticlib, cdylib) if not provided by library code.

I would like for there to additional rules arround linking multiple standalone crates together. In particular, if you link a staticlib or cdylb to a bin or cdylib it should be unspecified whether global_allocator calls use the one provided in the downstream standalone crate (if any) or the allocator provided (additionaly, IFNDR if a staticlib explicitly defines a global_allocator and is linked into a bin that also explicitly defines a global_allocator) and provide the reverse rule for binary crates. This is independent of whether or not #[global_allocator] is allowed at all in staticlib/cdylib crates (I have no preference here, though I would lean against allowing it for staticlibs specifically).

As an alternative, saying that it must use the top-level binary's #[global_allocator] if at all specified is perfectly fine for me (as this is the function of the current implementation), though I suspect that rustc would have some issues with it.

I have need of this rule for lccc in order to be able to allow linking to staticlib crates from rust final link targets, or to link together multiple staticlib crates into a single final link target written in any language, as well as to permit overriding default allocator when dynamically linking std (except maybe on windows). The former is an absolute requirement for my use cases (standard system libraries written in rust) and the latter a QoL improvement since I want to dynamically link std by default on capable targets. This is because of how the allocator is implemented (a single weak symbol in a COMDAT group, referenced by liballoc and defined in libstd+#[global_allocator], the symbol is std::alloc::__global_allocator mangled).

Cdylibs and staticlibs shouldn't leak anything about being a rust crate. That includes the global allocator. For cdylibs we indeed only export #[no_mangle] functions and don't import anything other than extern "C" { ... }. So #[global_allocator] doesn't leak through the cdylib boundary. For staticlib we unfortunately don't limit symbol exports at the moment, but we need to fix that anyway as it breaks including a rust staticlib in a rust program due to symbol conflicts. For dynamically linking libstd you must use the dylib crate type, not the cdylib crate type.

RalfJung commented 1 year ago

Please move the symbol leakage discussion to a different issue, will you? It is independent of what we guarantee for default crate types, which this issue is about. The only reason crate types show up in the OP is that I quoted the docs with context. But it's really a separate question. Looks like some people here are way too easily nerd-sniped... (I feel you, I often have the same problem. It's bad for discussions though.)

Who has moderation rights here and can collapse those off-topic posts?

bjorn3 commented 1 year ago

Opened https://github.com/rust-lang/rust/issues/104707.

rust-lang / wg-allocators

Contradicting documentation on default allocator #108