To what extent (if any) are allocator safety requirements relaxed by installing a *specific* global allocator?

thomcc commented 2 years ago

If a user installs a #[global_allocator] that has less strict safety requirements than those described in the documentation for the allocator, does this in turn relax what (e.g) alloc::alloc::dealloc considers UB?

For example, our allocators require accurate size/align to be provided when deallocating, but many allocators do not need this information. If a user has explicitly installed a specific global_allocator that does not care about size/align when deallocating, is it still UB for them to pass an incorrect layout to alloc::alloc::dealloc?

Naïvely, my suspicion is that the answer is "yes, it's still UB" (because the requirements of that function say so), but this seems... rather unfortunate, for various reasons[^1].

If it is UB, does that mean it's always UB to rely that a specific global_allocator is in use?

That is, it seems reasonable to me for a Rust library embedded in a larger C/C++ program to configure its #[global_allocator] to be the same one as used by the larger program, and then to rely on this by having one language free pointers allocated from another.

I've actually seen that done in several cases, as can both simplify code and improve performance[^2] (although perhaps it's somewhat unhygenic). So it would be nice to support (obviously only in the case where the user knows for certain that a specific global_allocator is in use), and I don't think there's any benefit we get from forbidding it.

[^1]: For example, maintaining this information can often cause require additional bookkeeping, and can prevent the use from reusing an allocation which would otherwise be allowed.

[^2]: For example, avoiding the need to allocate a fresh buffer and copy data into it.

Amanieu commented 2 years ago

I think it should be UB because the compiler handles #[global_allocator] specially for optimizations, which allows it to optimize some allocations to stack allocations instead for example.

However what you can do is call the global allocator directly, which is not UB since it doesn't go through the __rust_alloc symbols. Here is an example where I use dlmalloc-rs (with some modifications) as a #[global_allocator] and also expose it for use by C code:

#[no_mangle]
unsafe extern "C" fn malloc(size: usize) -> *mut u8 {
    GlobalDlmalloc.alloc(Layout::from_size_align_unchecked(size, 16))
}

#[no_mangle]
unsafe extern "C" fn realloc(ptr: *mut u8, size: usize) -> *mut u8 {
    if ptr.is_null() {
        let layout = Layout::from_size_align_unchecked(size, 16);
        GlobalDlmalloc.alloc(layout)
    } else {
        // dlmalloc doesn't actually look at the passed in layout
        let dummy_layout = Layout::from_size_align_unchecked(0, 16);
        GlobalDlmalloc.realloc(ptr, dummy_layout, size)
    }
}

#[no_mangle]
unsafe extern "C" fn free(ptr: *mut u8) {
    if !ptr.is_null() {
        // dlmalloc doesn't actually look at the passed in layout
        let dummy_layout = Layout::from_size_align_unchecked(0, 16);
        GlobalDlmalloc.dealloc(ptr, dummy_layout)
    }
}

CAD97 commented 2 years ago

[Apologies if parts of this are incoherent. I'm still adjusting to a new medication. I think I wrote something worthwhile in here...]

I think my personally desired answer to “what is guaranteed about #[global_allocator]” pending further thought/argument is that calls to the rust global allocation functions may be served by the #[global_allocator] or directly by the compiler (in order to allow allocation elision), and as such I think we require that

The allowed input of the global allocation functions is always that as specified by the interface. If you want to take advantage of the expanded acceptable input of the concrete global allocator, you need to call it specifically, not through the abstract interface.
The returned pointer is to a fresh “rust allocated object.”
The size of the “rust allocated object” (what determines what is inbounds) is as described[^7]. A directly serviced allocation request may be any sufficient size[^8].
The allocated memory's initialization is as described by the interface.[^6]

[^6]: I really don't like allowing initialization (yes/no, not the choice of initialized value) of memory to be defined by the allocator, because that easily leads to an “uninitialized = arbitrary” thought process (e.g. “the allocator returns a pointer to some system memory, which is zeroed or a previously written value”) which is false in the case of things like write-delayed page allocation. However, I am sympathetic to wanting to have a hardened allocator that e.g. always returns zeroed memory and forcing the compiler to treat freshly allocated memory as freeze rather than poison. Unfortunately, this does prevent some simple optimizations, such as converting conditional assignment to freshly allocated uninitialized heap memory into unconditional.

[^7]: IOW you cannot use excess capacity the #[global_allocator] is known to give you (unless you use the Allocator API and it tells you about it).

[^8]: Such that if you do check and take advantage of the Allocator API allowing an allocator to give you more space than requested, this does not prevent the elision of the allocation.

Allowing is probably just a matter of specifying a small set of operational possibilities:

A pointer allocated explicitly on the #[global_allocator] provider can be processed by the global allocator functions. The moment ownership is transferred into such, it follows the rules as if it originated there.

Two wants to poorly segue into the below:

It is possible to use an globally allocated pointer in such a way that it is statically known to have been allocated by the #[global_allocator] (thus e.g. allowing giving it directly to it and any extra output guarantees it gives).
- Passing to an unknown external function necessarily is sufficient, as it could do that thing.
We can potentially promise that properties of the address hold as-if allocated by the #[global_allocator] by making “observing the address” sufficient to statically know it was.

“If you check it was, it was; if you don't check, it might not've been.” Which now sounds kinda bad, but I already wrote the below and it has some good bits...

[The beginning of this post was originally quite different, but writing the below refined my understanding significantly enough that it was unacceptably inaccurate. Everything below is still interesting and at least somewhat relevant, so forgive the non sequitur.]

At an absolute minimum, exposing allocation addresses (in the strict-pointer sense) is sufficient to allow proving whether an allocation was done by a concrete allocator, as one exposed address range overlaps the other exposed address range. Caveats about provenance still apply, but pointer comparisons are still stable, so it's still at that address.

IMHO the main question here is basically: does observing the address of a Box/Global served allocation prevent the allocation from being served by the compiler rather than the #[global_allocator] (assuming the address observation is itself observable)?

Derivative: does there exist a way to observe overalignment of a pointer without observing the address for the purpose of eliding heap allocation?

Derivative: do safe operations between pointers to independently allocated objects such as pointer comparison observe the address sufficiently to prevent eliding heap allocation? (Perhaps this is a reason why such comparisons are UB in C/C++?)

Disclaimer: following is significantly more just my personal interpretation and opinion, and I am not making a formalized argument. Additionally, I do see the benefit in still being able to replace (not just omit) global allocations which have been .addr()d with compiler-controlled ones.

* Two very important caveats:

query(&ALLOC) cannot be used to determine if a given pointer was allocated on the allocator (arbitrary but consistent result)[^5]; the actual pointer must be an input to such a check.
ALLOC.ptr_min() <= ptr && ptr <= ALLOC.ptr_max() cannot[^9] be used to determine if a given pointer was allocated by the allocator; these pointers reference separate “rust allocated objects” / “allocation planes” and thus their ordering is arbitrary but consistent[^1].

[^5]: This is the linchpin in removing allocations: you must be able to actually not call the allocator, which obviously knows whether it serviced an allocation, making that observable. So you have to allow the compiler to service the allocation instead. The alternative of just saying “allocations are unobservable/pure” is frighteningly nonoperational and you still have to fight “I observed the allocation happened” (e.g. its address is

[^1]: Open question, but I believe the practical (and in-use) guarantee (with the standard ZSTs get to lie exceptions) is that for valid pointers to non-deallocated “rust allocated objects” a) if ptr1 == ptr2 then they point to the “rust allocated object” (within the provenance span overlap) and b) if begin..end lies in a single “rust allocated object” and (begin..end).contains(ptr), ptr points into the same “rust allocated object.”

[^9]: ... with some further caveats: this would work if pointer comparison has a consistent transitive order[^10] and the allocator serves from a single continuous pool. It's tempting to say it would also work if this is the only allocator in use, but if the allocator manages multiple pools in separate “rust allocated objects,” the compiler can choose to sort one of its allocations between yours.

[^10]: Pointers are Ord, but Ord is not an unsafe trait, so this doesn't guarantee a total (nor even transitive) order. It's very implied and very unlikely to be made not true, but it's theoretically possible, and pointers are weird. I get deep into some of the weirdness in this comment. It also comes from making it possible to elide allocation.

The notable thing which can be used to determine if ptr was allocated by ALLOC is ptr.addr(). Proving two “rust allocated objects” overlap (this can only happen via the global alloc boundary which launders concrete pointers into new “rust allocated objects”) requires forcing both objects to exist on the same “allocation plane.”

There effectively exists an “allocation plane” for each “rust allocated object” (e.g. miri's AllocId). When comparing pointers between separate “planes,” you get the order between planes, which is arbitrary but consistent. When you observe[^2] the address of an allocation, this forces the allocation to exist on a shared “concrete” allocation plane, and separate allocations live on the same plane must not overlap (have distinct addresses). Each allocation exists on a single plane only; when I say an operation "concretizes" an address/allocation, this applies retroactively.

[^2]: Definitions vary. Obviously I consider it to be calling ptr.addr() (and any other pointer ops that involve getting a numeric comparison between two addresses not required to be in the same plane[^3], e.g. [wrapping_offset_from]).

[^3]: This is an ultra subtle detail I'm not completely sure about. Using non-inbounds pointer arithmetic is a known pessimisation, sure, but if the two pointers are in the same object, producing the answer does not strictly require[^4] concretizing the object address. The trick of the allocation plane scheme is that it should be impossible to ever tell if an object allocation has been concretized — this property is what prevents it from preventing optimizations. And yet, here I am making this observable, which means concretization cannot be omitted / an allocation cannot be removed.

[^4]: And in fact requiring it is potentially bad for Miri, because the retroactive behavior is not operational. Miri is forced to have all pointers be concrete in a single allocation plane, because it can't rewind time

The purpose of this scheme is that separate “rust allocated objects” can have overlapping addresses if the address is not observed. This is what allows us to remove any (even stack) allocations in the first place in the face of finite memory (since memory exhaustion is an observable event via allocation failure) without making code authors deal with the possibility of seeing overlapping allocations. (You can still indirectly observe this, of course.)

My main observation here is that this scheme also allows us to describe how the global allocation boundary functions when it creates a new “rust allocated object” (w.r.t. properties of the address).

This could potentially additionally be used to describe the behavior of .expose_addr() (w.r.t. addresses only, not provenance, unfortunately, which still needs angelic nondeterminism) as further concretizing the address onto an exposed address plane. However, given the full angelic non-det provenance model is necessary (see: the strict pointer PNVI discussion), describing it additionally via allocation planes seems to add unnecessary complications rather than offer any simplifications.

rust-lang / wg-allocators

To what extent (if any) are allocator safety requirements relaxed by installing a specific global allocator? #99

rust-lang / wg-allocators

To what extent (if any) are allocator safety requirements relaxed by installing a *specific* global allocator? #99

To what extent (if any) are allocator safety requirements relaxed by installing a specific global allocator? #99