"Memory block" dangerously underspecified with zero-sized allocations

rust-lang / wg-allocators

Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!

http://bit.ly/hello-wg-allocators

203 stars 9 forks source link

"Memory block" dangerously underspecified with zero-sized allocations #82

Open mahkoh opened 3 years ago

mahkoh commented 3 years ago

The documentation talks about memory blocks in several places.

/// ### Currently allocated memory
///
/// Some of the methods require that a memory block be *currently allocated* via an allocator. This
/// means that:
///
/// * the starting address for that memory block was previously returned by [`allocate`], [`grow`], or
///   [`shrink`], and
///
/// * the memory block has not been subsequently deallocated, where blocks are either deallocated
///   directly by being passed to [`deallocate`] or were changed by being passed to [`grow`] or
///   [`shrink`] that returns `Ok`. If `grow` or `shrink` have returned `Err`, the passed pointer
///   remains valid.

and later

/// ### Memory fitting
///
/// * The provided [`layout.size()`] must fall in the range `min ..= max`, where:
///   - `min` is the size of the layout most recently used to allocate the block, and
///   - `max` is the latest actual size returned from [`allocate`], [`grow`], or [`shrink`].

and later

    /// Attempts to allocate a block of memory.
    ///
    /// On success, returns a [`NonNull<[u8]>`][NonNull] meeting the size and alignment guarantees of `layout`.

and later

    /// Deallocates the memory referenced by `ptr`.
    ///
    /// # Safety
    ///
    /// * `ptr` must denote a block of memory [*currently allocated*] via this allocator, and
    /// * `layout` must [*fit*] that block of memory.

Combined, all of these require that pointers returned by an Allocator are abstract and cannot be compared. Or else no two allocations must return the same value.

The key problem is that the API allows zero sized allocations. Requests for zero sized allocations can return the same pointer multiple times or can return a pointer which compares equal to the pointer returned by a non-zero-sized allocation. Global already has these properties.

#![feature(allocator_api, slice_ptr_get)]

use std::alloc::{Allocator, Layout};

pub fn boom<A: Allocator>(alloc: A) {
    let a_layout = Layout::new::<()>();
    let a_alloc = alloc.allocate(a_layout).unwrap().as_non_null_ptr();

    let b_layout = Layout::new::<u8>();
    let b_alloc = alloc.allocate(b_layout).unwrap().as_non_null_ptr();

    if a_alloc == b_alloc {
        unsafe {
            alloc.deallocate(a_alloc, b_layout); // !!
        }
    }
}

The easiest fix would be to disallow zero-sized allocations.

@RalfJung This seems right up your alley.

thomcc commented 3 years ago

The easiest fix would be to disallow zero-sized allocations.

I'm very much in favor of this, and was meaning to file an issue about it. It forces every allocator that wraps an allocator that doesn't support this (or that doesn't support this for arbitrary alignments) — which is the lions share of currently existent allocators — to have extra checks to support what IMO is a dubious use case.

GlobalAllocator calls likely won't be inlined, so this check won't be removed.

mahkoh commented 3 years ago

There already was a discussion in the past and the performance aspect was likely considered there: https://github.com/rust-lang/wg-allocators/issues/16

This issue is more about the soundness of the API if allocators are allowed to return the same "live" pointer multiple times. If they are not allowed to return a dummy pointer for zero-sized allocations, then using an allocator for a zero-sized allocation is most likely immensely inefficient.

thomcc commented 3 years ago

There already was a discussion in the past

Nothing is set in stone until it's stabilized, but fair enough, this isn't the time to discuss it. That said, IMO together these make for reasonable arguments to forbid it.

This issue is more about the soundness of the API if allocators are allowed to return the same "live" pointer multiple times.

This is also why most existing allocators don't support this — the C standard requires that either they return null, or a unique pointer (I think strictly speaking it says it's UB or IB, but other requirements of malloc require this).

Generally this means keeping a counter you cast to a pointer and return. But even allocators that support it don't support it for arbitrary alignments.

Amanieu commented 3 years ago

What part of the documentation specifically prevents an allocator from always returning the same address for a zero-sized allocation? The intent is that this is allowed.

mahkoh commented 3 years ago

I have to leave for the holidays where I won't have access to a PC so I'll keep it short:

deallocate says: "Deallocates the memory referenced by ptr." However, ptr does not uniquely identify a memory block if allocate can return the same pointer multiple times. (Unless we say that pointers returned by allocate are opaque and cannot be compared. The current docs do not talk about this.)
If deallocate in the example in OP accesses memory behind a_alloc, the behavior is undefined because the pointer a_alloc is not based on b_alloc: https://llvm.org/docs/LangRef.html#pointer-aliasing-rules (Note that b_alloc did not contribute to the inttoptr cast that created a_alloc. The comparison a_alloc == b_alloc occurred after the inttoptr cast.)

thomcc commented 3 years ago

Unless we say that pointers returned by allocate are opaque and cannot be compared.

Well, https://doc.rust-lang.org/beta/std/ptr/fn.eq.html is safe, so I don't think that's viable.

mahkoh commented 3 years ago

What I meant to say is that they don't represent the same block of memory even if they are equal unless one is based on the other. Where based on is defined in the link above.

Lokathor commented 3 years ago

without taking a position on if provanince in equality is good or bad: the docs for pointer equality seem to imply that it's a simple address comparison and that's it. would make it tricky to change even if we did end up wanting to do a change.

thomcc commented 3 years ago

My stance here would be that we're gaining the ability for this to be a safe function, and in exchange now provenance is now directly realized in the API.

This does not seem like a good trade, IMO (but I am not entirely unbiased here).

Amanieu commented 3 years ago

I see the problem. I think we just need to add wording that the pointer passed to dealloc must be "based-on" the pointer that was originally returned by alloc. This would use the same provenance rules that are used for stack allocations for local variables.

cc @RalfJung

mahkoh commented 3 years ago

Unfortunately several problems remain:

A pointer can be based on multiple pointers that point to the same address. Address and "based on" are not enough to identify an allocated memory block.
The spec allows an allocator to unsoundly return overlapping slices.
It is possible to write a sound allocator that returns the same non-empty slice without deallocating it in between. This makes it impossible to identify memory blocks using the current spec.
The current spec ignores multi threading. The current implementation of Global is unsound if the user makes use of the full range of patterns allowed by the spec. (deallocate calls are not required to be synchronized for the same address.)

I've written a new spec that fixes all of these problems as long as zero-sized allocations are not allowed. It's not clear if it's possible to extend it to allow zero-sized allocations.

We say that a successful allocate call references the address of the first byte in the slice it returns.
We say that a deallocate call references the address of its pointer argument.
The implementation of allocate must assume that the caller writes to the referenced address immediately after a successful return. Otherwise the behavior is undefined.
The caller of deallocate must assume that the implementation writes to the referenced address immediately after the call. Otherwise the behavior is undefined.
Informational note: this implies that the returns of successful allocate calls and deallocate calls that reference the same address are uniquely totally ordered in a program without undefined behavior.
Let a be the return of a successful allocate call and d be a deallocate call. We say that a matches d if and only if a occurs before d in the total order of returns of successful allocate calls and deallocate calls that reference the address referenced by a and
- a occurs immediately before d in that order, or
- immediately after a there occurs the return of a successful allocate call a' in that order and immediately before d there occurs a deallocate call d' in that order such that a' matches d'.
Informational note: this implies that the return of a successful allocate call matches at most one deallocate call and vice versa.
For every deallocate call d there must be the return of a successful allocate call a such that a matches d. Otherwise the behavior is undefined.
For all such a, d
- self must reference the same object modulo cloning
- the pointer passed to d must be based on the pointer returned by a
- the alignment of the layout argument of d must be the same as the alignment of the layout argument of a
- the size of the layout argument of d must be between the size of the layout argument of a and the size of the slice returned by a
Otherwise the behavior is undefined.
After the return of a successful allocate call a, the caller gains a mutable borrow of the returned slice from that point.
- If a matches a deallocate calld, then the borrow lasts until d.
- Otherwise it lasts for the rest of the program execution.
This slice is non-empty and satisfies the requirements of the layout argument. Otherwise the behavior is undefined.
When the last clone of an allocator is dropped, the behavior is as if a deallocate call fulfilling the requirements mentioned in point 9 were executed for each thus far unmatched successful allocate call whose self argument references this allocator modulo cloning. The order of these deallocate calls is consistent with the reverse of the program execution order of the allocate calls. Otherwise the behavior is undefined.

RalfJung commented 3 years ago

My stance here would be that we're gaining the ability for this to be a safe function, and in exchange now provenance is now directly realized in the API.

What is "this" and what do you mean by "directly realized in the API"?

I see the problem. I think we just need to add wording that the pointer passed to dealloc must be "based-on" the pointer that was originally returned by alloc. This would use the same provenance rules that are used for stack allocations for local variables.

The dealloc pointer needs to have the right provenance -- that is definitely already the case. A pointer with the wrong provenance cannot even be used to read or write to the given memory (as per the docs of wrapping_offset, to name just one example), and I hope we agree that "this pointer may be used to deallocate that memory" should imply "this pointer may be used to read/write that memory".

This seems reasonable to spell out in more detail, and should fix the example in the OP.

The remaining problems I see don't actually break the spec, they just constrain implementations more than one might expect. In particular, if an allocator returns pointers for the same address for two different allocation requests without intermediate deallocation (which necessarily means one of them was zero-sized), then upon deallocation there is no way for the implementationto determine which of these two pointers is supposed to be deallocated (since it cannot test "based-on" / provenance). IOW, no allocator must ever use the same address for both a zero-sized and a non-zero-sized allocation. (Using the same address for multiple zero-sized allocations is perfectly fine since deallocating them is a NOP.)

@mahkoh given that your original example is fixed by @Amanieu's proposal (the code causes UB if a_alloc == b_alloc since a ptr with the wrong provenance is used for deallocation), I am wondering which remaining problem you see that requires spec changes. I also see no connection to concurrency here.

The spec allows an allocator to unsoundly return overlapping slices.

The allocation function should of course guarantee that the set of memory addresses given by [ptr, ptr+size) is disjoint from the set of any other live allocation. This is not spelled out explicitly currently, but does not require fundamental changes to the spec to be taken into account. Writing an allocator spec in full precision without separation logic is annoying and somewhat tedious, but I don't see anything subtle going on here (other than for the ZST case, where indeed there are subtleties).

Amanieu commented 3 years ago

The allocator API requires the length of the memory block to be passed in when deallocating, so the allocator can always distinguish zero-sized blocks from non-zero-sized blocks.

mahkoh commented 3 years ago

In particular, if an allocator returns pointers for the same address for two different allocation requests without intermediate deallocation (which necessarily means one of them was zero-sized)

That is not correct. Consider an allocator that satisfies requests from a pool of pre-existing allocations. Such an allocator might provide an api to add allocations to that pool. It could then be fed an allocation returned by itself. Therefore this allocator would (soundly) return the same slice twice.

My spec models this by treating the return value of an allocation as a borrowed slice (not an owned slice.) Feeding the slice into the allocator is then an operation whose validity can easily be understood using the existing lifetime semantics of the language. Of course, such nested allocations must then be unwound in reverse order. My spec also takes this into account.

IOW, no allocator must ever use the same address for both a zero-sized and a non-zero-sized allocation.

The current implementation of Global basically does alignment as *mut u8 which violates this requirement. It also supports arbitrarily large alignments.

(Using the same address for multiple zero-sized allocations is perfectly fine since deallocating them is a NOP.)

See below.

@mahkoh given that your original example is fixed by @Amanieu's proposal (the code causes UB if a_alloc == b_alloc since a ptr with the wrong provenance is used for deallocation)

I don't agree that it is fixed. As you said yourself, zero-sized slices must never have the same address as non-zero-sized slices. This is not required by the current spec and not implemented by Global. It might be possible to implement this efficiently but I do not see a way to do it so far.

Secondly, I think you're dismissing the question of deallocating zero-sized allocations a bit quickly. Consider an allocator that satisfies zero-sized allocations my calling malloc(1) (malloc with the argument 1) and then returns the empty slice pointing to the start of that allocation. Clearly deallocating that zero-sized allocation is not a NOP. The current spec does not dismiss the question of zero-sized allocations because doesn't even mention them. But I claim that the current spec is unsound and that it is not clear how to make it sound with zero-sized allocations. See below.

I am wondering which remaining problem you see that requires spec changes. I also see no connection to concurrency here.

I listed several problems above. Let me expand on them.

(1) A pointer can be based on multiple pointers that point to the same address. Address and "based on" are not enough to identify an allocated memory block.

The current spec models allocations in terms of "memory blocks". Each allocate call returns a reference to a memory block, so to say. A deallocate call must then, of course, be able to identify which memory block is supposed to be deallocated. The current spec uses the numerical value of the pointer and (as per @Amanieu's suggestion) pointer provenance to do so.

But this is obviously not enough to identify which memory block for a zero-sized allocation is supposed to be deallocated. Simply take the bitwise or of two pointers with the same numerical value and the result is a pointer that is based on both source pointers. The implementation cannot decide which allocation to deallocate. You've said that this is ok because it's a NOP and therefore zero-sized allocations can return the same pointer multiple times. Not so. Consider the malloc(1) example above. What if the allocator used malloc(1) for one zero-sized allocation but then used a transmuted integer to satisfy the second zero-sized allocation and what if that integer had the same numerical value as the malloc(1)'d pointer? According to your ad-hoc spec above, that would be a valid implementation because both allocations are zero-sized. But clearly it is not.

(2) The spec allows an allocator to unsoundly return overlapping slices.

You've addressed this:

The allocation function should of course guarantee that the set of memory addresses given by [ptr, ptr+size) is disjoint from the set of any other live allocation.

Clearly by "any other live allocation" you only mean any other allocation from the same allocator. Otherwise a simple delegating allocator would be invalid. But the self-feeding example above shows that this is not a correct assumption. It's also not a necessary assumption if allocations are modeled as borrows that can be reborrowed.

(3) The current spec ignores multi threading. The current implementation of Global is unsound if the user makes use of the full range of patterns allowed by the spec. (deallocate calls are not required to be synchronized for the same address.)

The current spec says that calling deallocate is valid if

the memory block has not been subsequently deallocated

What does "subsequently" mean? I assume it is used in the same sense as "happens before" in C++ (which is also the model used by Rust iirc.) This means the following code is allowed: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=0e3e7ab18eac96309fc07fe73f7f1bf6

Because neither deallocate call happens before the other.

My spec makes use of the C++ specification by saying that a slice returned from allocate is immediately writable and pointer passed to deallocate is immediately writable. This gives us a total order of allocate and deallocate operations (that operate on the same pointer) for free because the C++ spec guarantees that it exists (in programs without UB.)

You said something similar by saying that the slice should be considered writable. But what does this mean for zero-sized slices? How do we integrate zero-sized slices into this model? And we must integrate them into this model if the zero-sized slice is actually backed by a non-zero allocation. I don't think you can write an allocator spec without taking the multi-threaded memory model into account because allocators are the one place where we opt-out of the compiler doing it for us.

mahkoh commented 3 years ago

The allocator API requires the length of the memory block to be passed in when deallocating, so the allocator can always distinguish zero-sized blocks from non-zero-sized blocks.

Like I said above, an allocator might decide to satisfy a zero-sized request with a non-zero sized slice or it might back a zero-sized slice by a non-zero-sized allocation. Nothing in the spec prevents this.

mahkoh commented 3 years ago

To be clear: As long as we exclude zero-sized allocations, I'm not saying that the intentions of the current spec are fundamentally broken. Merely that the current spec

Makes it impossible to write an allocator and show its correctness to a reasonable degree
Makes it impossible to use an allocator and show that this usage is correct to a reasonable degree
Allows using allocators in ways that make it impossible to implement even Global correctly

I discovered this while trying to show that the allocator I wrote in 8dc7b0c (#80273) is correct. I wrote my spec above to allow doing this as long as we allow inductive reasoning (if the behavior is not undefined up to this point, then the behavior of the following operation is also not undefined.) I suggest you read it. In some ways its both shorter and simpler than the current documentation of the Allocator trait.

RalfJung commented 3 years ago

My spec models this by treating the return value of an allocation as a borrowed slice (not an owned slice.) Feeding the slice into the allocator is then an operation whose validity can easily be understood using the existing lifetime semantics of the language. Of course, such nested allocations must then be unwound in reverse order. My spec also takes this into account.

Ownership transfer is the right spec for an allocator, though. And it covers your case -- that's just also having ownership transfer from the user back to the allocator.

IOW, no allocator must ever use the same address for both a zero-sized and a non-zero-sized allocation.

The current implementation of Global basically does alignment as *mut u8 which violates this requirement. It also supports arbitrarily large alignments.

See what @Amanieu wrote -- I was actually wrong in my statement here.

I don't agree that it is fixed. As you said yourself, zero-sized slices must never have the same address as non-zero-sized slices. This is not required by the current spec

We arrived at this requirement from the spec, so it clearly is required by the current spec. It's just not stated explicitly, but so what?

Also, it was a wrong conclusion, as @Amanieu pointed out.

But I claim that the current spec is unsound and that it is not clear how to make it sound with zero-sized allocations. See below.

I have no idea what it means for a spec to be "unsound". That adjective does not make sense when applied to specs, in any meaning I am familiar with.

The current spec models allocations in terms of "memory blocks". Each allocate call returns a reference to a memory block, so to say. A deallocate call must then, of course, be able to identify which memory block is supposed to be deallocated. The current spec uses the numerical value of the pointer and (as per @Amanieu's suggestion) pointer provenance to do so.

The spec says nothing about how the implementation identifies the memory block. It leaves that up to the implementation. It is not the job of the spec to say such things.

What the spec says is that the address and layout of the block are all that the implementation has available for this. That is certainly enough, since correct implementations are possible -- as witnessed by the fact that correct implementations exist.

The implementation that returns alignment as *mut u8 for zero-sized allocations is perfectly fine if the corresponding deallocation function is a NOP when the size (which is passed in by the caller) is zero.

Simply take the bitwise or of two pointers with the same numerical value and the result is a pointer that is based on both source pointers.

There's no bitwise-or operation on pointers. So you are talking about integer roundtrips here, and without a more precise model of int-to-ptr casts, I think it is futile to attempt to discuss here what their behavior is.

You've said that this is ok because it's a NOP and therefore zero-sized allocations can return the same pointer multiple times. Not so. Consider the malloc(1) example above. What if the allocator used malloc(1) for one zero-sized allocation but then used a transmuted integer to satisfy the second zero-sized allocation and what if that integer had the same numerical value as the malloc(1)'d pointer? According to your ad-hoc spec above, that would be a valid implementation because both allocations are zero-sized. But clearly it is not.

It's not a valid implementation since deallocation is broken. I don't get your examples, you are constructing broken implementations and then claiming that that shows an issue with the spec?

Like I said above, an allocator might decide to satisfy a zero-sized request with a non-zero sized slice or it might back a zero-sized slice by a non-zero-sized allocation. Nothing in the spec prevents this.

Sure, but if the allocator does this then it has to be able to also handle this situation during deallocation. If it cannot, it is a wrong implementation. No spec bug here that I can see.

If it is impossible to have an implementation that satisfies zero-sized requests in such a mixed way, then that's a consequence of the spec, but there is no reason for the spec to call this out explicitly. Is your complaint that the spec should be changed to make such implementations possible?

Makes it impossible to write an allocator and show its correctness to a reasonable degree

Makes it impossible to use an allocator and show that this usage is correct to a reasonable degree

Well, yeah, it's not a formal spec that can be used in a mathematical proof. Neither is yours. A formal spec should be written in some form of Hoare logic, and ideally in Separation Logic as it brings the best tools for talking about heaps.

Allows using allocators in ways that make it impossible to implement even Global correctly

I don't understand what you mean by this.

All your objections, as far as I can see, have nothing at all to do with zero-sized allocations. They are all about specifying more precisely the ownership transfer that happens during allocation and deallocation: allocation returns exclusive ownership of the given region of memory to the caller, and deallocation consumes exclusive ownership of the given region of memory from the caller. The current spec tries to say this when defining "currently allocated". I agree that this is somewhat unsatisfying, but I disagree that your spec is the right fix -- it seems way overcomplicated to me. Just acknowledge that we work in an ownership-based world, and specify the interaction with the allocator as transferring ownership back and forth. The vast literature on separation logic shows that this is the right way to specifying allocators in a way that is amenable to proofs.

mahkoh commented 3 years ago

Before I answer your post in detail, let me put the central theme at the top:

You talk about how the implementation identifies the block to be deallocated. I talk about how the user identifies the block to be deallocated.

Many of your arguments come down to "the implementation knows how it is implemented and therefore there is no problem".

But I claim that the user must know which memory blocks will be affected by a deallocate call. And the user can only rely on what the spec tells it about identifying memory blocks to do so because it might accept an allocator as a generic argument. The current spec does not allow the user to do so.

Ownership transfer is the right spec for an allocator, though. And it covers your case -- that's just also having ownership transfer from the user back to the allocator.

I disagree in the context of Rust. In Rust owned slices of memory cannot be split up into multiple owned parts. E.g. Box<[u8]> cannot be partitioned into two Box<[u8]>. Mutably borrowed slices can be split up &mut [u8] -> (&mut [u8], &mut [u8]). Splitting up memory is a natural operation within an allocator. E.g. an allocator allocates a large chunk of memory from another allocator and then hands parts of this memory out upon request. Once all of these reborrows have ended, it is allowed to also end its own borrow of the large chunk. This is just like it works in compiler-checked Rust and therefore easy to understand.

Furthermore, the lifetime of an allocation is bound by the lifetime of an allocator. Once the last clone of an allocator is dropped, the slices can no longer be used. This, again, is just like it works with mutable borrows: Only once all borrows are dead can object that handed out these borrows be dropped.

Usually allocators have the static lifetime so ownership transfer might be a good fit there. But Rust's allocators are much more powerful.

See what @Amanieu wrote -- I was actually wrong in my statement here.

I saw what he wrote but he is wrong. An allocator cannot always recognize zero-sized allocations from its layout argument during deallocation. E.g.

let slice = allocate(length: 0, alignment: 1);
if slice.len() == 1 {
    deallocate(slice, length: 1, alignment: 1); // this is valid
    // or:
    // deallocate(slice, length: 0, alignment: 1); // this is valid
}

We arrived at this requirement from the spec, so it clearly is required by the current spec.

The requirement does not follow from the spec. An allocator can simply internally store a hashmap of numerical pointer value -> all live allocations at that address including their layout. This allows an allocator to share the same addresses for non-zero-sized and zero-sized allocations.

The requirement of the spec (which is not even explicitly stated in the current spec) that we've been arguing about in this issue is that both the allocator and the user must be able to identify a unique "memory block" that will be affected by a deallocate call. Without this requirement, the spec of deallocate, which talks about the block that will be deallocated, is not well-defined.

Now, for the allocator this might be somewhat easy because it

Can internally store a hash map as described above
Knows that zero-sized deallocation is a NOP if it is implemented this way

But the user of the allocator can only rely on the spec of the trait. If the allocator can return the same output for the same allocate input multiple times (without deallocation in between) then the user cannot use the deallocate function because its behavior is not even well-defined.

I have no idea what it means for a spec to be "unsound".

Bad choice of words on my part. I meant that

It's not possible to implement this spec in the way it was intended and have that implementation be sound. E.g. what I talked about with concurrent deallocate calls.
It's not possible to use the dellocate function because nowhere in the spec does it specify how to identify the block that will be deallocated.

The spec says nothing about how the implementation identifies the memory block. It leaves that up to the implementation. It is not the job of the spec to say such things.

That really makes no sense. If a function accepts a generic A: Allocator, calls allocate twice and deallocate once, then it obviously must be able to know which memory block was deallocated. Furthermore, to even call deallocate it must be able to identify which memory block will be affected because it must choose the layout parameter to fit that memory block. In this generic case, the only information the function has in order to do this is what it says in the spec. Therefore it is up to the spec to spell out how to to identify the memory block.

There's no bitwise-or operation on pointers. So you are talking about integer roundtrips here, and without a more precise model of int-to-ptr casts, I think it is futile to attempt to discuss here what their behavior is.

Seems pretty clear from the llvm spec what the intended behavior is:

A pointer value formed by an inttoptr is based on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value.

Either way any allocator spec must support deallocation with int-to-ptr arguments because those are widely used.

It's not a valid implementation since deallocation is broken. I don't get your examples, you are constructing broken implementations and then claiming that that shows an issue with the spec?

What do you mean by "deallocation is broken"? The implementation of deallocate can simply be an empty function. This would be perfectly valid. The only thing that is broken is that the user of the allocator cannot identify the block that will be affected. And it's up to the spec to allow the user to do this. Not the implementation.

Sure, but if the allocator does this then it has to be able to also handle this situation during deallocation. If it cannot, it is a wrong implementation. No spec bug here that I can see.

See the conclusion at the top of the post (which I wrote at this point.)

Is your complaint that the spec should be changed to make such implementations possible?

If zero-sized allocations were allowed then we should not impose restrictions that not necessary to have a spec that is well-defined and easy to reason about.

Well, yeah, it's not a formal spec that can be used in a mathematical proof. Neither is yours.

Of course I'm using many words that I haven't defined in a precise enough way to perform a mathematical proof. But you're being a bit black and white here. There is a scale from "no spec at all" to "computed assisted proof".

There currently is a spec, but, when I tried to write down for my allocator why the allocate and deallocate functions (which delegate to another allocator after some layout modification and pointer offsetting) were implemented correctly, I was not able to do so based on the words I found in the spec. Not to a degree that would satisfy me. I can still assume that my implementation is correct because I can assume what the people of wg-allocators intended to specify.

The spec I wrote down above, on the other hand, does give me the necessary words and guarantees to do so.

Allows using allocators in ways that make it impossible to implement even Global correctly

I don't understand what you mean by this.

I literally linked an example that showed two threads calling deallocate at once and claimed that this usage was fine according to the spec. You skipped that part.

All your objections, as far as I can see, have nothing at all to do with zero-sized allocations. They are all about specifying more precisely the ownership transfer that happens during allocation and deallocation: allocation returns exclusive ownership of the given region of memory to the caller, and deallocation consumes exclusive ownership of the given region of memory from the caller.

Let's not forget that dropping the last clone of the allocator also terminates the memory allocation. I think this is where the ownership-based model is most problematic because it's not really how the rest of Rust behaves. Once you own something, you can usually forget about where you got that something from.

The current spec tries to say this when defining "currently allocated". I agree that this is somewhat unsatisfying, but I disagree that your spec is the right fix -- it seems way overcomplicated to me.

Really? I think it's rather compact. Not only does it not have to define leaky abstractions like "memory block", but it also gets thread-safety for free without having to explicitly talk about it. Let me say once again that the current spec allows you to call deallocate the same allocation from two different threads concurrently. Ok, sure, just like with provenance, we can now go and tack on a few more sentences to the spec to disallow this. But then please don't say that my spec is overcomplicated when it delegates almost all of the complicated stuff to the C++ memory model which any programmer of unsafe, concurrent Rust must already know.

This is also where zero-sized allocations come into play. Because I don't see how you can get a spec as simple as this if you allow zero-sized allocations. I'd be happy to see one.

The vast literature on separation logic shows that this is the right way to specifying allocators in a way that is amenable to proofs.

I do not know about separation logic. Does the literature also deal with allocators that can be dropped and thereby invalidate all previous allocations?

ojeda commented 3 years ago

This is also why most existing allocators don't support this

Requesting a zero-sized allocation works in most general purpose allocators -- what do you mean?

the C standard requires that either they return null, or a unique pointer (I think strictly speaking it says it's UB or IB, but other requirements of malloc require this).

With zero size, it is ID. It will be UB only for realloc starting with C23.

RalfJung commented 3 years ago

But I claim that the user must know which memory blocks will be affected by a deallocate call. And the user can only rely on what the spec tells it about identifying memory blocks to do so because it might accept an allocator as a generic argument. The current spec does not allow the user to do so.

Due to the size argument, I don't see any issue with the user knowing which memory block will be deallocated. Could you give a concrete example that you think is ambiguous? (Apologies if you have already given it; there have been a few examples in this issue and so far it wasn't clear to me what you even were complaining about.)

It's not possible to implement this spec in the way it was intended and have that implementation be sound. E.g. what I talked about with concurrent deallocate calls.

Are you saying all the existing implementations are unsound? I very much doubt that. If you think the curent spec does not sufficiently spell out the synchronization requirements, I do not think that this is hard to fix. But I think two concurrent dealloc calls are already an incorrect use of the spec since there is no way for you to guarantee that the memory hasn't been deallocated yet, if there is a concurrent thread that might also deallocate it.

mahkoh commented 3 years ago

Due to the size argument, I don't see any issue with the user knowing which memory block will be deallocated. Could you give a concrete example that you think is ambiguous?

E.g. an allocator that returns

a zero-sized slice for the first zero-sized allocation
a non-zero-sized slice for the second zero-sized allocation

where both data pointers are identical. Note that, if the deallocate implementation is an empty function, there in no issue with making the implementation correct.

If the user then calls deallocate with the data pointer and a zero size, the user cannot determine which allocation was affected by the deallocate call.

The concrete example aside: The spec does not say how to identify a memory block at all. I'm using my pre-existing knowledge of allocators to infer that a memory block is "identified" by the data pointer returned by allocate. However, this interpretation has problems:

A data pointer only identifies a memory block at a single point in time. At some other point, the pointer might identify a different memory block even if only non-zero-sized allocations are involved. (The spec clearly intends that each allocate call "returns" a unique (over the whole lifetime of the program) memory block due to the language around "currently allocated" memory.)
A data pointer does not even uniquely identify a memory block at a single point in time as soon as zero-sized allocations are involved.

Are you saying all the existing implementations are unsound? I very much doubt that.

No. I'm saying that the implementation of Global is unsound. I've linked the Global breaking example above.

If you think the curent spec does not sufficiently spell out the synchronization requirements, I do not think that this is hard to fix.

I would like to see that.

But I think two concurrent dealloc calls are already an incorrect use of the spec since there is no way for you to guarantee that the memory hasn't been deallocated yet, if there is a concurrent thread that might also deallocate it.

The language of the spec is

the memory block has not been subsequently deallocated

I'm reading this as "no deallocate call has happened before this point." With this reading, concurrent deallocate calls are fine. You seem to be additionally inferring a total-ordering requirement for all operations that operate on the same memory block. This would prevent concurrent deallocate calls if we were able to identify memory blocks. But we can't in the zero-sized allocation case. So we would have to add even more language to exemt zero-sized allocations from this total-ordering requirement. And then we have to look at the degenerate example above and see what's going on theer.

mahkoh commented 3 years ago

To repeat myself:

The intentions of the current spec are fine
The language of the current spec is insufficient
I see no sane way to fix the language while allowing zero-sized allocations

mahkoh commented 3 years ago

From the C17 section on memory allocation:

If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

For purposes of determining the existence of a data race, memory allocation functions behave as though they accessed only memory locations accessible through their arguments and not other static duration storage. These functions may, however, visibly modify the storage that they allocate or deallocate. Calls to these functions that allocate or deallocate a particular region of memory shall occur in a single total order, and each such deallocation call shall synchronize with the next allocation (if any) in this order.

(Highlights mine.)

The second highlight explicitly requires that all operations that operate on a region of memory must happen in a total order. The notion of region of memory is clearly different from memory block in that

the same region can be referenced by multiple malloc calls
each region is non-empty due to the first highlight

Note also the sentence before the second highlight which independently enforces a total order.

RalfJung commented 3 years ago

E.g. an allocator that returns

Nono, don't make any assumptions about the allocator. You said this is a problem for clients. So there should be a client that performs a sequence of allocate and deallocate calls (and potentially others) and the result is unclear when the only thing we know about the allocator is that it follows the spec. What is that client?

If the user then calls deallocate with the data pointer and a zero size, the user cannot determine which allocation was affected by the deallocate call.

Of course it can? Only the zero-sized allocation can possibly be affected?

No. I'm saying that the implementation of Global is unsound. I've linked the Global breaking example above.

If you mean the OP, then that's already fixed by saying that the deallocate ptr must be derived from the ptr returned by allocate. Or, more precisely, its provenance must be for the right allocation. IOW, that program has UB, it is not using the API correctly. I thought we were talking about what remains to be done once that change is made.

mahkoh commented 3 years ago

Nono, don't make any assumptions about the allocator. You said this is a problem for clients. So there should be a client that performs a sequence of allocate and deallocate calls (and potentially others) and the result is unclear when the only thing we know about the allocator is that it follows the spec. What is that client?

Sure (pseudocode):

fn f<A: Allocator>(a: A) {
    let x = a.alloc(0);
    let y = a.alloc(0);
    if y.len() == 1 && y.as_ptr() == x.as_ptr() {
        a.dealloc((y.as_ptr() as usize | x.as_ptr() as usize) as *mut u8, 0);
        y[0] = 0; // valid or invalid?
        // alternatively:
        // a.dealloc((y.as_ptr() as usize | x.as_ptr() as usize) as *mut u8, 0);  // valid or invalid?
    }
}

The example I gave in the previous post showed that there is an Allocator that follows the spec where the if branch can be taken.

Of course it can? Only the zero-sized allocation can possibly be affected?

That's incorrect. Recall the definition of memory fitting:

/// * The provided [`layout.size()`] must fall in the range `min ..= max`, where:
///   - `min` is the size of the layout most recently used to allocate the block, and
///   - `max` is the latest actual size returned from [`allocate`], [`grow`], or [`shrink`].

Note that min = 0 in this case.

If you mean the OP, then that's already fixed by saying that the deallocate ptr must be derived from the ptr returned by allocate. Or, more precisely, its provenance must be for the right allocation. IOW, that program has UB, it is not using the API correctly. I thought we were talking about what remains to be done once that change is made.

No I meant the example with concurrent deallocations.

mahkoh commented 3 years ago

Maybe we can clear up some of our miscommunication here by discussing the following question:

If no zero-sized allocations are involved, does the 3-tuple of

allocator
pointer with provenance
layout

identify at most one currently allocated memory block?

The answer is no but I think you're believing that the answer is yes. If you think that the answer is yes, I'll write code that shows that it's not.

(For zero-sized allocations the answer is obviously no.)

RalfJung commented 3 years ago

y.len() == 1

Wait, so you are testing for a case where you requested something of size 0 but actually got a non-zero sized allocation? I didn't realize that that's possible. @Amanieu I wonder what you think about this, since it breaks the argument that the allocator can use the size argument to distinguish these allocations.

In some sense it's the allocator's fault for putting itself into a situation where it cannot tell which allocation the user meant (even if we ignore all the provenance shenanigans, from a purely operational perspective), but if the user has to take this possibility into account, then any dealloc(x, 0) poses the risk of "accidentally" deallocating some other memory block at the same location that also had requested size 0 but its real size is non-0 (so it could actually be in use with some data). We could add a requirement that there may never be multiple "currently allocated" blocks at the same address with overlapping sizes (i.e., the min ..= max range of sizes accepted for deallocation of these blocks overlap, which means there could be ambiguity about which block is meant to be deallocated), except if all of them have size 0 (max == 0). This would mean it is illegal to write an allocator where your if branch is ever taken. No reasonable allocator would do this, right?

The example I gave in the previous post showed that there is an Allocator that follows the spec where the if branch can be taken.

You have more than a dozen posts in this thread, I have no idea which one you mean. Hyperlinks are a great invention for cases where you actually want others to know for sure, without guessing, what thing you are referring to. ;)

No I meant the example with concurrent deallocations.

I cannot find a code example with concurrent deallocations. I won't re-read the entire thread to figure out what you mean. Please, if you expect others to follow your arguments, make it easy for them by providing links to the data you are using to support your arguments.

If no zero-sized allocations are involved, does the 3-tuple of [...] identify at most one currently allocated memory block?

I'd say it certainly does, even without provenance -- how else would one even implement deallocation for this allocator? I assume you are thinking about concurrency again here?

I feel like we're having two different discussions in this thread, and I never know which discussion you are even responding to. It would help to keep different, independent discussions (such as your concern around concurrency, and your concern around zero-sized allocations) in separate issues.

Amanieu commented 3 years ago

Yes, an allocator can return a larger size than what was requested. This allows Vec to use all of the available memory on allocators that round sizes up to the next size class.

RalfJung commented 3 years ago

Right, what I was asking about are the consequences for zero-sized allocations, like in @mahkoh's example. However, thinking about it some more, I think I view this example as being similar to the following:

fn f<A: Allocator>(a: A) {
    let x = a.alloc(1);
    let y = a.alloc(1);
    if y.len() == 1 && y.as_ptr() == x.as_ptr() {
        a.dealloc((y.as_ptr() as usize | x.as_ptr() as usize) as *mut u8, 1);
        y[0] = 0; // valid or invalid?
        // alternatively:
        // a.dealloc((y.as_ptr() as usize | x.as_ptr() as usize) as *mut u8, 1);  // valid or invalid?
    }
}

Here, too, it would be unclear which allocation is deallocated by the first a.dealloc. However, this is not actually a problem since that code is clearly unreachable. So, I am now fairly sure that we want to arrange things in a way that likewise, the dealloc in @mahkoh's example is unreachable. There can never be two allocations with the same base address unless both of them have size 0 -- and this refers to the actual size of the allocation, not the requested size.

I am not sure if this requires any change to the allocator spec -- in principle, whatever wording lets us conclude that the dealloc is unreachable in my example, should be sufficient to conclude the same thing about @mahkoh's example. Currently, the allocate spec does not say anything explicit about freshness, maybe it should? Something like

The returned block will not overlap with any other memory block that is currently allocated. Furthermore, it will not have the same address as any other memory block that is currently allocated, except possibly if both blocks have size 0.

Amanieu commented 3 years ago

There can never be two allocations with the same base address unless both of them have size 0 -- and this refers to the actual size of the allocation, not the requested size.

That's not sufficient if we want to allow allocators to conjure a base address out of thin air for ZSTs (e.g. by casting the alignment to a pointer), since there could be a real allocation (with >0 requested size) at that address. The allocator can still differentiate these cases since the real allocation must be deallocated with a non-zero size.

RalfJung commented 3 years ago

That's not sufficient if we want to allow allocators to conjure a base address out of thin air for ZSTs (e.g. by casting the alignment to a pointer), since there could be a real allocation (with >0 requested size) at that address. The allocator can still differentiate these cases since the real allocation must be deallocated with a non-zero size.

Oh, right. So it's more like: there can never be two allocations with the same base address except if one has size zero, and the other one

either has size 0, or
has requested size greater zero (so, cannot be deallocated with size 0)

mahkoh commented 3 years ago

You have more than a dozen posts in this thread, I have no idea which one you mean.

"Previous" referring to the post that is the maximum of all of my posts in this thread that occurred before the post in which I wrote "previous" in the total order induced by the GitHub UI. This should be rather clear from the context. But, since you didn't reply for a long time, this might have been less clear to you than to someone who read the thread from top to bottom.

Anyway, this post is of no consequence now because you've already agreed that it's possible for the if branch to be taken.

I cannot find a code example with concurrent deallocations. I won't re-read the entire thread to figure out what you mean. Please, if you expect others to follow your arguments, make it easy for them by providing links to the data you are using to support your arguments.

I'm sorry but you already referenced concurrent deallocations yourself in this thread. You cannot expect others to write all of their posts is a way that they make sense to people who only read that one post or have forgotten what was already discussed.

Here I discussed concurrency issues with the current spec. In your following reply you did not even address this point. So it makes sense that you do not remember it. You possibly didn't even read it the first time.

The fact that you did not know that allocators are allowed to return slices with arbitrary lengths at least as large as what was requested tells me that you did not read the current spec carefully, if at all. If you do not have time or interest to take part in this discussion then that is completely fine. But please do not make opinionated posts without putting in the necessary work to understand what is being said and discussed. Because your opinions count for a lot.

I'd say it certainly does, even without provenance -- how else would one even implement deallocation for this allocator? I assume you are thinking about concurrency again here?

No this has nothing to do with concurrency. Consider the following code which is somewhat simplified and assumes alignment = 1 in all requests:

type A: Allocator = ...;

fn g(a: &A, m: &mut [MaybeUninit<u8>], f: impl FnOnce()) {
    ...
}

fn f(a: &A) {
    let s = a.allocate(1).unwrap();
    g(a, transmute(s), || {
        let s2 = a.allocate(1).unwrap();
        if s.as_ptr() == s2.as_ptr() && s.len() == s2.len() {
            a.deallocate(s);
        }
    });
    a.deallocate(s);
}

Do you agree that, if such an A and g can be written and the if branch can be taken, then we have a problem?

Here is a short sketch, if that is not enough, then I can write a full implementation:

struct A {
    pool: Vec<(*mut [u8], bool)>, // (slice, in use)
}

impl !Sync for A { }

impl Allocator for A {
    fn allocate(&self, req: usize) {
        if (request can be satisfied from pool) {
            satifsy from pool and mark as in use
        } else {
            use global allocator
        }
    }

    fn deallocate(&self, s: *mut u8, len: usize) {
        if (address of s is in the pool) {
            mark entry as unused
        } else {
            use global allocator
        }
    }
}

impl A {
    fn feed(&self, s: *mut [u8]) {
        if (address of s is already in the pool) {
            panic!();
        }
        add s to the pool as unused
    }

    fn return_fed(&self, s: *mut [u8]) {
        if (address of s is not in the pool or s is in use) {
            panic!();
        }
        remove s from the pool
    }
}

fn g(a: &A, m: &mut [MaybeUninit<u8>], f: impl FnOnce()) {
    a.feed(transmute(m));
    f();
    a.return_fed(transmute(m));
}

This seems to be a perfectly fine allocator.

RalfJung commented 3 years ago

(Wtf, GH first duplicated my response and now it is acting up in weird ways... looks like they screwed up their database consistency...)

RalfJung commented 3 years ago

"Previous" referring to the post that is the maximum of all of my posts in this thread that occurred before the post in which I wrote "previous" in the total order induced by the GitHub UI. This should be rather clear from the context. But, since you didn't reply for a long time, this might have been less clear to you than to someone who read the thread from top to bottom.

That would be https://github.com/rust-lang/wg-allocators/issues/82#issuecomment-753666708, which contains no "example". So no, what you are saying is simply not accurate.

I'm sorry but you already referenced concurrent deallocations yourself in this thread. You cannot expect others to write all of their posts is a way that they make sense to people who only read that one post or have forgotten what was already discussed.

You brought it up though, thereby significantly raising the confusion level of this entire thread. I just responded to (some of) your statements.

I can expect people to write in a way that makes it as easy as reasonably possible to follow. This includes not jumping between different topics without even making those jumps clear, it includes not assuming that the reader has perfect memory, and it includes putting in hyperlinks where appropriate. This is just basic courtesy -- if you want others to make the effort to follow your train of thought, then you should put in some effort to make that easier.

The fact that you did not know that allocators are allowed to return slices with arbitrary lengths at least as large as what was requested tells me that you did not read the current spec carefully, if at all. If you do not have time or interest to take part in this discussion then that is completely fine. But please do not make opinionated posts without putting in the necessary work to understand what is being said and discussed. Because your opinions count for a lot.

Yes, I should have read about the status quo better. Sorry for this.

This seems to be a perfectly fine allocator.

Okay, so this is yet another problem that you also mixed into this thread -- it's about transferring ownership of some of the allocated memory back to the allocator, and then having that allocator hand out the same memory again. That's... quite subtle, and I have no idea if the allocator-wg considers such allocators "in scope". It's certainly an interesting example, but in the interest of sticking to "one topic per thread", I think it would be better if you opened yet another new issue for this one.

Can we agree to only discuss the ZST issue here henceforth? If you want to discuss the concurrency once, please open a new thread. (I'm not a WG member though so I have no say to what happens in this repo. I am just making suggestions that I think are in the best interesting of having a clear and constructive discussion.)

mahkoh commented 3 years ago

These topics are not quite as separated as you are making them out to be:

The spec that I've written above:
- Fully specifies the allocator behavior under concurrency.
- Handles this allocator just fine because allocate and deallocate calls that return/accept the same address are in a total order and therefore deallocate always acts on a allocate call (!), not on a memory block. As long as all calls are properly paired, there is no problem with the same address being returned multiple times.
But it only does so if we disallow zero sized allocations.

Therefore I know that these problems can be solved if we disallow zero-sized allocations. I do not know if they can be solved with zero-sized allocations allowed. Therefore, concurrency issues and creative allocator uses serve as a motivator for disallowing zero-sized allocations.

mahkoh commented 3 years ago

That would be #82 (comment), which contains no "example". So no, what you are saying is simply not accurate.

You got me there.

RalfJung commented 3 years ago

Just because your proposal solves all 3 issues at once, doesn't mean they should be discussed all at once. In particular, first it needs to be established that there are three problems, and those are certainly three separate discussions.

I think the ZST issues can be solved differently (that's what @Amanieu and me have been talking about).
For the concurrency issue, I think it suffices to say that all allocator operations are considered non-atomic writes, and therefore requires external synchronization to not be considered data races.
For "re-feeding" memory to the allocator, that sounds like the WG should have a discussion if they want to support this kind of allocator, and if yes how they want to support them in the spec. I don't have much if anything to contribute to this discussion.

I appreciate you consider these all closely related based on solving them all together, and separating them might feel like a step backwards to you, but there's no point in insisting on this until your audience agrees with this premise -- and convincing others of this requires separate discussion of the three issues you have with the current spec. It makes little sense to talk about your proposed spec until there is agreement about the issues that it is designed to solve (or at least, any such discussion would be improperly motivated -- that's why I haven't spent the time to dig into your spec yet; I am still in the state of needing to be convinced that such a spec is even necessary). Also understanding it will be much easier when the issues that it is designed to be solved are all clearly on the table. I think by posting your solution before clearly establishing the problems, you are putting the cart before the horse.

mahkoh commented 3 years ago

For the concurrency issue, I think it suffices to say that all allocator operations are considered non-atomic writes, and therefore requires external synchronization to not be considered data races.

That's exactly what my spec does. But I don't know what you mean by non-atomic writes if the slice is empty. That's the reason I had to disallow zero-sized allocations in my spec.

For "re-feeding" memory to the allocator, that sounds like the WG should have a discussion if they want to support this kind of allocator

I'd be more concerned about creative allocators that haven't yet been discovered and that, once discovered, will cause other implicit assumptions made by the spec to no longer hold.

I have no particular interest in re-feeding allocators except insofar as that they serve as an example that uncovers such implicit assumptions. In this case that (allocator, addr, size) uniquely identify an allocation.

The reason I opened this thread in the first place is that I found the spec to be underspecified, i.e., to be full of implicit assumptions.

RalfJung commented 3 years ago

But I don't know what you mean by non-atomic writes if the slice is empty. That's the reason I had to disallow zero-sized allocations in my spec.

Yes, that's a good question. I think it'd be best to explore this with a few concrete examples. IOW, I think this needs some room for discussion and exploring the design space of possible solutions -- room such as an issue dedicated to this problem.