Open mahkoh opened 3 years ago
The easiest fix would be to disallow zero-sized allocations.
I'm very much in favor of this, and was meaning to file an issue about it. It forces every allocator that wraps an allocator that doesn't support this (or that doesn't support this for arbitrary alignments) — which is the lions share of currently existent allocators — to have extra checks to support what IMO is a dubious use case.
GlobalAllocator calls likely won't be inlined, so this check won't be removed.
There already was a discussion in the past and the performance aspect was likely considered there: https://github.com/rust-lang/wg-allocators/issues/16
This issue is more about the soundness of the API if allocators are allowed to return the same "live" pointer multiple times. If they are not allowed to return a dummy pointer for zero-sized allocations, then using an allocator for a zero-sized allocation is most likely immensely inefficient.
There already was a discussion in the past
Nothing is set in stone until it's stabilized, but fair enough, this isn't the time to discuss it. That said, IMO together these make for reasonable arguments to forbid it.
This issue is more about the soundness of the API if allocators are allowed to return the same "live" pointer multiple times.
This is also why most existing allocators don't support this — the C standard requires that either they return null
, or a unique pointer (I think strictly speaking it says it's UB or IB, but other requirements of malloc require this).
Generally this means keeping a counter you cast to a pointer and return. But even allocators that support it don't support it for arbitrary alignments.
What part of the documentation specifically prevents an allocator from always returning the same address for a zero-sized allocation? The intent is that this is allowed.
I have to leave for the holidays where I won't have access to a PC so I'll keep it short:
deallocate
says: "Deallocates the memory referenced by ptr
." However, ptr
does not uniquely identify a memory block if allocate
can return the same pointer multiple times. (Unless we say that pointers returned by allocate
are opaque and cannot be compared. The current docs do not talk about this.)deallocate
in the example in OP accesses memory behind a_alloc
, the behavior is undefined because the pointer a_alloc
is not based on b_alloc
: https://llvm.org/docs/LangRef.html#pointer-aliasing-rules (Note that b_alloc
did not contribute to the inttoptr
cast that created a_alloc
. The comparison a_alloc == b_alloc
occurred after the inttoptr
cast.)Unless we say that pointers returned by allocate are opaque and cannot be compared.
Well, https://doc.rust-lang.org/beta/std/ptr/fn.eq.html is safe, so I don't think that's viable.
What I meant to say is that they don't represent the same block of memory even if they are equal unless one is based on the other. Where based on is defined in the link above.
without taking a position on if provanince in equality is good or bad: the docs for pointer equality seem to imply that it's a simple address comparison and that's it. would make it tricky to change even if we did end up wanting to do a change.
My stance here would be that we're gaining the ability for this to be a safe function, and in exchange now provenance is now directly realized in the API.
This does not seem like a good trade, IMO (but I am not entirely unbiased here).
I see the problem. I think we just need to add wording that the pointer passed to dealloc
must be "based-on" the pointer that was originally returned by alloc
. This would use the same provenance rules that are used for stack allocations for local variables.
cc @RalfJung
Unfortunately several problems remain:
I've written a new spec that fixes all of these problems as long as zero-sized allocations are not allowed. It's not clear if it's possible to extend it to allow zero-sized allocations.
a
be the return of a successful allocate call and d
be a deallocate call. We say that a
matches d
if and only if a
occurs before d
in the total order of returns of successful allocate calls and deallocate calls that reference the address referenced by a
and
a
occurs immediately before d
in that order, ora
there occurs the return of a successful allocate call a'
in that order and immediately before d
there occurs a deallocate call d'
in that order such that a'
matches d'
.d
there must be the return of a successful allocate call a
such that a
matches d
. Otherwise the behavior is undefined. For all such a
, d
self
must reference the same object modulo cloningd
must be based on the pointer returned by a
d
must be the same as the alignment of the layout argument of a
d
must be between the size of the layout argument of a
and the size of the slice returned by a
Otherwise the behavior is undefined.
After the return of a successful allocate call a
, the caller gains a mutable borrow of the returned slice from that point.
a
matches a deallocate calld
, then the borrow lasts until d
.This slice is non-empty and satisfies the requirements of the layout argument. Otherwise the behavior is undefined.
self
argument references this allocator modulo cloning. The order of these deallocate calls is consistent with the reverse of the program execution order of the allocate calls. Otherwise the behavior is undefined. My stance here would be that we're gaining the ability for this to be a safe function, and in exchange now provenance is now directly realized in the API.
What is "this" and what do you mean by "directly realized in the API"?
I see the problem. I think we just need to add wording that the pointer passed to dealloc must be "based-on" the pointer that was originally returned by alloc. This would use the same provenance rules that are used for stack allocations for local variables.
The dealloc pointer needs to have the right provenance -- that is definitely already the case. A pointer with the wrong provenance cannot even be used to read or write to the given memory (as per the docs of wrapping_offset
, to name just one example), and I hope we agree that "this pointer may be used to deallocate that memory" should imply "this pointer may be used to read/write that memory".
This seems reasonable to spell out in more detail, and should fix the example in the OP.
The remaining problems I see don't actually break the spec, they just constrain implementations more than one might expect. In particular, if an allocator returns pointers for the same address for two different allocation requests without intermediate deallocation (which necessarily means one of them was zero-sized), then upon deallocation there is no way for the implementationto determine which of these two pointers is supposed to be deallocated (since it cannot test "based-on" / provenance). IOW, no allocator must ever use the same address for both a zero-sized and a non-zero-sized allocation. (Using the same address for multiple zero-sized allocations is perfectly fine since deallocating them is a NOP.)
@mahkoh given that your original example is fixed by @Amanieu's proposal (the code causes UB if a_alloc == b_alloc
since a ptr with the wrong provenance is used for deallocation), I am wondering which remaining problem you see that requires spec changes. I also see no connection to concurrency here.
The spec allows an allocator to unsoundly return overlapping slices.
The allocation function should of course guarantee that the set of memory addresses given by [ptr, ptr+size)
is disjoint from the set of any other live allocation. This is not spelled out explicitly currently, but does not require fundamental changes to the spec to be taken into account. Writing an allocator spec in full precision without separation logic is annoying and somewhat tedious, but I don't see anything subtle going on here (other than for the ZST case, where indeed there are subtleties).
The allocator API requires the length of the memory block to be passed in when deallocating, so the allocator can always distinguish zero-sized blocks from non-zero-sized blocks.
In particular, if an allocator returns pointers for the same address for two different allocation requests without intermediate deallocation (which necessarily means one of them was zero-sized)
That is not correct. Consider an allocator that satisfies requests from a pool of pre-existing allocations. Such an allocator might provide an api to add allocations to that pool. It could then be fed an allocation returned by itself. Therefore this allocator would (soundly) return the same slice twice.
My spec models this by treating the return value of an allocation as a borrowed slice (not an owned slice.) Feeding the slice into the allocator is then an operation whose validity can easily be understood using the existing lifetime semantics of the language. Of course, such nested allocations must then be unwound in reverse order. My spec also takes this into account.
IOW, no allocator must ever use the same address for both a zero-sized and a non-zero-sized allocation.
The current implementation of Global
basically does alignment as *mut u8
which violates this requirement. It also supports arbitrarily large alignments.
(Using the same address for multiple zero-sized allocations is perfectly fine since deallocating them is a NOP.)
See below.
@mahkoh given that your original example is fixed by @Amanieu's proposal (the code causes UB if a_alloc == b_alloc since a ptr with the wrong provenance is used for deallocation)
I don't agree that it is fixed. As you said yourself, zero-sized slices must never have the same address as non-zero-sized slices. This is not required by the current spec and not implemented by Global. It might be possible to implement this efficiently but I do not see a way to do it so far.
Secondly, I think you're dismissing the question of deallocating zero-sized allocations a bit quickly. Consider an allocator that satisfies zero-sized allocations my calling malloc(1)
(malloc with the argument 1
) and then returns the empty slice pointing to the start of that allocation. Clearly deallocating that zero-sized allocation is not a NOP. The current spec does not dismiss the question of zero-sized allocations because doesn't even mention them. But I claim that the current spec is unsound and that it is not clear how to make it sound with zero-sized allocations. See below.
I am wondering which remaining problem you see that requires spec changes. I also see no connection to concurrency here.
I listed several problems above. Let me expand on them.
(1) A pointer can be based on multiple pointers that point to the same address. Address and "based on" are not enough to identify an allocated memory block.
The current spec models allocations in terms of "memory blocks". Each allocate call returns a reference to a memory block, so to say. A deallocate call must then, of course, be able to identify which memory block is supposed to be deallocated. The current spec uses the numerical value of the pointer and (as per @Amanieu's suggestion) pointer provenance to do so.
But this is obviously not enough to identify which memory block for a zero-sized allocation is supposed to be deallocated. Simply take the bitwise or of two pointers with the same numerical value and the result is a pointer that is based on both source pointers. The implementation cannot decide which allocation to deallocate. You've said that this is ok because it's a NOP and therefore zero-sized allocations can return the same pointer multiple times. Not so. Consider the malloc(1)
example above. What if the allocator used malloc(1)
for one zero-sized allocation but then used a transmuted integer to satisfy the second zero-sized allocation and what if that integer had the same numerical value as the malloc(1)
'd pointer? According to your ad-hoc spec above, that would be a valid implementation because both allocations are zero-sized. But clearly it is not.
(2) The spec allows an allocator to unsoundly return overlapping slices.
You've addressed this:
The allocation function should of course guarantee that the set of memory addresses given by [ptr, ptr+size) is disjoint from the set of any other live allocation.
Clearly by "any other live allocation" you only mean any other allocation from the same allocator. Otherwise a simple delegating allocator would be invalid. But the self-feeding example above shows that this is not a correct assumption. It's also not a necessary assumption if allocations are modeled as borrows that can be reborrowed.
(3) The current spec ignores multi threading. The current implementation of Global is unsound if the user makes use of the full range of patterns allowed by the spec. (deallocate calls are not required to be synchronized for the same address.)
The current spec says that calling deallocate is valid if
the memory block has not been subsequently deallocated
What does "subsequently" mean? I assume it is used in the same sense as "happens before" in C++ (which is also the model used by Rust iirc.) This means the following code is allowed: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=0e3e7ab18eac96309fc07fe73f7f1bf6
Because neither deallocate
call happens before the other.
My spec makes use of the C++ specification by saying that a slice returned from allocate is immediately writable and pointer passed to deallocate is immediately writable. This gives us a total order of allocate and deallocate operations (that operate on the same pointer) for free because the C++ spec guarantees that it exists (in programs without UB.)
You said something similar by saying that the slice should be considered writable. But what does this mean for zero-sized slices? How do we integrate zero-sized slices into this model? And we must integrate them into this model if the zero-sized slice is actually backed by a non-zero allocation. I don't think you can write an allocator spec without taking the multi-threaded memory model into account because allocators are the one place where we opt-out of the compiler doing it for us.
The allocator API requires the length of the memory block to be passed in when deallocating, so the allocator can always distinguish zero-sized blocks from non-zero-sized blocks.
Like I said above, an allocator might decide to satisfy a zero-sized request with a non-zero sized slice or it might back a zero-sized slice by a non-zero-sized allocation. Nothing in the spec prevents this.
To be clear: As long as we exclude zero-sized allocations, I'm not saying that the intentions of the current spec are fundamentally broken. Merely that the current spec
Global
correctlyI discovered this while trying to show that the allocator I wrote in 8dc7b0c
(#80273) is correct. I wrote my spec above to allow doing this as long as we allow inductive reasoning (if the behavior is not undefined up to this point, then the behavior of the following operation is also not undefined.) I suggest you read it. In some ways its both shorter and simpler than the current documentation of the Allocator trait.
My spec models this by treating the return value of an allocation as a borrowed slice (not an owned slice.) Feeding the slice into the allocator is then an operation whose validity can easily be understood using the existing lifetime semantics of the language. Of course, such nested allocations must then be unwound in reverse order. My spec also takes this into account.
Ownership transfer is the right spec for an allocator, though. And it covers your case -- that's just also having ownership transfer from the user back to the allocator.
IOW, no allocator must ever use the same address for both a zero-sized and a non-zero-sized allocation.
The current implementation of Global basically does alignment as *mut u8 which violates this requirement. It also supports arbitrarily large alignments.
See what @Amanieu wrote -- I was actually wrong in my statement here.
I don't agree that it is fixed. As you said yourself, zero-sized slices must never have the same address as non-zero-sized slices. This is not required by the current spec
We arrived at this requirement from the spec, so it clearly is required by the current spec. It's just not stated explicitly, but so what?
Also, it was a wrong conclusion, as @Amanieu pointed out.
But I claim that the current spec is unsound and that it is not clear how to make it sound with zero-sized allocations. See below.
I have no idea what it means for a spec to be "unsound". That adjective does not make sense when applied to specs, in any meaning I am familiar with.
The current spec models allocations in terms of "memory blocks". Each allocate call returns a reference to a memory block, so to say. A deallocate call must then, of course, be able to identify which memory block is supposed to be deallocated. The current spec uses the numerical value of the pointer and (as per @Amanieu's suggestion) pointer provenance to do so.
The spec says nothing about how the implementation identifies the memory block. It leaves that up to the implementation. It is not the job of the spec to say such things.
What the spec says is that the address and layout of the block are all that the implementation has available for this. That is certainly enough, since correct implementations are possible -- as witnessed by the fact that correct implementations exist.
The implementation that returns alignment as *mut u8
for zero-sized allocations is perfectly fine if the corresponding deallocation function is a NOP when the size (which is passed in by the caller) is zero.
Simply take the bitwise or of two pointers with the same numerical value and the result is a pointer that is based on both source pointers.
There's no bitwise-or operation on pointers. So you are talking about integer roundtrips here, and without a more precise model of int-to-ptr casts, I think it is futile to attempt to discuss here what their behavior is.
You've said that this is ok because it's a NOP and therefore zero-sized allocations can return the same pointer multiple times. Not so. Consider the malloc(1) example above. What if the allocator used malloc(1) for one zero-sized allocation but then used a transmuted integer to satisfy the second zero-sized allocation and what if that integer had the same numerical value as the malloc(1)'d pointer? According to your ad-hoc spec above, that would be a valid implementation because both allocations are zero-sized. But clearly it is not.
It's not a valid implementation since deallocation is broken. I don't get your examples, you are constructing broken implementations and then claiming that that shows an issue with the spec?
Like I said above, an allocator might decide to satisfy a zero-sized request with a non-zero sized slice or it might back a zero-sized slice by a non-zero-sized allocation. Nothing in the spec prevents this.
Sure, but if the allocator does this then it has to be able to also handle this situation during deallocation. If it cannot, it is a wrong implementation. No spec bug here that I can see.
If it is impossible to have an implementation that satisfies zero-sized requests in such a mixed way, then that's a consequence of the spec, but there is no reason for the spec to call this out explicitly. Is your complaint that the spec should be changed to make such implementations possible?
- Makes it impossible to write an allocator and show its correctness to a reasonable degree
- Makes it impossible to use an allocator and show that this usage is correct to a reasonable degree
Well, yeah, it's not a formal spec that can be used in a mathematical proof. Neither is yours. A formal spec should be written in some form of Hoare logic, and ideally in Separation Logic as it brings the best tools for talking about heaps.
- Allows using allocators in ways that make it impossible to implement even Global correctly
I don't understand what you mean by this.
All your objections, as far as I can see, have nothing at all to do with zero-sized allocations. They are all about specifying more precisely the ownership transfer that happens during allocation and deallocation: allocation returns exclusive ownership of the given region of memory to the caller, and deallocation consumes exclusive ownership of the given region of memory from the caller. The current spec tries to say this when defining "currently allocated". I agree that this is somewhat unsatisfying, but I disagree that your spec is the right fix -- it seems way overcomplicated to me. Just acknowledge that we work in an ownership-based world, and specify the interaction with the allocator as transferring ownership back and forth. The vast literature on separation logic shows that this is the right way to specifying allocators in a way that is amenable to proofs.
Before I answer your post in detail, let me put the central theme at the top:
Many of your arguments come down to "the implementation knows how it is implemented and therefore there is no problem".
But I claim that the user must know which memory blocks will be affected by a deallocate call. And the user can only rely on what the spec tells it about identifying memory blocks to do so because it might accept an allocator as a generic argument. The current spec does not allow the user to do so.
Ownership transfer is the right spec for an allocator, though. And it covers your case -- that's just also having ownership transfer from the user back to the allocator.
I disagree in the context of Rust. In Rust owned slices of memory cannot be split up into multiple owned parts. E.g. Box<[u8]>
cannot be partitioned into two Box<[u8]>
. Mutably borrowed slices can be split up &mut [u8] -> (&mut [u8], &mut [u8])
. Splitting up memory is a natural operation within an allocator. E.g. an allocator allocates a large chunk of memory from another allocator and then hands parts of this memory out upon request. Once all of these reborrows have ended, it is allowed to also end its own borrow of the large chunk. This is just like it works in compiler-checked Rust and therefore easy to understand.
Furthermore, the lifetime of an allocation is bound by the lifetime of an allocator. Once the last clone of an allocator is dropped, the slices can no longer be used. This, again, is just like it works with mutable borrows: Only once all borrows are dead can object that handed out these borrows be dropped.
Usually allocators have the static lifetime so ownership transfer might be a good fit there. But Rust's allocators are much more powerful.
See what @Amanieu wrote -- I was actually wrong in my statement here.
I saw what he wrote but he is wrong. An allocator cannot always recognize zero-sized allocations from its layout argument during deallocation. E.g.
let slice = allocate(length: 0, alignment: 1);
if slice.len() == 1 {
deallocate(slice, length: 1, alignment: 1); // this is valid
// or:
// deallocate(slice, length: 0, alignment: 1); // this is valid
}
We arrived at this requirement from the spec, so it clearly is required by the current spec.
The requirement does not follow from the spec. An allocator can simply internally store a hashmap of numerical pointer value -> all live allocations at that address including their layout. This allows an allocator to share the same addresses for non-zero-sized and zero-sized allocations.
The requirement of the spec (which is not even explicitly stated in the current spec) that we've been arguing about in this issue is that both the allocator and the user must be able to identify a unique "memory block" that will be affected by a deallocate call. Without this requirement, the spec of deallocate, which talks about the block that will be deallocated, is not well-defined.
Now, for the allocator this might be somewhat easy because it
But the user of the allocator can only rely on the spec of the trait. If the allocator can return the same output for the same allocate input multiple times (without deallocation in between) then the user cannot use the deallocate function because its behavior is not even well-defined.
I have no idea what it means for a spec to be "unsound".
Bad choice of words on my part. I meant that
deallocate
calls.The spec says nothing about how the implementation identifies the memory block. It leaves that up to the implementation. It is not the job of the spec to say such things.
That really makes no sense. If a function accepts a generic A: Allocator
, calls allocate
twice and deallocate
once, then it obviously must be able to know which memory block was deallocated. Furthermore, to even call deallocate
it must be able to identify which memory block will be affected because it must choose the layout parameter to fit that memory block. In this generic case, the only information the function has in order to do this is what it says in the spec. Therefore it is up to the spec to spell out how to to identify the memory block.
There's no bitwise-or operation on pointers. So you are talking about integer roundtrips here, and without a more precise model of int-to-ptr casts, I think it is futile to attempt to discuss here what their behavior is.
Seems pretty clear from the llvm spec what the intended behavior is:
A pointer value formed by an inttoptr is based on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value.
Either way any allocator spec must support deallocation with int-to-ptr arguments because those are widely used.
It's not a valid implementation since deallocation is broken. I don't get your examples, you are constructing broken implementations and then claiming that that shows an issue with the spec?
What do you mean by "deallocation is broken"? The implementation of deallocate can simply be an empty function. This would be perfectly valid. The only thing that is broken is that the user of the allocator cannot identify the block that will be affected. And it's up to the spec to allow the user to do this. Not the implementation.
Sure, but if the allocator does this then it has to be able to also handle this situation during deallocation. If it cannot, it is a wrong implementation. No spec bug here that I can see.
See the conclusion at the top of the post (which I wrote at this point.)
Is your complaint that the spec should be changed to make such implementations possible?
If zero-sized allocations were allowed then we should not impose restrictions that not necessary to have a spec that is well-defined and easy to reason about.
Well, yeah, it's not a formal spec that can be used in a mathematical proof. Neither is yours.
Of course I'm using many words that I haven't defined in a precise enough way to perform a mathematical proof. But you're being a bit black and white here. There is a scale from "no spec at all" to "computed assisted proof".
There currently is a spec, but, when I tried to write down for my allocator why the allocate and deallocate functions (which delegate to another allocator after some layout modification and pointer offsetting) were implemented correctly, I was not able to do so based on the words I found in the spec. Not to a degree that would satisfy me. I can still assume that my implementation is correct because I can assume what the people of wg-allocators intended to specify.
The spec I wrote down above, on the other hand, does give me the necessary words and guarantees to do so.
Allows using allocators in ways that make it impossible to implement even Global correctly
I don't understand what you mean by this.
I literally linked an example that showed two threads calling deallocate at once and claimed that this usage was fine according to the spec. You skipped that part.
All your objections, as far as I can see, have nothing at all to do with zero-sized allocations. They are all about specifying more precisely the ownership transfer that happens during allocation and deallocation: allocation returns exclusive ownership of the given region of memory to the caller, and deallocation consumes exclusive ownership of the given region of memory from the caller.
Let's not forget that dropping the last clone of the allocator also terminates the memory allocation. I think this is where the ownership-based model is most problematic because it's not really how the rest of Rust behaves. Once you own something, you can usually forget about where you got that something from.
The current spec tries to say this when defining "currently allocated". I agree that this is somewhat unsatisfying, but I disagree that your spec is the right fix -- it seems way overcomplicated to me.
Really? I think it's rather compact. Not only does it not have to define leaky abstractions like "memory block", but it also gets thread-safety for free without having to explicitly talk about it. Let me say once again that the current spec allows you to call deallocate the same allocation from two different threads concurrently. Ok, sure, just like with provenance, we can now go and tack on a few more sentences to the spec to disallow this. But then please don't say that my spec is overcomplicated when it delegates almost all of the complicated stuff to the C++ memory model which any programmer of unsafe, concurrent Rust must already know.
This is also where zero-sized allocations come into play. Because I don't see how you can get a spec as simple as this if you allow zero-sized allocations. I'd be happy to see one.
The vast literature on separation logic shows that this is the right way to specifying allocators in a way that is amenable to proofs.
I do not know about separation logic. Does the literature also deal with allocators that can be dropped and thereby invalidate all previous allocations?
This is also why most existing allocators don't support this
Requesting a zero-sized allocation works in most general purpose allocators -- what do you mean?
the C standard requires that either they return
null
, or a unique pointer (I think strictly speaking it says it's UB or IB, but other requirements of malloc require this).
With zero size, it is ID. It will be UB only for realloc
starting with C23.
But I claim that the user must know which memory blocks will be affected by a deallocate call. And the user can only rely on what the spec tells it about identifying memory blocks to do so because it might accept an allocator as a generic argument. The current spec does not allow the user to do so.
Due to the size argument, I don't see any issue with the user knowing which memory block will be deallocated. Could you give a concrete example that you think is ambiguous? (Apologies if you have already given it; there have been a few examples in this issue and so far it wasn't clear to me what you even were complaining about.)
It's not possible to implement this spec in the way it was intended and have that implementation be sound. E.g. what I talked about with concurrent deallocate calls.
Are you saying all the existing implementations are unsound? I very much doubt that. If you think the curent spec does not sufficiently spell out the synchronization requirements, I do not think that this is hard to fix. But I think two concurrent dealloc calls are already an incorrect use of the spec since there is no way for you to guarantee that the memory hasn't been deallocated yet, if there is a concurrent thread that might also deallocate it.
Due to the size argument, I don't see any issue with the user knowing which memory block will be deallocated. Could you give a concrete example that you think is ambiguous?
E.g. an allocator that returns
where both data pointers are identical. Note that, if the deallocate implementation is an empty function, there in no issue with making the implementation correct.
If the user then calls deallocate with the data pointer and a zero size, the user cannot determine which allocation was affected by the deallocate call.
The concrete example aside: The spec does not say how to identify a memory block at all. I'm using my pre-existing knowledge of allocators to infer that a memory block is "identified" by the data pointer returned by allocate. However, this interpretation has problems:
Are you saying all the existing implementations are unsound? I very much doubt that.
No. I'm saying that the implementation of Global
is unsound. I've linked the Global
breaking example above.
If you think the curent spec does not sufficiently spell out the synchronization requirements, I do not think that this is hard to fix.
I would like to see that.
But I think two concurrent dealloc calls are already an incorrect use of the spec since there is no way for you to guarantee that the memory hasn't been deallocated yet, if there is a concurrent thread that might also deallocate it.
The language of the spec is
the memory block has not been subsequently deallocated
I'm reading this as "no deallocate call has happened before this point." With this reading, concurrent deallocate calls are fine. You seem to be additionally inferring a total-ordering requirement for all operations that operate on the same memory block. This would prevent concurrent deallocate calls if we were able to identify memory blocks. But we can't in the zero-sized allocation case. So we would have to add even more language to exemt zero-sized allocations from this total-ordering requirement. And then we have to look at the degenerate example above and see what's going on theer.
To repeat myself:
From the C17 section on memory allocation:
If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
For purposes of determining the existence of a data race, memory allocation functions behave as though they accessed only memory locations accessible through their arguments and not other static duration storage. These functions may, however, visibly modify the storage that they allocate or deallocate. Calls to these functions that allocate or deallocate a particular region of memory shall occur in a single total order, and each such deallocation call shall synchronize with the next allocation (if any) in this order.
(Highlights mine.)
The second highlight explicitly requires that all operations that operate on a region of memory must happen in a total order. The notion of region of memory is clearly different from memory block in that
malloc
callsNote also the sentence before the second highlight which independently enforces a total order.
E.g. an allocator that returns
Nono, don't make any assumptions about the allocator. You said this is a problem for clients. So there should be a client that performs a sequence of allocate
and deallocate
calls (and potentially others) and the result is unclear when the only thing we know about the allocator is that it follows the spec. What is that client?
If the user then calls deallocate with the data pointer and a zero size, the user cannot determine which allocation was affected by the deallocate call.
Of course it can? Only the zero-sized allocation can possibly be affected?
No. I'm saying that the implementation of Global is unsound. I've linked the Global breaking example above.
If you mean the OP, then that's already fixed by saying that the deallocate
ptr must be derived from the ptr returned by allocate
. Or, more precisely, its provenance must be for the right allocation. IOW, that program has UB, it is not using the API correctly. I thought we were talking about what remains to be done once that change is made.
Nono, don't make any assumptions about the allocator. You said this is a problem for clients. So there should be a client that performs a sequence of
allocate
anddeallocate
calls (and potentially others) and the result is unclear when the only thing we know about the allocator is that it follows the spec. What is that client?
Sure (pseudocode):
fn f<A: Allocator>(a: A) {
let x = a.alloc(0);
let y = a.alloc(0);
if y.len() == 1 && y.as_ptr() == x.as_ptr() {
a.dealloc((y.as_ptr() as usize | x.as_ptr() as usize) as *mut u8, 0);
y[0] = 0; // valid or invalid?
// alternatively:
// a.dealloc((y.as_ptr() as usize | x.as_ptr() as usize) as *mut u8, 0); // valid or invalid?
}
}
The example I gave in the previous post showed that there is an Allocator that follows the spec where the if branch can be taken.
Of course it can? Only the zero-sized allocation can possibly be affected?
That's incorrect. Recall the definition of memory fitting:
/// * The provided [`layout.size()`] must fall in the range `min ..= max`, where:
/// - `min` is the size of the layout most recently used to allocate the block, and
/// - `max` is the latest actual size returned from [`allocate`], [`grow`], or [`shrink`].
Note that min = 0
in this case.
If you mean the OP, then that's already fixed by saying that the deallocate ptr must be derived from the ptr returned by allocate. Or, more precisely, its provenance must be for the right allocation. IOW, that program has UB, it is not using the API correctly. I thought we were talking about what remains to be done once that change is made.
No I meant the example with concurrent deallocations.
Maybe we can clear up some of our miscommunication here by discussing the following question:
If no zero-sized allocations are involved, does the 3-tuple of
identify at most one currently allocated memory block?
The answer is no but I think you're believing that the answer is yes. If you think that the answer is yes, I'll write code that shows that it's not.
(For zero-sized allocations the answer is obviously no.)
y.len() == 1
Wait, so you are testing for a case where you requested something of size 0 but actually got a non-zero sized allocation? I didn't realize that that's possible. @Amanieu I wonder what you think about this, since it breaks the argument that the allocator can use the size argument to distinguish these allocations.
In some sense it's the allocator's fault for putting itself into a situation where it cannot tell which allocation the user meant (even if we ignore all the provenance shenanigans, from a purely operational perspective), but if the user has to take this possibility into account, then any dealloc(x, 0)
poses the risk of "accidentally" deallocating some other memory block at the same location that also had requested size 0 but its real size is non-0 (so it could actually be in use with some data). We could add a requirement that there may never be multiple "currently allocated" blocks at the same address with overlapping sizes (i.e., the min ..= max
range of sizes accepted for deallocation of these blocks overlap, which means there could be ambiguity about which block is meant to be deallocated), except if all of them have size 0 (max == 0
). This would mean it is illegal to write an allocator where your if
branch is ever taken. No reasonable allocator would do this, right?
The example I gave in the previous post showed that there is an Allocator that follows the spec where the if branch can be taken.
You have more than a dozen posts in this thread, I have no idea which one you mean. Hyperlinks are a great invention for cases where you actually want others to know for sure, without guessing, what thing you are referring to. ;)
No I meant the example with concurrent deallocations.
I cannot find a code example with concurrent deallocations. I won't re-read the entire thread to figure out what you mean. Please, if you expect others to follow your arguments, make it easy for them by providing links to the data you are using to support your arguments.
If no zero-sized allocations are involved, does the 3-tuple of [...] identify at most one currently allocated memory block?
I'd say it certainly does, even without provenance -- how else would one even implement deallocation for this allocator? I assume you are thinking about concurrency again here?
I feel like we're having two different discussions in this thread, and I never know which discussion you are even responding to. It would help to keep different, independent discussions (such as your concern around concurrency, and your concern around zero-sized allocations) in separate issues.
Yes, an allocator can return a larger size than what was requested. This allows Vec
to use all of the available memory on allocators that round sizes up to the next size class.
Right, what I was asking about are the consequences for zero-sized allocations, like in @mahkoh's example. However, thinking about it some more, I think I view this example as being similar to the following:
fn f<A: Allocator>(a: A) {
let x = a.alloc(1);
let y = a.alloc(1);
if y.len() == 1 && y.as_ptr() == x.as_ptr() {
a.dealloc((y.as_ptr() as usize | x.as_ptr() as usize) as *mut u8, 1);
y[0] = 0; // valid or invalid?
// alternatively:
// a.dealloc((y.as_ptr() as usize | x.as_ptr() as usize) as *mut u8, 1); // valid or invalid?
}
}
Here, too, it would be unclear which allocation is deallocated by the first a.dealloc
. However, this is not actually a problem since that code is clearly unreachable. So, I am now fairly sure that we want to arrange things in a way that likewise, the dealloc
in @mahkoh's example is unreachable. There can never be two allocations with the same base address unless both of them have size 0 -- and this refers to the actual size of the allocation, not the requested size.
I am not sure if this requires any change to the allocator spec -- in principle, whatever wording lets us conclude that the dealloc
is unreachable in my example, should be sufficient to conclude the same thing about @mahkoh's example. Currently, the allocate
spec does not say anything explicit about freshness, maybe it should? Something like
The returned block will not overlap with any other memory block that is currently allocated. Furthermore, it will not have the same address as any other memory block that is currently allocated, except possibly if both blocks have size 0.
There can never be two allocations with the same base address unless both of them have size 0 -- and this refers to the actual size of the allocation, not the requested size.
That's not sufficient if we want to allow allocators to conjure a base address out of thin air for ZSTs (e.g. by casting the alignment to a pointer), since there could be a real allocation (with >0 requested size) at that address. The allocator can still differentiate these cases since the real allocation must be deallocated with a non-zero size.
That's not sufficient if we want to allow allocators to conjure a base address out of thin air for ZSTs (e.g. by casting the alignment to a pointer), since there could be a real allocation (with >0 requested size) at that address. The allocator can still differentiate these cases since the real allocation must be deallocated with a non-zero size.
Oh, right. So it's more like: there can never be two allocations with the same base address except if one has size zero, and the other one
You have more than a dozen posts in this thread, I have no idea which one you mean.
"Previous" referring to the post that is the maximum of all of my posts in this thread that occurred before the post in which I wrote "previous" in the total order induced by the GitHub UI. This should be rather clear from the context. But, since you didn't reply for a long time, this might have been less clear to you than to someone who read the thread from top to bottom.
Anyway, this post is of no consequence now because you've already agreed that it's possible for the if branch to be taken.
I cannot find a code example with concurrent deallocations. I won't re-read the entire thread to figure out what you mean. Please, if you expect others to follow your arguments, make it easy for them by providing links to the data you are using to support your arguments.
I'm sorry but you already referenced concurrent deallocations yourself in this thread. You cannot expect others to write all of their posts is a way that they make sense to people who only read that one post or have forgotten what was already discussed.
Here I discussed concurrency issues with the current spec. In your following reply you did not even address this point. So it makes sense that you do not remember it. You possibly didn't even read it the first time.
The fact that you did not know that allocators are allowed to return slices with arbitrary lengths at least as large as what was requested tells me that you did not read the current spec carefully, if at all. If you do not have time or interest to take part in this discussion then that is completely fine. But please do not make opinionated posts without putting in the necessary work to understand what is being said and discussed. Because your opinions count for a lot.
I'd say it certainly does, even without provenance -- how else would one even implement deallocation for this allocator? I assume you are thinking about concurrency again here?
No this has nothing to do with concurrency. Consider the following code which is somewhat simplified and assumes alignment = 1
in all requests:
type A: Allocator = ...;
fn g(a: &A, m: &mut [MaybeUninit<u8>], f: impl FnOnce()) {
...
}
fn f(a: &A) {
let s = a.allocate(1).unwrap();
g(a, transmute(s), || {
let s2 = a.allocate(1).unwrap();
if s.as_ptr() == s2.as_ptr() && s.len() == s2.len() {
a.deallocate(s);
}
});
a.deallocate(s);
}
Do you agree that, if such an A
and g
can be written and the if branch can be taken, then we have a problem?
Here is a short sketch, if that is not enough, then I can write a full implementation:
struct A {
pool: Vec<(*mut [u8], bool)>, // (slice, in use)
}
impl !Sync for A { }
impl Allocator for A {
fn allocate(&self, req: usize) {
if (request can be satisfied from pool) {
satifsy from pool and mark as in use
} else {
use global allocator
}
}
fn deallocate(&self, s: *mut u8, len: usize) {
if (address of s is in the pool) {
mark entry as unused
} else {
use global allocator
}
}
}
impl A {
fn feed(&self, s: *mut [u8]) {
if (address of s is already in the pool) {
panic!();
}
add s to the pool as unused
}
fn return_fed(&self, s: *mut [u8]) {
if (address of s is not in the pool or s is in use) {
panic!();
}
remove s from the pool
}
}
fn g(a: &A, m: &mut [MaybeUninit<u8>], f: impl FnOnce()) {
a.feed(transmute(m));
f();
a.return_fed(transmute(m));
}
This seems to be a perfectly fine allocator.
(Wtf, GH first duplicated my response and now it is acting up in weird ways... looks like they screwed up their database consistency...)
"Previous" referring to the post that is the maximum of all of my posts in this thread that occurred before the post in which I wrote "previous" in the total order induced by the GitHub UI. This should be rather clear from the context. But, since you didn't reply for a long time, this might have been less clear to you than to someone who read the thread from top to bottom.
That would be https://github.com/rust-lang/wg-allocators/issues/82#issuecomment-753666708, which contains no "example". So no, what you are saying is simply not accurate.
I'm sorry but you already referenced concurrent deallocations yourself in this thread. You cannot expect others to write all of their posts is a way that they make sense to people who only read that one post or have forgotten what was already discussed.
You brought it up though, thereby significantly raising the confusion level of this entire thread. I just responded to (some of) your statements.
I can expect people to write in a way that makes it as easy as reasonably possible to follow. This includes not jumping between different topics without even making those jumps clear, it includes not assuming that the reader has perfect memory, and it includes putting in hyperlinks where appropriate. This is just basic courtesy -- if you want others to make the effort to follow your train of thought, then you should put in some effort to make that easier.
The fact that you did not know that allocators are allowed to return slices with arbitrary lengths at least as large as what was requested tells me that you did not read the current spec carefully, if at all. If you do not have time or interest to take part in this discussion then that is completely fine. But please do not make opinionated posts without putting in the necessary work to understand what is being said and discussed. Because your opinions count for a lot.
Yes, I should have read about the status quo better. Sorry for this.
This seems to be a perfectly fine allocator.
Okay, so this is yet another problem that you also mixed into this thread -- it's about transferring ownership of some of the allocated memory back to the allocator, and then having that allocator hand out the same memory again. That's... quite subtle, and I have no idea if the allocator-wg considers such allocators "in scope". It's certainly an interesting example, but in the interest of sticking to "one topic per thread", I think it would be better if you opened yet another new issue for this one.
Can we agree to only discuss the ZST issue here henceforth? If you want to discuss the concurrency once, please open a new thread. (I'm not a WG member though so I have no say to what happens in this repo. I am just making suggestions that I think are in the best interesting of having a clear and constructive discussion.)
These topics are not quite as separated as you are making them out to be:
allocate
and deallocate
calls that return/accept the same address are in a total order and therefore deallocate
always acts on a allocate
call (!), not on a memory block. As long as all calls are properly paired, there is no problem with the same address being returned multiple times.Therefore I know that these problems can be solved if we disallow zero-sized allocations. I do not know if they can be solved with zero-sized allocations allowed. Therefore, concurrency issues and creative allocator uses serve as a motivator for disallowing zero-sized allocations.
That would be #82 (comment), which contains no "example". So no, what you are saying is simply not accurate.
You got me there.
Just because your proposal solves all 3 issues at once, doesn't mean they should be discussed all at once. In particular, first it needs to be established that there are three problems, and those are certainly three separate discussions.
I appreciate you consider these all closely related based on solving them all together, and separating them might feel like a step backwards to you, but there's no point in insisting on this until your audience agrees with this premise -- and convincing others of this requires separate discussion of the three issues you have with the current spec. It makes little sense to talk about your proposed spec until there is agreement about the issues that it is designed to solve (or at least, any such discussion would be improperly motivated -- that's why I haven't spent the time to dig into your spec yet; I am still in the state of needing to be convinced that such a spec is even necessary). Also understanding it will be much easier when the issues that it is designed to be solved are all clearly on the table. I think by posting your solution before clearly establishing the problems, you are putting the cart before the horse.
- For the concurrency issue, I think it suffices to say that all allocator operations are considered non-atomic writes, and therefore requires external synchronization to not be considered data races.
That's exactly what my spec does. But I don't know what you mean by non-atomic writes if the slice is empty. That's the reason I had to disallow zero-sized allocations in my spec.
- For "re-feeding" memory to the allocator, that sounds like the WG should have a discussion if they want to support this kind of allocator
I'd be more concerned about creative allocators that haven't yet been discovered and that, once discovered, will cause other implicit assumptions made by the spec to no longer hold.
I have no particular interest in re-feeding allocators except insofar as that they serve as an example that uncovers such implicit assumptions. In this case that (allocator, addr, size) uniquely identify an allocation.
The reason I opened this thread in the first place is that I found the spec to be underspecified, i.e., to be full of implicit assumptions.
But I don't know what you mean by non-atomic writes if the slice is empty. That's the reason I had to disallow zero-sized allocations in my spec.
Yes, that's a good question. I think it'd be best to explore this with a few concrete examples. IOW, I think this needs some room for discussion and exploring the design space of possible solutions -- room such as an issue dedicated to this problem.
The documentation talks about memory blocks in several places.
and later
and later
and later
Combined, all of these require that pointers returned by an Allocator are abstract and cannot be compared. Or else no two allocations must return the same value.
The key problem is that the API allows zero sized allocations. Requests for zero sized allocations can return the same pointer multiple times or can return a pointer which compares equal to the pointer returned by a non-zero-sized allocation.
Global
already has these properties.The easiest fix would be to disallow zero-sized allocations.
@RalfJung This seems right up your alley.