cc @RalfJung
On first glance this looks like a duplicate of https://github.com/rust-lang/unsafe-code-guidelines/issues/148. Can you confirm?
Does it make sense to consider this kind of behavior UB, given that stable_deref_trait and owning-ref (and probably other crates as well) rely on it?
Finding trade-offs like this is exactly why we have Stacked Borrows implemented in Miri. If this is really a duplicate of the above, it will be very interesting to figure out which optimizations it will cost us to support code like this. Maybe we end up needing some kind of annotation for self-referential structs.
I don't see anything self-referential here. The code is just taking a pointer to the contents of the Box and expects this pointer to remain usable as the Box itself is moved around.
So the Box<T> contains a Unique<T> pointer, asserting that all accesses to the underlying object are done through that pointer or something derived from it, and the example breaks that assumption.
If I modify the example to involve mutation, this can actually cause miscompilation:
use std::cell::Cell;

fn helper(val: Box<Cell<u8>>, ptr: *const Cell<u8>) -> u8 {
    val.set(10);
    unsafe { (*ptr).set(20); }
    val.get()
}

fn main() {
    let val: Box<Cell<u8>> = Box::new(Cell::new(25));
    let ptr: *const Cell<u8> = &*val;
    let res = helper(val, ptr);
    assert_eq!(res, 20);
}
At the time of writing, the assertion passes in debug mode but fails in release mode.
I believe the modified example still corresponds to what owning-ref does, but I'm not sure as I haven't used that crate myself.
What's happening is that the val parameter to helper is being marked noalias. I'm using Cell to ensure I'm doing "as much as I can" to assert non-uniqueness, but it doesn't actually make a difference: Cell disables noalias for shared references, but not for unique references.
Surprisingly, noalias is applied even without -Z mutable-noalias=yes; looking at the compiler code, mutable-noalias only affects actual &mut references, even though Box is also a mutable pointer. This is problematic, since I don't see why Box would be immune to the LLVM bugs that made us default to mutable-noalias=no.
In this example, though, the misoptimization happens because we've violated the noalias invariant, not because of an LLVM bug.
Edit: Interestingly, my example does not trigger a miri error.
I looked into it, and sure enough, you can write an equivalent program without unsafe code using the APIs provided by owning_ref: playground link.
It should be possible for owning_ref to fix this by not storing the Box directly in the OwningRef struct, but instead storing a raw pointer that represents it, using into_raw/from_raw as appropriate. However, doing that for all supported types would require redesigning its StableDeref trait.
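To make that concrete, here is a minimal sketch of what such a fix could look like. This is not owning_ref's actual API; OwningRefBox and project are illustrative names. The owner is dissolved into a raw pointer with Box::into_raw while the projected reference is live, and the Box is only rebuilt in Drop to free the allocation.

// Illustrative sketch, not owning_ref's real implementation.
struct OwningRefBox<T, U: ?Sized> {
    owner: *mut T,       // was: Box<T>; stored as a raw pointer so no Unique is asserted
    reference: *const U, // points into the owned allocation
}

impl<T, U: ?Sized> OwningRefBox<T, U> {
    fn new(owner: Box<T>, project: impl FnOnce(&T) -> &U) -> Self {
        let owner = Box::into_raw(owner);
        let reference = project(unsafe { &*owner }) as *const U;
        OwningRefBox { owner, reference }
    }

    fn get(&self) -> &U {
        unsafe { &*self.reference }
    }
}

impl<T, U: ?Sized> Drop for OwningRefBox<T, U> {
    fn drop(&mut self) {
        // Re-materialize the Box only at the very end, to free the allocation.
        unsafe { drop(Box::from_raw(self.owner)) };
    }
}

Whether this satisfies Stacked Borrows in every corner case is a separate question; the point is just that no Box (and hence no Unique) is live while the stored reference is in use.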
I definitely think it's the same underlying reasoning that would make both this and the self-referential case (i.e. #148) sound.
It boils down to:
- start with a &impl StableDeref<Target=T>
- derive a *const T from it and store it
- use the &impl StableDeref<Target=T> again
- convert the stored *const T back to a &T
The problem is that this is a completely implicit reborrowing, effectively from just having the &impl StableDeref<Target=T> in scope.
This is the exact behavior that StableDeref purports to provide:
More specifically, implementors must ensure that the result of calling deref() is valid for the lifetime of the object, not just the lifetime of the borrow, and that the deref is valid even if the object is moved. [...] Basically, it must be valid to convert the result of deref() to a pointer, and later dereference that pointer, as long as the original object is still live, even if it has been moved or &self methods have been called on it.
Pin and self-referential structures through Pin use a mostly equivalent guarantee to allow observation of Pin<P> to "re-validate" stored references.
The result is that Pin<impl Deref<Target=T>>/impl StableDeref<Target=T> aren't fully noalias, because they guarantee that it's valid to use raw pointers derived from their Deref implementation beyond the lifetime provided by said Deref. I think the only way that these types that naively "should" be noalias can be noalias is if the owning Pin/impl StableDeref is consulted to "upgrade" these "swindled" references to valid ones again, allowing "re-deriving" them and changing their parent such that they are children of the valid noalias pointer.
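As a minimal sketch of that "re-deriving" discipline, using the earlier helper example: if the raw pointer is derived from the owner inside the call rather than captured beforehand, every access through it is a child of the valid noalias pointer, and the example behaves as expected.

use std::cell::Cell;

// Sketch: the raw pointer is re-derived from `val` inside the function,
// so every access through it is "based on" the noalias Box parameter.
fn helper(val: Box<Cell<u8>>) -> u8 {
    val.set(10);
    let ptr: *const Cell<u8> = &*val; // derived from `val` here, not smuggled in
    unsafe { (*ptr).set(20) };
    val.get() // reliably 20
}

fn main() {
    assert_eq!(helper(Box::new(Cell::new(25))), 20);
}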
Another update: It's also possible to violate the noalias condition with just async/await, similar to the example in #148. But for arcane reasons, it seems almost impossible to actually trigger miscompilation, even with -Z mutable-noalias=yes.
This is all you need to violate the condition:
use std::cell::Cell;

// `SomeFuture` stands in for any future that is not immediately ready (not defined here).
pub async fn test() -> i32 {
    let a: Cell<i32> = Cell::new(0);
    let mut b: &Cell<i32> = &a;
    SomeFuture.await;
    b.set(100);
    a.get()
}
It's desugared into something like
struct Test {
    state: ?,
    a: Cell<i32>,
    b: *const Cell<i32>,
}

impl Future for Test {
    fn poll(self: Pin<&mut Self>, ...) {
        loop {
            match self.state {
                State1 => {
                    self.b = &self.a;
                    ...poll on SomeFuture...
                },
                State2 => {
                    (*self.b).set(100);
                    Poll::Ready(self.a.get())
                },
            }
        }
    }
}
Here, (*self.b).set(100) writes to self.a, but if we enter poll with self.state == State2, the value of self.b is not directly or indirectly 'based on' self. Thus the write violates the noalias condition for self.
On the other hand, if we enter poll in the initial state and the poll is ready on the first try, then we haven't violated the condition, because the value of self.b is 'based on' self (because we wrote it as part of the same function call). Thus, LLVM can only assume no-aliasing in State2 if it knows that the poll won't be ready on the first try. Because of quirks of the await implementation (use of thread-local storage), I wasn't able to come up with a proof-of-concept that allowed LLVM to prove this, even using extremely contrived Future implementations. There's probably some way to do it, but the issue is unlikely to affect actual async code.
I was sort of hoping to drop a bombshell that might affect async/await stabilization, but no such luck. ;)
For a long-term solution, we need to avoid marking self as noalias in this case. Perhaps, as @CAD97 suggested, Pin<&mut T> should just never be noalias...
@comex I think it is worth mentioning that in your example here helper has as a pre-condition that the pointer is valid to dereference and that it does not alias the Box<T> (a Box<T> owns T and using it is like using a &mut T). The helper function should therefore be unsafe, and the calling code violates its pre-conditions, so the program has undefined behavior, and gets mis-optimized.
I don't see anything self-referential here. The code is just taking a pointer to the contents of the Box and expects this pointer to remain usable as the Box itself is moved around.
Oh, I forgot what owning-ref does. The pointer into the box lives next to the box, not inside it. I was also confused by "the box is moved", because the box has to stay in place for the pointer to remain valid. But the Box pointer is moved -- this is the kind of "move" that is allowed even when something is pinned.
However, in terms of Stacked Borrows, it doesn't matter where the pointer is stored, whether inside the box or outside of it. What matters is that an inner pointer gets created, then an outer pointer gets used (for a move of a box or a reborrow of a mutable reference), and then the old inner pointer gets used again. So the pattern is the same.
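For illustration, here is the same pattern with a mutable reference instead of a Box (a hypothetical minimal example; Miri flags the last write under Stacked Borrows):

fn main() {
    let mut x = 0u8;
    let outer: &mut u8 = &mut x;
    let inner: *mut u8 = &mut *outer; // inner pointer created from `outer`
    let again: &mut u8 = &mut *outer; // outer pointer used again: a reborrow
    *again = 1;
    unsafe { *inner = 2 };            // stale inner pointer used again: UB under Stacked Borrows
    println!("{}", x);
}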
@comex
So the Box contains a Unique pointer, asserting that all accesses to the underlying object are done through that pointer or something derived from it, and the example breaks that assumption.
Note that right now, Stacked Borrows special-cases Box but not Unique.
Edit: Interestingly, my example does not trigger a miri error.
Nice example! Also the one for async/await.
Interesting indeed. Probably this is because all raw pointers are equal for Miri (at least currently; I hope to change this eventually but without &raw that's not possible), and there is a raw pointer being created inside val.set(10).
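For reference, a simplified sketch of why a raw pointer shows up there (using an own wrapper type, not the actual libstd source): Cell is a wrapper around UnsafeCell, and UnsafeCell::get hands out a *mut T that the write goes through.

use std::cell::UnsafeCell;

// Hedged sketch of the Cell-like pattern, not libstd's implementation.
struct MyCell<T>(UnsafeCell<T>);

impl<T> MyCell<T> {
    fn set(&self, val: T) {
        let p: *mut T = self.0.get(); // the raw pointer Miri sees being created
        unsafe { *p = val };
    }
}

fn main() {
    let c = MyCell(UnsafeCell::new(1u8));
    c.set(10);
}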
@CAD97
The result is that Pin<impl Deref<Target=T>>/impl StableDeref<Target=T> aren't fully noalias, because they guarantee that it's valid to use raw pointers derived from their Deref implementation beyond the lifetime provided by said Deref.
Exactly.
Maybe such things "just" need to be specially marked to the compiler.
@gnzlbg
The helper function should therefore be unsafe, and the calling code violates its pre-conditions, so the program has undefined behavior, and gets mis-optimized.
The compiler doesn't know about the preconditions, so violating the preconditions never explains UB (from an abstract machine perspective).
The reason it is UB is that aliasing constraints are violated, and as @comex demonstrated owning-ref can do the same.
So should we close this now in favor of https://github.com/rust-lang/unsafe-code-guidelines/issues/148? I think we established that they are definitely closely related.
The compiler doesn't know about the preconditions, so violating the preconditions never explains UB (from an abstract machine perspective).
Of course it does? All Box function arguments are noalias because the compiler does know about this precise pre-condition (EDIT: that no other argument to the function can alias them, e.g., example (godbolt)).
I think it is just ontologically wrong to say "this is UB because the precondition is violated"; the IMO more correct view is "this is a precondition because otherwise there is UB" (and the UB is caused by the aliasing model that's part of the abstract machine).
Following your argument "the calling code violates its pre-conditions, so the program has undefined behavior", calling Vec::set_len the wrong way would cause UB. It does not. That's all I am saying. I think this is just sloppy terminology on your side, but I wanted to avoid people reading this interpreting it the wrong way.
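For illustration, a hypothetical example (not from this thread) of a documented precondition whose violation does not, by itself, make the abstract machine execute UB:

/// Safety (documented contract): callers must pass `idx < data.len()`.
unsafe fn get_unchecked_ish(data: &[u8], idx: usize) -> u8 {
    // This particular body happens to bounds-check anyway, so violating the
    // documented contract does not make the abstract machine execute UB.
    // It is still a bug in the caller, because a future body could rely on it.
    data.get(idx).copied().unwrap_or(0)
}

fn main() {
    let v = [1u8, 2, 3];
    // Violates the documented precondition, yet no UB is executed here;
    // contrast with the aliasing examples above, where the UB comes from
    // the abstract machine's aliasing rules themselves.
    let _ = unsafe { get_unchecked_ish(&v, 10) };
}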
I claimed that:
- helper has a pre-condition that its callers must uphold,
- helper doesn't check its pre-condition and therefore the function should be unsafe.
I'm not sure how you end up arguing that there are pre-conditions that can be violated without invoking undefined behavior (e.g. Vec::set_len). I don't disagree with that, and I don't think I claimed otherwise.
I did not make any claims about the source of the undefined behavior, I certainly did not intend to make the tautological claim that this is UB because helper has a precondition due to this being UB.
When I read the thread, I got the impression from @comex's comment that this might be a mis-optimization. I just wanted to point out that this is intended behavior.
In the case of OwningRef<Box<T>>, this could be solved if the implementation decided to replace the input Box<T> given at construction by a NonNull<UnsafeCell<T>> (the NonNull to opt out of uniqueness; the UnsafeCell to be able to temporarily upgrade back into it).
More generally, for a pointer P : Deref, we will need to also have IntoRawNonNull + FromRawNonNull to be able to ensure opting out of uniqueness.
I don't know how this would generalise to Pin, though (at the end of the day, we miss the PinMut abstraction).
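One possible shape for those hypothetical traits (a sketch, not an existing API; the names follow the suggestion above):

use std::ptr::NonNull;

trait IntoRawNonNull: Sized {
    type Target: ?Sized;
    fn into_raw_non_null(self) -> NonNull<Self::Target>;
}

trait FromRawNonNull: IntoRawNonNull {
    /// Safety: `raw` must have been produced by `into_raw_non_null` on the same type.
    unsafe fn from_raw_non_null(raw: NonNull<Self::Target>) -> Self;
}

impl<T> IntoRawNonNull for Box<T> {
    type Target = T;
    fn into_raw_non_null(self) -> NonNull<T> {
        // Box::into_raw never returns null.
        unsafe { NonNull::new_unchecked(Box::into_raw(self)) }
    }
}

impl<T> FromRawNonNull for Box<T> {
    unsafe fn from_raw_non_null(raw: NonNull<T>) -> Self {
        Box::from_raw(raw.as_ptr())
    }
}

A wrapper like OwningRef could then keep only the NonNull between construction and drop, never a live Box.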
@gnzlbg
The helper function should therefore be unsafe, and the calling code violates its pre-conditions, so the program has undefined behavior, and gets mis-optimized.
This phrasing heavily implied a general entailment of the form "precondition violated implies UB". Specifically, the "so" in your sentence indicates a causal connection. If you didn't mean to say that, take this as feedback for trying a clearer wording next time. Maybe I am nitpicking too much here, but I think it is important that we use clear language. The next time you write "A, so B" I need to know if this indicates a causal connection or not.
For @comex' example, it also doesn't really matter whether that function is marked unsafe or not, as we were only considering its operational behavior.
When I read the thread, I got the impression from @comex's comment that this might be a mis-optimization. I just wanted to point out that this is intended behavior.
Ah, the term "miscompilation" is the culprit here. Do we have a good word for "code that has UB that the compiler actually exploits", in contrast to "code that has UB but still works as intended by the programmer"? We often say "miscompilation" but that sounds like a compiler bug, which it isn't if the code really does have UB.
As far as I can tell, there was never any disagreement that that code is wrong according to our current aliasing rules, and so the compiler is in its right to "miscompile". @comex was not trying to argue the UB away. The point of that comment was that this code does the exact same thing as what you can do with owning-ref using only safe code. So either owning-ref is wrong or our aliasing rules are wrong.
Cc @Kimundi (owning-ref author); I also reported this upstream.
at the end of the day, we miss the PinMut abstraction
We might... :/
FWIW, we've attempted to address this in ndarray, where it is using a Vec for allocation/deallocation (partly for historical reasons, now for zero-copy transfer of data between Array and Vec).
The description of the problem is this:
The owned array has used a Vec that it owns, and kept a separate "head pointer" ptr on the side; the head pointer points to the first element in the array, somewhere inside the vector's data.
and the solution is this:
We replace the Vec with a deconstructed vector; it's equivalent but uses NonNull for its pointer, so it's an aliasable owner.
It feels like a non-issue, because I can't find anywhere that we "interleave" reads/writes through the Vec with reads/writes through our own "head pointer". In the only places where we write to the Vec (for example in Clone::clone_from), we immediately refresh our head pointer afterwards, without using the old value. But it seems like the right thing to do, to avoid using a Vec with its Unique<T>.
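A rough sketch of such a deconstructed vector (not ndarray's actual code, just the idea): the Vec's pointer/length/capacity are stored manually, with NonNull instead of Vec's Unique, and the Vec is only reassembled in Drop to deallocate.

use std::ptr::NonNull;

// Illustrative "aliasable owner": holds the buffer without asserting Unique.
struct DeconstructedVec<T> {
    ptr: NonNull<T>,
    len: usize,
    cap: usize,
}

impl<T> DeconstructedVec<T> {
    fn from_vec(mut v: Vec<T>) -> Self {
        let ptr = NonNull::new(v.as_mut_ptr()).unwrap();
        let (len, cap) = (v.len(), v.capacity());
        std::mem::forget(v); // ownership of the buffer moves into this struct
        DeconstructedVec { ptr, len, cap }
    }
}

impl<T> Drop for DeconstructedVec<T> {
    fn drop(&mut self) {
        // Reassemble the Vec only to run destructors and deallocate.
        unsafe { drop(Vec::from_raw_parts(self.ptr.as_ptr(), self.len, self.cap)) };
    }
}

fn main() {
    let owner = DeconstructedVec::from_vec(vec![1, 2, 3]);
    let head: *const i32 = owner.ptr.as_ptr(); // a "head pointer" into the buffer
    let moved = owner;                         // moving the owner does not retag a Unique
    assert_eq!(unsafe { *head }, 1);
    drop(moved);
}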
It feels like a non-issue, because I can't find anywhere that we "interleave" reads/writes through the Vec with reads/writes through our own "head pointer".
To clarify, you are saying this code should be fine and Stacked Borrows should not complain, for the reasons given?
If this code is the ndarray code in question: Yes, but I don't know the model that well, so it's just a hunch.
I still feel that if people / libraries (i.e., I am not talking about the compiler-generated desugaring for generators and the Pin interaction) want to "alias" a Box's pointee, then they should downgrade the Box to a different type (which could be in std, that's a different topic) which would not include the Unique property: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=039a3ae5d241eeee37b0f8bc40446f6a
That is, a Box that only assumes Unique-ness when DerefMut is used (which in my example requires an explicit method call, as a "lint") and obviously when dropped.
Otherwise, its pointee can be aliased, and depending on the usage, either by a pointer that assumes immutability, as long as the Box itself never mutates the pointee (should be enough for ::owning_ref), or by one that could even support the Box mutating the pointee, as long as it does not assume immutability (and is not used for mutation itself).
As for Pin, maybe Pin<Box<_>> could be such a pointer? (My hand-wavy intuition: if "the memory location itself owns the pointee", then it makes no sense for the Box to assume Unique-ness anymore.)
-ness anymore).We have separate issues to discuss the design of MaybeDangling
and to discuss the open questions around Box
aliasing requirements. Closing.
When running Miri on owning-ref, I ran into an error with this (minimized) snippet (playground):
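The snippet is roughly of this shape (a reconstruction from the description below, not the exact playground code):

fn helper(val: Box<u8>, ptr: *const u8) -> u8 {
    // Moving the Box into `helper` retags its Unique; Miri then rejects the
    // read through the older raw pointer.
    let byte = unsafe { *ptr };
    drop(val);
    byte
}

fn main() {
    let val: Box<u8> = Box::new(25);
    let ptr: *const u8 = &*val;
    let res = helper(val, ptr);
    assert_eq!(res, 25);
}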
which errors with:
In this snippet, we're creating a raw pointer into a Box, moving the Box, then dereferencing the raw pointer. The owning-ref crate (and more generally, anything relying on stable_deref_trait) relies on this working. In fact, this is the entire reason for the existence of stable_deref_trait.
The error message is a little confusing. However, I believe that the read from ptr causes the Unique item on the stack to be disabled (since the Unique is above the SharedReadOnly that grants access).
Does it make sense to consider this kind of behavior UB, given that stable_deref_trait and owning-ref (and probably other crates as well) rely on it?