rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0

Stacked Borrows vs inner pointers into "stable-deref" types #194

Closed Aaron1011 closed 1 year ago

Aaron1011 commented 5 years ago

When running Miri on owning-ref, I ran into an error with this (minimized) snippet (playground):

fn main() {
    let val: Box<u8> = Box::new(25);
    let ptr: *const u8 = &*val;

    let _moved_val = val;
    let _my_val: u8 = unsafe { *ptr };
}

which errors with:

error[E0080]: Miri evaluation error: trying to reborrow for SharedReadOnly, but parent tag <1359> does not have an appropriate item in the borrow stack
   --> /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/alloc.rs:219:28
    |
219 |     let size = size_of_val(&*ptr);
    |                            ^^^^^ Miri evaluation error: trying to reborrow for SharedReadOnly, but parent tag <1359> does not have an appropriate item in the borrow stack
    |

In this snippet, we're creating a raw pointer into a Box, moving the Box, then dereferencing the raw pointer. The owning-ref crate (and more generally, anything relying on stable_deref_trait) relies on this working. In fact, this is the entire reason for the existence of stable_deref_trait.

The error message is a little confusing. However, I believe that the read from ptr causes the Unique item on the stack to be disabled (since the Unique is above the SharedReadOnly that grants access).

Does it make sense to consider this kind of behavior UB, given that stable_deref_trait and owning-ref (and probably other crates as well) rely on it?

Aaron1011 commented 5 years ago

cc @RalfJung

RalfJung commented 5 years ago

On first glance this looks like a duplicate of https://github.com/rust-lang/unsafe-code-guidelines/issues/148. Can you confirm?

Does it make sense to consider this kind of behavior UB, given that stable_deref_trait and owning-ref (and probably other crates as well) rely on it?

Finding trade-offs like this is exactly why we have Stacked Borrows implemented in Miri. If this is really a duplicate of the above, it will be very interesting to figure out which optimizations it will cost us to support code like this. Maybe we end up needing some kind of annotation for self-referential structs.

hanna-kruppe commented 5 years ago

I don't see anything self-referential here. The code is just taking a pointer to the contents of the Box and expects this pointer to remain usable as the Box itself is moved around.

comex commented 5 years ago

So the Box<T> contains a Unique<T> pointer, asserting that all accesses to the underlying object are done through that pointer or something derived from it, and the example breaks that assumption.

If I modify the example to involve mutation, this can actually cause miscompilation:

Playground link

use std::cell::Cell;

fn helper(val: Box<Cell<u8>>, ptr: *const Cell<u8>) -> u8 {
    val.set(10);
    unsafe { (*ptr).set(20); }
    val.get()
}

fn main() {
    let val: Box<Cell<u8>> = Box::new(Cell::new(25));
    let ptr: *const Cell<u8> = &*val;
    let res = helper(val, ptr);
    assert_eq!(res, 20);
}

At the time of writing, the assertion passes in debug mode but fails in release mode.

I believe the modified example still corresponds to what owning-ref does, but I'm not sure as I haven't used that crate myself.

What's happening is that the val parameter to helper is being marked noalias. I'm using Cell to ensure I'm doing "as much as I can" to assert non-uniqueness, but it doesn't actually make a difference: Cell disables noalias for shared references, but not for unique references.

Surprisingly, noalias is applied even without -Z mutable-noalias=yes; looking at the compiler code, mutable-noalias only affects actual &mut references, even though Box is also a mutable pointer. This is problematic, since I don't see why Box would be immune to the LLVM bugs that made us default to mutable-noalias=no.

In this example, though, the misoptimization happens because we've violated the noalias invariant, not because of an LLVM bug.

Edit: Interestingly, my example does not trigger a miri error.

comex commented 5 years ago

I looked into it, and sure enough, you can write an equivalent program without unsafe code using the APIs provided by owning_ref: playground link.

It should be possible for owning_ref to fix this by not storing the Box directly in the OwningRef struct, but instead storing a raw pointer that represents it, using into_raw/from_raw as appropriate. However, doing that for all supported types would require redesigning its StableDeref trait.
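A rough sketch of that direction (hypothetical names, not the actual owning_ref API): the Box is consumed with Box::into_raw at construction, so no Box value with its uniqueness assertion is stored while the inner pointer is in use, and Box::from_raw is only used again in Drop to deallocate.

struct RawOwningRef<T> {
    owner: *mut T,       // formerly Box<T>, converted via Box::into_raw
    reference: *const T, // points into the owned allocation
}

impl<T> RawOwningRef<T> {
    fn new(owner: Box<T>) -> Self {
        let owner = Box::into_raw(owner);
        RawOwningRef { owner, reference: owner as *const T }
    }

    fn get(&self) -> &T {
        unsafe { &*self.reference }
    }
}

impl<T> Drop for RawOwningRef<T> {
    fn drop(&mut self) {
        // Reconstruct the Box only here, to drop the value and free the allocation.
        unsafe { drop(Box::from_raw(self.owner)) }
    }
}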

CAD97 commented 5 years ago

I definitely think it's the same underlying reasoning that would make both this and self-referential structs (i.e. #148) sound.

It boils down to:

The problem is that this is a completely implicit reborrowing, effectively from just having the &impl StableDeref<Target=T> in scope.

This is the exact behavior that StableDeref purports to provide:

More specifically, implementors must ensure that the result of calling deref() is valid for the lifetime of the object, not just the lifetime of the borrow, and that the deref is valid even if the object is moved. [...] Basically, it must be valid to convert the result of deref() to a pointer, and later dereference that pointer, as long as the original object is still live, even if it has been moved or &self methods have been called on it.

Pin and self-referential structures through Pin use a mostly equivalent guarantee to allow observation of Pin<P> to "re-validate" stored references.

The result is that Pin<impl Deref<Target=T>>/impl StableDeref<Target=T> aren't fully noalias, because they guarantee that it's valid to use raw pointers derived from their Deref implementation beyond the lifetime provided by said Deref. I think the only way that these types that naively "should" be noalias can be noalias is if the owning Pin/impl StableDeref is consulted to "upgrade" these "swindled" references to valid ones again, allowing "re-deriving" them and changing their parent such that they are children of the valid noalias pointer.
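Concretely, for the snippet from the issue, such "re-deriving" would amount to recreating the inner pointer from the owner after the move, so that it is a child of the currently valid tag (a sketch of the idea, not a proposed language mechanism; this version should be fine under Stacked Borrows):

fn main() {
    let val: Box<u8> = Box::new(25);
    let _stale_ptr: *const u8 = &*val; // would be invalidated by the move below

    let moved_val = val;
    // Re-derive the inner pointer from the owner after the move; the new
    // pointer is a child of the currently valid tag, so the read is accepted.
    let ptr: *const u8 = &*moved_val;
    let _my_val: u8 = unsafe { *ptr };
}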

comex commented 5 years ago

Another update: It's also possible to violate the noalias condition with just async/await, similar to the example in #148. But for arcane reasons, it seems almost impossible to actually trigger miscompilation, even with -Z mutable-noalias=yes.

This is all you need to violate the condition:

pub async fn test() -> i32 {
    let a: Cell<i32> = Cell::new(0);
    let mut b: &Cell<i32> = &a;
    SomeFuture.await;
    b.set(100);
    a.get()
}

It's desugared into something like

struct Test {
    state: ?,
    a: Cell<i32>,
    b: *const Cell<i32>,
}
impl Future for Test {
    fn poll(self: Pin<&mut Self>, ...) {
        loop {
            match self.state {
                State1 => {
                    self.b = &self.a;
                    ...poll on SomeFuture...
                },
                State2 => {
                    (*self.b).set(100);
                    Poll::Ready(self.a.get())
                },
            }
        }
    }
}

Here, (*self.b).set(100) writes to self.a, but if we enter poll with self.state == State2, the value of self.b is not directly or indirectly 'based on' self. Thus the write violates the noalias condition for self.

On the other hand, if we enter poll in the initial state and the poll is ready on the first try, then we haven't violated the condition, because the value of self.b is 'based on' self (because we wrote it as part of the same function call). Thus, LLVM can only assume no-aliasing in State2 if it knows that the poll won't be ready on the first try. Because of quirks of the await implementation (use of thread-local storage), I wasn't able to come up with a proof-of-concept that allowed LLVM to prove this, even using extremely contrived Future implementations. There's probably some way to do it, but the issue is unlikely to affect actual async code.

I was sort of hoping to drop a bombshell that might affect async/await stabilization, but no such luck. ;)

For a long-term solution, we need to avoid marking self as noalias in this case. Perhaps, as @CAD97 suggested, Pin<&mut T> should just never be noalias...

gnzlbg commented 5 years ago

@comex I think it is worth mentioning that in your example here helper has as a pre-condition that the pointer is valid to dereference and that it does not alias the Box<T> (a Box<T> owns T and using it is like using a &mut T). The helper function should therefore be unsafe, and the calling code violates its pre-conditions, so the program has undefined behavior, and gets mis-optimized.

RalfJung commented 5 years ago

I don't see anything self-referential here. The code is just taking a pointer to the contents of the Box and expects this pointer to remain usable as the Box itself is moved around.

Oh, I forgot what owning-ref does. The pointer into the box lives next to the box, not inside it. I was also confused by "the box is moved", because the box has to stay in place for the pointer to remain valid. But the Box pointer is moved -- this is the kind of "move" that is allowed even when something is pinned.

However, in terms of Stacked Borrows, it doesn't matter where the pointer is stored, whether inside the box or outside of it. What matters is that an inner pointer gets created, then an outer pointer gets used (for a move of a box or a reborrow of a mutable reference), and then the old inner pointer gets used again. So the pattern is the same.
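A minimal sketch of that pattern with a mutable reference instead of a Box: under Stacked Borrows, the Unique reborrow of r invalidates the inner pointer derived earlier, so the final read is UB (whether Miri actually reports it depends on how precisely raw pointers are tracked, as discussed below).

fn main() {
    let mut x: u8 = 42;
    let r: &mut u8 = &mut x;
    let ptr: *const u8 = &*r;   // inner pointer derived from r
    let r2: &mut u8 = &mut *r;  // outer pointer used again (a reborrow)
    *r2 = 43;
    let _val: u8 = unsafe { *ptr }; // old inner pointer used again
}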

RalfJung commented 5 years ago

@comex

So the Box contains a Unique pointer, asserting that all accesses to the underlying object are done through that pointer or something derived from it, and the example breaks that assumption.

Note that right now, Stacked Borrows special-cases Box but not Unique.

Edit: Interestingly, my example does not trigger a miri error.

Nice example! Also the one for async/await.

Interesting indeed. Probably this is because all raw pointers are equal for Miri (at least currently; I hope to change this eventually but without &raw that's not possible), and there is a raw pointer being created inside val.set(10).

@CAD97

The result is that Pin<impl Deref>/impl StableDeref aren't fully noalias, because they guarantee that it's valid to use raw pointers derived from their Deref implementation beyond the lifetime provided by said Deref.

Exactly.

Maybe such things "just" need to be specially marked to the compiler.

@gnzlbg

The helper function should therefore be unsafe, and the calling code violates its pre-conditions, so the program has undefined behavior, and gets mis-optimized.

The compiler doesn't know about the preconditions, so violating the preconditions never explains UB (from an abstract machine perspective).

The reason it is UB is that aliasing constraints are violated, and as @comex demonstrated owning-ref can do the same.


So should we close this now in favor of https://github.com/rust-lang/unsafe-code-guidelines/issues/148? I think we established that they are definitely closely related.

gnzlbg commented 5 years ago

The compiler doesn't know about the preconditions, so violating the preconditions never explains UB (from an abstract machine perspective).

Of course it does? All Box function arguments are noalias because the compiler does know about this precise pre-condition (EDIT: that no other argument to the function can alias them, e.g., example (godbolt)).
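As a hypothetical illustration of that claim (not the linked godbolt example): for a signature like the following, the Box parameter should show up with a noalias attribute in the emitted LLVM IR (e.g. via rustc -O --emit=llvm-ir), at least with the compiler versions discussed here.

// With the Box argument marked noalias, LLVM may assume that p does not
// point into b's allocation.
pub fn takes_box(b: Box<i32>, p: *const i32) -> i32 {
    *b + unsafe { *p }
}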

RalfJung commented 5 years ago

I think it is just ontologically wrong to say "this is UB because the precondition is violated", the IMO more correct view is "this is a precondition because otherwise there is UB" (and the UB is caused by the aliasing model that's part of the abstract machine).

Following your argument "the calling code violates its pre-conditions, so the program has undefined behavior", calling Vec::set_len the wrong way causes UB. It does not. That's all I am saying. I think this is just sloppy terminology on your side but I wanted to avoid people reading this interpreting it the wrong way.

gnzlbg commented 5 years ago

I claimed that:

I'm not sure how you end up arguing that there are pre-conditions that can be violated without invoking undefined behavior (e.g. Vec::set_len). I don't disagree with that, and I don't think I claimed otherwise.

I did not make any claims about the source of the undefined behavior, I certainly did not intend to make the tautological claim that this is UB because helper has a precondition due to this being UB.

When I read the thread, I got the impression from @comex comment that this might be a mis-optimization. I just wanted to point out that this is intended behavior.

danielhenrymantilla commented 5 years ago

In the case of OwningRef<Box<T>>, this could be solved if the implementation decided to replace the input Box<T> given at construction by a NonNull<UnsafeCell<T>> (the NonNull to opt out of uniqueness; the UnsafeCell to be able to temporarily upgrade back into it).

More generally, for a pointer P : Deref, we will need to also have IntoRawNonNull + FromRawNonNull to be able to ensure opting out of uniqueness.

I don't know how this would generalise to Pin, though (at the end of the day, we miss the PinMut abstraction).

RalfJung commented 5 years ago

@gnzlbg

The helper function should therefore be unsafe, and the calling code violates its pre-conditions, so the program has undefined behavior, and gets mis-optimized.

This phrasing heavily implied a general entailment of the form "precondition violated implies UB". Specifically, the "so" in your sentence indicates a causal connection. If you didn't mean to say that, take this as feedback for trying a clearer wording next time. Maybe I am nitpicking too much here, but I think it is important that we use clear language. The next time you write "A, so B" I need to know if this indicates a causal connection or not.

For @comex' example, it also doesn't really matter whether that function is marked unsafe or not as we were only considering its operational behavior.

When I read the thread, I got the impression from @comex comment that this might be a mis-optimization. I just wanted to point out that this is intended behavior.

Ah, the term "miscompilation" is the culprit here. Do we have a good word for "code that has UB that the compiler actually exploits", in contrast to "code that has UB but still works as intended by the programmer"? We often say "miscompilation" but that sounds like a compiler bug, which it isn't if the code really does have UB.

As far as I can tell, there was never any disagreement that that code is wrong according to our current aliasing rules, and so the compiler is in its right to "miscompile". @comex was not trying to argue the UB away. The point of that comment was that this code does the exact same thing as what you can do with owning-ref using only safe code. So either owning-ref is wrong or our aliasing rules are wrong.

Cc @Kimundi (owning-ref author); I also reported this upstream.

at the end of the day, we miss the PinMut abstraction

We might... :/

bluss commented 4 years ago

FWIW, we've attempted to address this in ndarray, where it uses a Vec for allocation/deallocation (partly for historical reasons, now for zero-copy transfer of data between Array and Vec).

The description of the problem is this:

The owned array has used a Vec that it owns, and kept a separate "head pointer" ptr on the side; the head pointer points to the first element in the array, somewhere inside the vector's data.

and the solution is this:

We replace the Vec with a deconstructed vector; it's equivalent but uses NonNull for its pointer, so it's an aliasable owner.

It feels like a non-issue, because I can't find anywhere that we "interleave" read/writes through the Vec with reads/writes through our own "head pointer". In the only places where we write through the Vec (for example in Clone::clone_from), we immediately refresh our head pointer afterwards, without using the old value. But it seems like the right thing to do, to avoid using a Vec with its Unique<T>.
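A rough sketch of such a deconstructed vector (hypothetical names, loosely following the description above, not ndarray's actual code): the same three fields as a Vec, but with a NonNull pointer so the owner is aliasable, and a real Vec is only reassembled when one is needed (e.g. on drop).

use std::ptr::NonNull;

struct OwnedRepr<T> {
    ptr: NonNull<T>,
    len: usize,
    capacity: usize,
}

impl<T> OwnedRepr<T> {
    fn from_vec(mut v: Vec<T>) -> Self {
        let ptr = NonNull::new(v.as_mut_ptr()).expect("Vec pointers are never null");
        let (len, capacity) = (v.len(), v.capacity());
        std::mem::forget(v); // the buffer is now owned by OwnedRepr
        OwnedRepr { ptr, len, capacity }
    }

    fn into_vec(self) -> Vec<T> {
        let v = unsafe { Vec::from_raw_parts(self.ptr.as_ptr(), self.len, self.capacity) };
        std::mem::forget(self); // don't run Drop; the Vec owns the buffer again
        v
    }
}

impl<T> Drop for OwnedRepr<T> {
    fn drop(&mut self) {
        // Reassemble a Vec so the elements are dropped and the buffer freed.
        unsafe { drop(Vec::from_raw_parts(self.ptr.as_ptr(), self.len, self.capacity)) }
    }
}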

RalfJung commented 4 years ago

It feels like a non-issue, because I can't find anywhere that we "interleave" read/writes through the Vec with reads/writes through our own "head pointer".

To clarify, you are saying this code should be fine and Stacked Borrows should not complain, for the reasons given?

bluss commented 4 years ago

If this code is the ndarray code in question: Yes, but I don't know the model that well, so it's just a hunch.

danielhenrymantilla commented 4 years ago

I still feel that if people / libraries (i.e., I am not talking about the compiler-generated desugaring for generators and the Pin interaction) want to "alias" a Box's pointee, then they should downgrade the Box to a different type (which could be in std, but that's a different topic) which would not include the Unique property: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=039a3ae5d241eeee37b0f8bc40446f6a

That is, a Box that only assumes Unique-ness when DerefMut (which in my example requires an explicit method call, as a "lint") and obviously when dropped.

Otherwise, its pointee can be aliased, and depending on the usage, either by a pointer that assumes immutability, as long as the Box itself never mutates the pointee (should be enough for ::owning_ref), or by one that could even support the Box mutating the pointee, as long as it does not assume immutability (and is not used for mutation itself).
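A minimal sketch of such a type (a hypothetical AliasableBox, simplified from the playground idea): it stores a NonNull instead of a Box, implements only Deref, and re-asserts uniqueness solely in an explicit method and in Drop.

use std::ops::Deref;
use std::ptr::NonNull;

struct AliasableBox<T> {
    ptr: NonNull<T>,
}

impl<T> AliasableBox<T> {
    fn new(value: T) -> Self {
        AliasableBox { ptr: NonNull::new(Box::into_raw(Box::new(value))).unwrap() }
    }

    // Uniqueness is only assumed here, behind an explicit method call...
    fn deref_mut(&mut self) -> &mut T {
        unsafe { self.ptr.as_mut() }
    }
}

impl<T> Deref for AliasableBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for AliasableBox<T> {
    fn drop(&mut self) {
        // ...and when the value is dropped and deallocated.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) }
    }
}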


JakobDegen commented 1 year ago

We have separate issues to discuss the design of MaybeDangling and to discuss the open questions around Box aliasing requirements. Closing.

RalfJung commented 1 year ago

Links for those things: