Stacked Borrows: asserting uniqueness too early? Should we allow the optimizer to add spurious stores?

RalfJung commented 5 years ago

We have seen a few cases in libstd where the rule that just creating a mutable reference already asserts uniqueness is a problem:

VecDeque creating overlapping mutable references.
BTreeMap creating mutable references that overlap with shared references.
LinkedList creating overlapping mutable references.
Vec::push invalidating existing references into the vector. The typed-arena crate ran into the same problem.
shepmaster asking on Zulip why some code doesn't work.
We have a huge problem with self-referential generators, see https://github.com/rust-lang/unsafe-code-guidelines/issues/133.
https://github.com/bodil/sized-chunks/issues/8 would likely also not be an issue if we could delay uniqueness for long enough.
Some uses of ptr::replace are forbidden even though they should probably be allowed.

In some cases, even just creating a shared ref is an issue, as it conflicts with a recently created mutable ref:

So maybe we assert uniqueness to aggressively here? Maybe mutable references should only become unique on first write, or so? I am not sure what that would look like though. (In principle this could also happen with shared references but I have not seen that.)

I'll use this issue to collect such cases.

However, there are also cases where people want the extra UB to get more optimizations:

RalfJung commented 5 years ago

Another related case: https://github.com/rust-lang/rust/issues/62553; calling mem::forget asserts uniqueness and invalidates the pointer that we actually care about.

retep998 commented 5 years ago

This is my opinion based on experience with unsafe code and FFI:

While mutable references should assert that the data is otherwise immutable, I believe that the assertion of uniqueness should only apply to references. It is far too common of an assumption that raw pointers are not invalidated by a mutable reference existing, and that it is sufficient to merely hold off on reading or writing through raw pointers until the mutable reference ceases to exist. At the moment you dereference a raw pointer immutably no other mutable references should exist, and at the moment you dereference a raw pointer mutably no other references should exist at all, but a raw pointer should not be invalidated by mutable references.

RalfJung commented 5 years ago

the assertion of uniqueness should only apply to references

It does only apply to references.

if you have a reference and a raw pointer working on the same thing, the reference is not unique. It's not the raw pointer that is causing issues here, it is the reference.

a raw pointer should not be invalidated by mutable references.

It would help if you could give a concrete self-contained example of something that is UB according to current Stacked Borrows, but which you think should be allowed.

Is it something like this:

fn main() {
    let mut local = 0;
    let raw = &mut local as *mut i32;

    let mut_ref = &mut local;
    *mut_ref = 13;

    let _val = unsafe { *raw }; // UB (as of Stacked Borrows 2.1).
}

retep998 commented 5 years ago

fn main() {
    let mut x = 0;
    let y = &mut x as *mut _;
    { let _z = &mut x; }
    unsafe { *y = 10; }
}

When run through miri:

error[E0080]: Miri evaluation error: no item granting write access to tag <untagged> found in borrow stack
 --> src/main.rs:5:14
  |
5 |     unsafe { *y = 10; }
  |              ^^^^^^^ Miri evaluation error: no item granting write access to tag <untagged> found in borrow stack
  |
  = note: inside call to `main` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:64:34
  = note: inside call to closure at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:53
  = note: inside call to closure at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:296:40
  = note: inside call to `std::panicking::try::do_call::<[closure@DefId(1:5891 ~ std[82ff]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:292:5
  = note: inside call to `std::panicking::try::<i32, [closure@DefId(1:5891 ~ std[82ff]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe]>` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:9
  = note: inside call to `std::panic::catch_unwind::<[closure@DefId(1:5891 ~ std[82ff]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:25
  = note: inside call to `std::rt::lang_start_internal` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:64:5
  = note: inside call to `std::rt::lang_start::<()>`

error: aborting due to previous error

I was led to believe that this error is by design due to stacked borrows. If it is just an issue with miri that will get fixed and this code example is actually fine then I would be so happy.

RalfJung commented 5 years ago

@retep998 Right, so that is an example involving an unused mutable references. This is definitely forbidden by the current Stacked Borrows (it is not "just" a Miri implementation issue), but I could imagine something like "Tree-shaped borrows" that would open a design space to allow more things here. However, it is unclear how much that would cost in terms of lost optimizations. This issue is about collecting cases where unused mutable references caused UB. ("used"/"unused" is a fuzzy term here, but if the variable is dead immediately after it got created, like in your last example, that's definitely "unused".)

What I am wondering about is how you feel about used mutable references interacting with raw pointers, like in my example above.

retep998 commented 5 years ago

In your example above the used mutable reference still exists when the raw pointer is accessed, which is shaky ground. If we were to modify my example to actually use the mutable reference...

fn main() {
    let mut x = 0;
    let y = &mut x as *mut _;
    {
        let z = &mut x;
        *z = 20;
    }
    unsafe { *y = 10; }
}

then I would still want it to be okay and not UB. Raw pointers should not care if something else mutated the value in the time between accesses.

taralx commented 5 years ago

So if I understand correctly, miri invalidates other children of x when &mut x is created. Is it possible to simply hold the raw children in abeyance, like one does with x itself? It is a bit surprising that creating z from y is ok, but creating z from x (which we know is the same as y) is not.

comex commented 5 years ago

@retep998 By itself your example is pretty harmless, but it's also useless in that nothing uses the stored value of 10. On the other hand, if we add a use of x...

fn main() {
    let mut x = 0;
    let y = &mut x as *mut _;
    {
        let z = &mut x;
        *z = 20;
    }
    unsafe { *y = 10; }
    println!("{}", x);
}

Then we have a problem, because what if the raw pointer stuff is hidden behind function calls?

fn foo(x: &mut i32){
    some_global = x as *mut _;
}

fn bar() {
    unsafe { *some_global = 10; }
}

fn main() {
    let mut x = 0;
    foo(&mut x);
    {
        let z = &mut x;
        *z = 20;
    }
    bar();
    println!("{}", x);
}

(Note that in reality foo and bar might be trait object methods or from external crates, so the compiler might not be able to inspect their contents.)

Under Stacked Borrows, the compiler can optimize main by changing println!("{}", x); to println!("{}", 20}. If this example is non-UB, it cannot.

In other words, from an implementation perspective, the issue is not how the compiler treats raw pointers, but what assumptions the compiler can make about unknown code.

That said, I agree that this is a footgun. I really wish we had something with miri's level of undefined behavior detection that also worked with FFI, at least to some extent.

gnzlbg commented 5 years ago

I really wish we had something with miri's level of undefined behavior detection that also worked with FFI, at least to some extent.

I don't think we should design things under the assumption that something like this can exist.

Lokathor commented 5 years ago

i think that means that the foo/bar example has to be blindly assumed to always be the sort of case if you pass an address to unknown code, because some code does work that way.

gnzlbg commented 5 years ago

In other words, from an implementation perspective, the issue is not how the compiler treats raw pointers, but what assumptions the compiler can make about unknown code.

While the aliasing model determines which optimizations Rust is allowed to do locally (without whole program analysis), it also determines what assumptions users can make about the behavior of unknown code. Can users reason about the code locally? E.g. if I modify this piece of code, I know I can do this and that, because these unknown functions are not allowed to do x and y without introducing UB. Or do users need to keep the whole program in their head to be able to reason about UB and about how to modify code without introducing UB ?

RalfJung commented 5 years ago

@retep998

In your example above the used mutable reference still exists when the raw pointer is accessed, which is shaky ground.

Now that is interesting. Stacked Borrows was specifically designed to not care about such cases of "existence" because I was sure that would result in a lot of backlash. ;) So I didn't expect some far on the "max freedom for unsafe code" end of the spectrum to consider this a reasonable restriction. However, there's also the fact that even the borrow checker considers a locally created reference dead when it is definitely not used/copied/reborrowed any more. I don't think the dynamic model should be even more conservative that the borrow checker.

OTOH, references passed as function argument do have special treatment that keeps them "live" throughout the call (and the borrow checker does assume they live for the entire function).

(Btw let me say that I enjoy this exchange a lot, this is the kind of discussion I was hoping to have when I put out my proposals.)

If we were to modify my example to actually use the mutable reference... [...] then I would still want it to be okay and not UB.

This is where it gets interesting. I consider this equivalent to my example as the only difference is in the scoping of some variables without drop glue.

Thankfully @comex already came up with a great example for why Stacked Borrows makes this example UB -- an example for what kind of optimizations this UB enables. In my view, the entire point of Stacked Borrows is to enable intraprocedural optimization based on alias information, so losing some of those is a big deal.

For the current version of Stacked Borrows, we have formal machine-checked proofs that some optimizations are possible (I'll share this in a blog post eventually). It does not seem obvious that even those optimizations are still possible in a model that permits as many programs as you are asking. But it might be possible. I will have to think about this more.

Raw pointers should not care if something else mutated the value in the time between accesses.

As stated above, the raw pointer does not care. It's the mutable reference that cares enough to make other pointers invalid, in order to enforce its own discipline.

@retep998 while I have your attention, can I probe your intuition another way? What if we change your example to use x directly instead of creating a new mutable reference and writing to that?

fn main() {
    let x = &mut 0; // changed x to also be a reference
    let y = &mut *x as *mut _;
    *x = 20; // instead of `{ let z = &mut *x; *z = 20; }`
    unsafe { *y = 10; }
    println!("{}", *x); // Does removing this change anything?
}

This program I feel very strongly should have UB, because between two uses of x (the write and the read) there is a conflicting access. If we do not make this UB, I think there is very little to nothing that we can optimize based on reference types.

But OTOH, my intuition also says it makes sense to consider *x = 20 and { let z = &mut *x; *z = 20; } as equivalent. At that point we are very close to your example, the only difference being the println! at the end.

So, I am wondering what your thoughts on this are.

@taralx

Is it possible to simply hold the raw children in abeyance, like one does with x itself?

Maybe. Stacked Borrows is but one point in a gigantic design space. But "simply" doesn't exactly describe the situation. ;)

It is a bit surprising that creating z from y is ok, but creating z from x (which we know is the same as y) is not.

However, here I think your intuition needs some refinement. x and y are not the same. The entire point of an aliasing model is to define which of multiple aliasing pointers you are actually allowed to use to do what in which order. That is, even if two pointers point to the same thing, they are not "the same".

This is somewhat unintuitive, but even C does this. Here is some literature: [1], [2], [3].

I don't think we should design things under the assumption that something like this can exist.

I think it could exist. It's just hard. Stacked Borrows supports untagged pointers specifically for this case: all accesses done by FFI code would be considered as happening with an untagged pointer.

@gnzlbg

E.g. if I modify this piece of code, I know I can do this and that, because these unknown functions are not allowed to do x and y without introducing UB.

I am afraid I don't understand your question. Do you have an example for "this and that"? Certainly the same local reasoning principles that the compiler uses, can also be used by the programmer.

gnzlbg commented 5 years ago

I am afraid I don't understand your question. Do you have an example for "this and that"? Certainly the same local reasoning principles that the compiler uses, can also be used by the programmer.

It wasn't a question, more like a criticism that the discussion often focus on which optimizations the aliasing model allows, which programs it bans, how it makes the life of miri/etc. easier, without actually covering how does the aliasing model make it easier for users to reason about their code. I think it should focus more on the latter.

For example, in @comex second example here, if bar cannot write to some_global, then a user modifying / refactoring main can just assume it does not happen, and can actually refactor the whole function without inspecting the code of foo and bar. However, if that is allowed to happen, then a user would need to inspect foo and bar to be able to touch main.

Arguing that making X UB allows users to not have to inspect the source code of your whole program when modifying some function is a stronger argument for most users IMO than saying that it allows some compiler optimization.

Lokathor commented 5 years ago

I agree. People are willing to set a lot of optimization on fire if it makes the universe easier to understand. See Also: all GC languages. The first evaluation of an aliasing model is if it lets a low to average skill programmer follow along with the implications of what they're doing. There are many other metrics too, but that needs to be the first target.

Which is not to say that Stacked Borrows, or any other model, is necessarily hard to follow, I just want to emphasize the importance of gnzlbg's point.

danielhenrymantilla commented 5 years ago

Minor nitpick that is nevertheless relevant to the "(human) reasoning about code" is that @comex's bar function ought to be unsafe: this makes it clearer that bar() may not be as innocent as it may look ~~from the outside~~ without knowing what it does, and that it should thus be moved with care.

But OTOH, my intuition also says it makes sense to consider *x = 20 and `{ let z = &mut x; z = 20; } as equivalent.

Another nitpick: it's x = 20, or *(&mut x) = 20 when being verbose

I think this equivalence is needed for reborrowing to make sense: one can always imagine that *(&mut x) = 20 goes through an ephemeral reborrow, something like "spurious reborrows" :D

I am on the side of a strict language with enabled optimizations, and an escape hatch for "raw stuff". In this case, Rust's escape hatch is UnsafeCell:

fn raw_mut_ref_from_unique<T : ?Sized> (x: &'_ mut T) -> &'_ UnsafeCell<T>
{
    unsafe {
        // # Safety:
        //
        //   - `UnsafeCell` is `repr(transparent)`
        //
        //   - data-race-free mutation is now the responsibility of the programmer
        //     (requires `unsafe`)
        mem::transmute(x)
    }
}

Contrived patterns such as @comex's would thus require starting from a & UnsafeCell reference instead of a &mut:

static SOME_GLOBAL: AtomicPtr<i32> = AtomicPtr::new(0 as *mut _);

fn foo (at_x: &'_ UnsafeCell<i32>)
{
    SOME_GLOBAL.store(at_x.get(), Ordering::SeqCst);
}

/// # Safety:
///
///   - `SOME_GLOBAL` must point to a valid i32,
///     which must not be read or written to in parallel
unsafe fn bar ()
{
     *SOME_GLOBAL.load(Ordering::SeqCst) = 10;
}

fn main ()
{
    let mut x = 0;
    {
        let at_x = raw_mut_ref_from_unique(&mut x);
        foo(at_x);
        {
            let z = at_x;
            unsafe {
                // # Safety: no parallel reads or writes
                *z.get() = 20;
            }
        }
        unsafe {
            // # Safety:
            //
            //   - the initial raw_mut_ref_from_unique has not been invalidated yet;
            //
            //   - no parallel reads or writes;
            bar();
        }
    } // SOME_GLOBAL is now invalid by the way
    println!("{}", x);
}

RalfJung commented 5 years ago

@gnzlbg I agree. A memory model has two consumers: programmers that code against it, and compiler authors that implement it and optimize under it.

Arguing that making X UB allows users to not have to inspect the source code of your whole program when modifying some function is a stronger argument for most users IMO than saying that it allows some compiler optimization.

Interesting. That is not at all my experience from most discussions around UB, and I have gotten explicit pushback against arguing by "simplicity", but maybe that's not what you mean.

The argument is almost the same though: allowing compiler optimizations is about making program analyses easier, and that also makes it easier for humans to analyze. Though I suppose something like TBAA is rarely useful for a human analyzer, while Stacked Borrows provides some entirely intraprocedural reasoning principles that humans can learn just like compilers can. If I had infinite time, I'd write a blog post or two about those reasoning principles, but in lieu of that I can only point at this section of the Stacked Borrows 1.0 post.

@Lokathor

People are willing to set a lot of optimization on fire if it makes the universe easier to understand.

Not the kind of people that make languages like C and C++ though. Source: have you looked at those languages?^^ And people use them because they are fast, which they are because of all those complications.

See Also: all GC languages

Not really, those languages also statically prohibit many of the things you have to worry about in C/C++. I mean, safe Rust also is a no-brainer when it comes to Stacked Borrows.

Ixrec commented 5 years ago

FWIW, everything going on in these Stacked Borrows models and all the UCG discussions is several lightyears ahead of C/C++ in terms of "simplicity" (which for me means roughly "comprehensibility to non-wizards"). I've long since given up having any idea what C/C++ think UB really is, but for the stuff under discussion here, I'm fairly sure that the only reason I can't work through most of the examples myself is simply that all of this stuff is still in flux so I haven't tried to properly memorize the current model. Certainly the basic idea in the last several comments that "while there's a &mut x in scope, using aliasing raw pointers to mutate x is UB" seems perfectly natural and intuitive to me (whether or not we end up accepting that rule).

I also feel like the past discussions on this have typically been taking simplicity / the ability for users to reason about their code very seriously, to the point that pretty much every Stacked Borrows update either gives a very specific reason why a new complexity had to be introduced, or proclaimed some significant simplification of the model, or both. Though perhaps I'm over-focusing on Ralf's blog posts there.

But maybe the point of bringing up simplicity/reasonability here was to imply something like "the complexity cost of introducing tree-shaped borrows is too high to be worth making retep's original case non-UB". In which case I tentatively agree; we'd need to see some big optimization wins to justify that.

taralx commented 5 years ago

FWIW I retract my suggestion on the basis of @comex's interprocedural example. :+1:

RalfJung commented 5 years ago

Another instance of "mutable references created but never used breaks stuff": https://github.com/SimonSapin/rust-typed-arena/pull/27.

goffrie commented 4 years ago

I ran into this (or something similar) today when using https://github.com/Matthias247/futures-intrusive under miri. In particular, we have some data structure

struct SomeFuture {
    wait_node: ListNode,
    // ...
}

where ListNode is an intrusive linked list node. We stash pointers to wait_node in some external structure. But then if we ever create a mutable reference to SomeFuture, those pointers are invalidated.

I was hoping that wrapping the ListNode in an UnsafeCell was enough, but it isn't - miri sees through the UnsafeCell.

It seems to me that this destroys any possibility of making "encapsulated" intrusive data structures, since we can't stop the user from taking mutable reference to the structure containing the list node - even if that mutable reference will never be used to mutate the node in question.

RalfJung commented 4 years ago

@goffrie I wonder if that situation isn't better covered by https://github.com/rust-lang/unsafe-code-guidelines/issues/148. Intrusive collections are not self-referential, but there is still a certain amount of deliberate aliasing, so it might make sense to make that type "opt out" of uniqueness guarantees.

I was hoping that wrapping the ListNode in an UnsafeCell was enough, but it isn't - miri sees through the UnsafeCell.

UnsafeCell has no effect whatsoever for mutable references, it only affects shared references.

RalfJung commented 4 years ago

https://github.com/rust-lang/rust/issues/73915 is another instance of this problem: there is more unfixed aliasing in btree because shared references created by the iterator implementation overlap with mutable references handed out to the client.

RalfJung commented 4 years ago

@ssomers with your work on https://github.com/rust-lang/rust/pull/73971, you had to deal a lot with Stacked Borrows' aliasing rules when mixing references and raw pointers. I would be very interested in your feedback for how well-suited you think these rules are for Rust, in particular for the issue discussed here -- that just creating a reference already makes some very strong aliasing assumptions.

ssomers commented 4 years ago

I do feel I had to deal a lot with the rules, because I didn't understand them. Now I think I do, but only in one situation (mutable iterators). I've tried to read this thread a few times and halfway I'm overwhelmed.

In my limited experience, relaxing the rule would merely hide fundamental problems. Then again, that is not always a bad thing to do, and worse the way we address those problems now is also to hide them by using raw pointers to stay under Miri's radar. So really, I have no idea what's best.

RalfJung commented 4 years ago

In my limited experience, relaxing the rule would merely hide fundamental problems. Then again, that is not always a bad thing to do, and worse the way we address those problems now is also to hide them by using raw pointers to stay under Miri's radar.

Well, using raw pointers is the intended solution for these problems. Raw pointers exist to be able to do stuff without worrying about aliasing.

The fact that Miri does not fully reliably check the intended rules around raw pointers is a problem though.

Lokathor commented 4 years ago

I'd say that raw pointers aren't hiding problems, they literally have less requirements, and so in some situations they actually do have less problems.

They don't fix everything of course, but sometimes they're the most appropriate solution.

ssomers commented 4 years ago

Raw pointers exist to be able to do stuff without worrying about aliasing.

I thought they existed to be able to worry yourself about aliasing by the underlying compiler and CPU, instead of leaving it up to Miri by constraining yourself to its model.

RalfJung commented 4 years ago

I don't understand that statement @ssomers. Miri does check aliasing of references with raw pointers, though currently not yet as aggressively as I'd like to.

It's the borrow checker that entirely ignores raw pointers.

RalfJung commented 2 years ago

Here's (another) example of some code that we might want to accept, but that Stacked Borrows currently rejects:

fn main() {
    let mut a = 0;
    let mut r = &mut a;
    unsafe { std::ptr::replace(&mut r as *mut _, r); }
}

See https://github.com/rust-lang/rust/issues/90055#issuecomment-946973699 for some more context.

saethlin commented 2 years ago

Here's an example of exceedingly subtle code that runs into this pointer invalidation pattern: https://github.com/rust-lang/rust/pull/92092

The underlying cause here is some very smelly code: using &mut to get a *mut to do a read through it. This isn't the first time I've seen excessively-mutable code and it's probably not good to turn those into time bombs.

saethlin commented 2 years ago

I've found a few examples of this pointer invalidation pattern which I think provide dubious if not provably zero optimization value. I've been pretty aggressively sending patches for code in the most-downloaded crates which trip up Miri, but I don't feel good about sending patches for these because it's not clear what the rules are for here.

slab implements a two-item get into an array, by calling get_mut_unchecked twice

pub unsafe fn get2_unchecked_mut(&mut self, key1: usize, key2: usize) -> (&mut T, &mut T) {
    let ptr1 = self.entries.get_unchecked_mut(key1) as *mut Entry<T>;
    let ptr2 = self.entries.get_unchecked_mut(key2) as *mut Entry<T>;
    match (&mut *ptr1, &mut *ptr2) {
        (&mut Entry::Occupied(ref mut val1), &mut Entry::Occupied(ref mut val2)) => {
            (val1, val2)
        }
        _ => unreachable!(),
    }
}

arrayvec does a ptr::copy but with the as_ptr + as_mut_ptr pattern: (1) (2) (3)

ptr::copy(
    self.v.as_ptr().add(self.processed_len),
    self.v.as_mut_ptr().add(self.processed_len - self.deleted_cnt),
    self.original_len - self.processed_len
);

let src = source_vec.as_ptr().add(tail);
let dst = source_vec.as_mut_ptr().add(start);
ptr::copy(src, dst, self.tail_len);

ptr::copy(self.as_ptr().add(next),
          self.as_mut_ptr().add(idx),
          len - next);

slightlyoutofphase commented 2 years ago

The underlying cause here is some very smelly code: using &mut to get a *mut to do a read through it. This isn't the first time I've seen excessively-mutable code and it's probably not good to turn those into time bombs.

That smallvec code is actually just lifted directly from the original alloc::vec::Vec::drain(), for what it's worth.

I don't really understand what the comment above the original line was trying to say, personally, also: there is no slice::IterMut involved there at any point. They just create an instance of &mut [T], and then call iter() through it, which creates a slice::Iter that the Drain then continuously calls ptr::read on the next() results of.

So whether or not that slice that exists only to have iter() immediately called through it is &mut [T] or &[T] does not actually matter at all at least as far as any impact on how the code functions overall in practice.

saethlin commented 2 years ago

I don't want to get too hung up on smallvec in particular. My general point was that there is code with unnecessary mutability in it, and currently, it works (mostly?). And if being sloppy with mutability is permitted today, it might be bad if we adopted Stacked Borrows 2.1 and made it UB.

That smallvec code is actually just lifted directly from the original alloc::vec::Vec::drain(), for what it's worth.

Exactly! Until yesterday, core and alloc had code in them that would fail stacked borrows with -Zmiri-tag-raw-pointers under those crates own test suites. "I got this from core" was not a well-founded argument for the code's validity, even in original context.

Additionally, under Stacked Borrows, smallvec is actually wrong to copy this code. smallvec's test suite doesn't pass Miri with -Zmiri-tag-raw-pointers. I came up with a diff that made the existing test suite pass but it smelled wrong to me. Turns out the crate's test coverage was actually incomplete, so I sent a PR to improve its test coverage. When I tried 18 days ago, I could not figure out how to make it pass Miri with raw pointer tagging.

So whether or not that slice that exists only to have iter() immediately called through it is &mut [T] or &[T] does not actually matter at all at least as far as any impact on how the code functions overall in practice.

The topic of discussion in this issue is whether creating such ephemeral mutable references should have the impact that Stacked Borrows currently says it does. Whether code creates a &mut [T] or &[T] matters quite a lot.

slightlyoutofphase commented 2 years ago

The topic of discussion in this issue is whether creating such ephemeral mutable references should have the impact that Stacked Borrows currently says it does. Whether code creates a &mut [T] or &[T] matters quite a lot.

Yeah, I don't disagree. What I meant was, I'm reasonably sure that in this particular case there's not actually a reason the code couldn't simply do this instead:

let range_slice = slice::from_raw_parts(self.as_ptr().add(start), end - start);

saethlin commented 2 years ago

Yup, that's my point about unnecessary mutability. The code you're proposing works just fine. Same deal as the linked rustc PR, the whole diff is just deleting mut.

saethlin commented 2 years ago

I am once again running Miri on a lot of crates. It seems that breaking this rule specifically with Box is rather common. A user on the community discord once pointed out that it really seems like Box is the tool to get address stability, but under SB 2.1 that feels like a bait-and-switch. You get address stability until you try to move the Box itself. Over time I'm coming to agree more and more with this perspective. I think I see the alternative perspective, that Box<T> owns the T so of course it gets Unique permission. But I think if that's the case, we should provide a way for people to get the tool they want: An RAII heap allocation that can be aliased. And provide strong lints to force people to migrate.

string_cache: https://github.com/servo/string-cache/blob/72d9737baef70e1c3afb684a77d09cbad2353065/src/dynamic_set.rs#L76-L83

        let mut entry = Box::new(Entry {
            next_in_bucket: self.buckets[bucket_index].take(),
            hash,
            ref_count: AtomicIsize::new(1),
            string: string.into_boxed_str(),
        });
        let ptr = NonNull::from(&mut *entry);
        self.buckets[bucket_index] = Some(entry);

lru: https://github.com/jeromefroe/lru-rs/blob/934f17ee0b744525176e5d2ba8cf5ce92eac97a3/src/lib.rs#L311-L340

                let node = /* snip */
                } else {
                    // if the cache is not full allocate a new LruEntry
                    Box::new(LruEntry::new(k, v))
                };

                let node_ptr: *mut LruEntry<K, V> = &mut *node;
                self.attach(node_ptr);

                let keyref = unsafe { (*node_ptr).key.as_ptr() };
                self.map.insert(KeyRef { k: keyref }, node);

owning_ref https://github.com/Kimundi/owning-ref-rs/blob/677c5bbe33210e65ababfd46eeb2b7aaab41f94d/src/lib.rs#L326-L333

    pub unsafe fn new_assert_stable_address(o: O) -> Self
        where O: Deref<Target = T>,
    {
        OwningRef {
            reference: &*o,
            owner: o,
        }
    }

saethlin commented 2 years ago

Also, it's unclear to me if the aforementioned code is already UB in practice due to the way rustc applies LLVM's noalias. If someone knows better than I, or has some kind of way to demonstrate this, or has some kind of checking interpreter for LLVM IR, that would be awesome.

danielhenrymantilla commented 2 years ago

Such a pointer is already featured in the ::aliasable crate; I think it currently "just" needs more publicity.

Related to it, and to owning-ref's UB, I wrote a pretty lengthy post to teach and raise awareness about these things: https://users.rust-lang.org/t/how-can-we-teach-people-about-self-referential-types/65362/9?u=yandros#warning-careful-with-unsafe-warning-1

comex commented 2 years ago

Also, it's unclear to me if the aforementioned code is already UB in practice due to the way rustc applies LLVM's noalias

Yes, it is. This code succeeds in debug mode but panics in release mode:

#[inline(never)]
fn noalias_test(mut a: Box<i32>, b: *mut i32) -> i32 {
    *a = 4;
    unsafe { *b = 5; }
    *a // <- LLVM assumes this must be 4
}

fn main() {
    let mut a = Box::new(10);
    let b = &mut *a as *mut i32;
    assert_eq!(noalias_test(a, b), 5);
}

I haven't heard of a checking interpreter for LLVM IR, but this kind of "write one thing, write another thing, read back the first thing" pattern is a good way to test aliasing assumptions in general.

saethlin commented 2 years ago

That's a slick demo, but I'm not totally sure it applies to the particular examples I linked. Particularly for a string interner or entries in an LRU cache, there shouldn't be any mutation. If there were mutation through the pointer I think that code might not apply to this issue in particular, because asserting uniqueness at mutation would just be UB through another route (I'd say later, but time travel).

I can't think up with any way that a surprising optimization would appear in the absence of mutation, but the LangRef's section on noalias only mentions access instead of specifically calling out mutation so I'm not sure of much here.

comex commented 2 years ago

The LangRef does call out mutation:

noalias This indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function. [..]

Emphasis added. Also, noalias was originally designed to implement C restrict, which has a similar rule:

if some object that is accessible through P (directly or indirectly) is modified, by any means, then all accesses to that object (both reads and writes) in that block must occur through P (directly or indirectly), otherwise the behavior is undefined

(source)

So if there really is no mutation, then you're safe from LLVM's perspective; perhaps that's an indication that Stacked Borrows should be loosened in that case.

That said:

I was able to translate my example to use owning_ref in combination with interior mutability:

use std::cell::Cell;
use owning_ref::BoxRef;
#[inline(never)]
fn noalias_test(x: BoxRef<Cell<i32>>) -> i32 {
    x.as_owner().set(4);
    x.set(5);
    x.as_owner().get()
}
fn main() {
    let x: BoxRef<Cell<i32>> = BoxRef::new(Box::new(Cell::new(10)));
    assert_eq!(noalias_test(x), 5);
}

(Incidentally, owning_ref unfortunately has multiple other soundness issues even without compiler optimizations... [1] [2])

As for lru, I believe there is mutation involved. The Box points to an LruEntry<K, V> which contains prev and next pointers, forming a doubly linked list; these pointers will be mutated when adjacent entries are added or removed. It's also possible that the key or value could have interior mutability. Similarly, in string-cache, the next_in_bucket field is used to form a singly linked list, and an entry's next_in_bucket field will be mutated if the entry following it is removed.

That said, I suspect it would be difficult to actually trigger a miscompilation in either of those two crates.

saethlin commented 2 years ago

Big thanks for correcting me so thoroughly! I've run into a subtle case with interior mutability before, maybe sprinkling Cell around should just be part of looking into these findings.

JakobDegen commented 1 year ago

Visiting for UCG. Issue is mostly fine, but editing some Box-related questions out of the main issue in an attempt to keep things managable

RalfJung commented 6 months ago

FWIW, Tree Borrows does things different here -- mutable references are two-phase, and uniqueness is only asserted when they are written to. That seems to be a fairly elegant solution, though it also has its downsides.

RalfJung commented 5 months ago

And of course now people are asking for optimizations that are only correct if we do aggressively assert uniqueness early. ;)

https://github.com/rust-lang/rust/issues/124346

rust-lang / unsafe-code-guidelines

Stacked Borrows: asserting uniqueness too early? Should we allow the optimizer to add spurious stores? #133