Open RalfJung opened 5 years ago
Another related case: https://github.com/rust-lang/rust/issues/62553; calling mem::forget
asserts uniqueness and invalidates the pointer that we actually care about.
This is my opinion based on experience with unsafe code and FFI:
While mutable references should assert that the data is otherwise immutable, I believe that the assertion of uniqueness should only apply to references. It is far too common of an assumption that raw pointers are not invalidated by a mutable reference existing, and that it is sufficient to merely hold off on reading or writing through raw pointers until the mutable reference ceases to exist. At the moment you dereference a raw pointer immutably no other mutable references should exist, and at the moment you dereference a raw pointer mutably no other references should exist at all, but a raw pointer should not be invalidated by mutable references.
the assertion of uniqueness should only apply to references
It does only apply to references.
if you have a reference and a raw pointer working on the same thing, the reference is not unique. It's not the raw pointer that is causing issues here, it is the reference.
a raw pointer should not be invalidated by mutable references.
It would help if you could give a concrete self-contained example of something that is UB according to current Stacked Borrows, but which you think should be allowed.
Is it something like this:
fn main() {
let mut local = 0;
let raw = &mut local as *mut i32;
let mut_ref = &mut local;
*mut_ref = 13;
let _val = unsafe { *raw }; // UB (as of Stacked Borrows 2.1).
}
fn main() {
let mut x = 0;
let y = &mut x as *mut _;
{ let _z = &mut x; }
unsafe { *y = 10; }
}
When run through miri:
error[E0080]: Miri evaluation error: no item granting write access to tag <untagged> found in borrow stack
--> src/main.rs:5:14
|
5 | unsafe { *y = 10; }
| ^^^^^^^ Miri evaluation error: no item granting write access to tag <untagged> found in borrow stack
|
= note: inside call to `main` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:64:34
= note: inside call to closure at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:53
= note: inside call to closure at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:296:40
= note: inside call to `std::panicking::try::do_call::<[closure@DefId(1:5891 ~ std[82ff]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:292:5
= note: inside call to `std::panicking::try::<i32, [closure@DefId(1:5891 ~ std[82ff]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe]>` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:9
= note: inside call to `std::panic::catch_unwind::<[closure@DefId(1:5891 ~ std[82ff]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:25
= note: inside call to `std::rt::lang_start_internal` at /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:64:5
= note: inside call to `std::rt::lang_start::<()>`
error: aborting due to previous error
I was led to believe that this error is by design due to stacked borrows. If it is just an issue with miri that will get fixed and this code example is actually fine then I would be so happy.
@retep998 Right, so that is an example involving an unused mutable references. This is definitely forbidden by the current Stacked Borrows (it is not "just" a Miri implementation issue), but I could imagine something like "Tree-shaped borrows" that would open a design space to allow more things here. However, it is unclear how much that would cost in terms of lost optimizations. This issue is about collecting cases where unused mutable references caused UB. ("used"/"unused" is a fuzzy term here, but if the variable is dead immediately after it got created, like in your last example, that's definitely "unused".)
What I am wondering about is how you feel about used mutable references interacting with raw pointers, like in my example above.
In your example above the used mutable reference still exists when the raw pointer is accessed, which is shaky ground. If we were to modify my example to actually use the mutable reference...
fn main() {
let mut x = 0;
let y = &mut x as *mut _;
{
let z = &mut x;
*z = 20;
}
unsafe { *y = 10; }
}
then I would still want it to be okay and not UB. Raw pointers should not care if something else mutated the value in the time between accesses.
So if I understand correctly, miri invalidates other children of x when &mut x is created. Is it possible to simply hold the raw children in abeyance, like one does with x itself? It is a bit surprising that creating z from y is ok, but creating z from x (which we know is the same as y) is not.
@retep998 By itself your example is pretty harmless, but it's also useless in that nothing uses the stored value of 10
. On the other hand, if we add a use of x...
fn main() {
let mut x = 0;
let y = &mut x as *mut _;
{
let z = &mut x;
*z = 20;
}
unsafe { *y = 10; }
println!("{}", x);
}
Then we have a problem, because what if the raw pointer stuff is hidden behind function calls?
fn foo(x: &mut i32){
some_global = x as *mut _;
}
fn bar() {
unsafe { *some_global = 10; }
}
fn main() {
let mut x = 0;
foo(&mut x);
{
let z = &mut x;
*z = 20;
}
bar();
println!("{}", x);
}
(Note that in reality foo
and bar
might be trait object methods or from external crates, so the compiler might not be able to inspect their contents.)
Under Stacked Borrows, the compiler can optimize main
by changing println!("{}", x);
to println!("{}", 20}
. If this example is non-UB, it cannot.
In other words, from an implementation perspective, the issue is not how the compiler treats raw pointers, but what assumptions the compiler can make about unknown code.
That said, I agree that this is a footgun. I really wish we had something with miri's level of undefined behavior detection that also worked with FFI, at least to some extent.
I really wish we had something with miri's level of undefined behavior detection that also worked with FFI, at least to some extent.
I don't think we should design things under the assumption that something like this can exist.
i think that means that the foo/bar example has to be blindly assumed to always be the sort of case if you pass an address to unknown code, because some code does work that way.
In other words, from an implementation perspective, the issue is not how the compiler treats raw pointers, but what assumptions the compiler can make about unknown code.
While the aliasing model determines which optimizations Rust is allowed to do locally (without whole program analysis), it also determines what assumptions users can make about the behavior of unknown code. Can users reason about the code locally? E.g. if I modify this piece of code, I know I can do this and that, because these unknown functions are not allowed to do x and y without introducing UB. Or do users need to keep the whole program in their head to be able to reason about UB and about how to modify code without introducing UB ?
@retep998
In your example above the used mutable reference still exists when the raw pointer is accessed, which is shaky ground.
Now that is interesting. Stacked Borrows was specifically designed to not care about such cases of "existence" because I was sure that would result in a lot of backlash. ;) So I didn't expect some far on the "max freedom for unsafe code" end of the spectrum to consider this a reasonable restriction. However, there's also the fact that even the borrow checker considers a locally created reference dead when it is definitely not used/copied/reborrowed any more. I don't think the dynamic model should be even more conservative that the borrow checker.
OTOH, references passed as function argument do have special treatment that keeps them "live" throughout the call (and the borrow checker does assume they live for the entire function).
(Btw let me say that I enjoy this exchange a lot, this is the kind of discussion I was hoping to have when I put out my proposals.)
If we were to modify my example to actually use the mutable reference... [...] then I would still want it to be okay and not UB.
This is where it gets interesting. I consider this equivalent to my example as the only difference is in the scoping of some variables without drop glue.
Thankfully @comex already came up with a great example for why Stacked Borrows makes this example UB -- an example for what kind of optimizations this UB enables. In my view, the entire point of Stacked Borrows is to enable intraprocedural optimization based on alias information, so losing some of those is a big deal.
For the current version of Stacked Borrows, we have formal machine-checked proofs that some optimizations are possible (I'll share this in a blog post eventually). It does not seem obvious that even those optimizations are still possible in a model that permits as many programs as you are asking. But it might be possible. I will have to think about this more.
Raw pointers should not care if something else mutated the value in the time between accesses.
As stated above, the raw pointer does not care. It's the mutable reference that cares enough to make other pointers invalid, in order to enforce its own discipline.
@retep998 while I have your attention, can I probe your intuition another way? What if we change your example to use x
directly instead of creating a new mutable reference and writing to that?
fn main() {
let x = &mut 0; // changed x to also be a reference
let y = &mut *x as *mut _;
*x = 20; // instead of `{ let z = &mut *x; *z = 20; }`
unsafe { *y = 10; }
println!("{}", *x); // Does removing this change anything?
}
This program I feel very strongly should have UB, because between two uses of x
(the write and the read) there is a conflicting access. If we do not make this UB, I think there is very little to nothing that we can optimize based on reference types.
But OTOH, my intuition also says it makes sense to consider *x = 20
and { let z = &mut *x; *z = 20; }
as equivalent. At that point we are very close to your example, the only difference being the println!
at the end.
So, I am wondering what your thoughts on this are.
@taralx
Is it possible to simply hold the raw children in abeyance, like one does with x itself?
Maybe. Stacked Borrows is but one point in a gigantic design space. But "simply" doesn't exactly describe the situation. ;)
It is a bit surprising that creating z from y is ok, but creating z from x (which we know is the same as y) is not.
However, here I think your intuition needs some refinement. x
and y
are not the same. The entire point of an aliasing model is to define which of multiple aliasing pointers you are actually allowed to use to do what in which order. That is, even if two pointers point to the same thing, they are not "the same".
This is somewhat unintuitive, but even C does this. Here is some literature: [1], [2], [3].
I don't think we should design things under the assumption that something like this can exist.
I think it could exist. It's just hard. Stacked Borrows supports untagged pointers specifically for this case: all accesses done by FFI code would be considered as happening with an untagged pointer.
@gnzlbg
E.g. if I modify this piece of code, I know I can do this and that, because these unknown functions are not allowed to do x and y without introducing UB.
I am afraid I don't understand your question. Do you have an example for "this and that"? Certainly the same local reasoning principles that the compiler uses, can also be used by the programmer.
I am afraid I don't understand your question. Do you have an example for "this and that"? Certainly the same local reasoning principles that the compiler uses, can also be used by the programmer.
It wasn't a question, more like a criticism that the discussion often focus on which optimizations the aliasing model allows, which programs it bans, how it makes the life of miri/etc. easier, without actually covering how does the aliasing model make it easier for users to reason about their code. I think it should focus more on the latter.
For example, in @comex second example here, if bar
cannot write to some_global
, then a user modifying / refactoring main can just assume it does not happen, and can actually refactor the whole function without inspecting the code of foo
and bar
. However, if that is allowed to happen, then a user would need to inspect foo
and bar
to be able to touch main
.
Arguing that making X UB allows users to not have to inspect the source code of your whole program when modifying some function is a stronger argument for most users IMO than saying that it allows some compiler optimization.
I agree. People are willing to set a lot of optimization on fire if it makes the universe easier to understand. See Also: all GC languages. The first evaluation of an aliasing model is if it lets a low to average skill programmer follow along with the implications of what they're doing. There are many other metrics too, but that needs to be the first target.
Which is not to say that Stacked Borrows, or any other model, is necessarily hard to follow, I just want to emphasize the importance of gnzlbg's point.
Minor nitpick that is nevertheless relevant to the "(human) reasoning about code" is that @comex's bar
function ought to be unsafe
: this makes it clearer that bar()
may not be as innocent as it may look from the outside without knowing what it does, and that it should thus be moved with care.
But OTOH, my intuition also says it makes sense to consider
*x = 20
and `{ let z = &mut x; z = 20; } as equivalent.
x = 20
, or *(&mut x) = 20
when being verboseI think this equivalence is needed for reborrowing to make sense: one can always imagine that *(&mut x) = 20
goes through an ephemeral reborrow, something like "spurious reborrows" :D
I am on the side of a strict language with enabled optimizations, and an escape hatch for "raw stuff". In this case, Rust's escape hatch is UnsafeCell
:
fn raw_mut_ref_from_unique<T : ?Sized> (x: &'_ mut T) -> &'_ UnsafeCell<T>
{
unsafe {
// # Safety:
//
// - `UnsafeCell` is `repr(transparent)`
//
// - data-race-free mutation is now the responsibility of the programmer
// (requires `unsafe`)
mem::transmute(x)
}
}
Contrived patterns such as @comex's would thus require starting from a & UnsafeCell
reference instead of a &mut
:
static SOME_GLOBAL: AtomicPtr<i32> = AtomicPtr::new(0 as *mut _);
fn foo (at_x: &'_ UnsafeCell<i32>)
{
SOME_GLOBAL.store(at_x.get(), Ordering::SeqCst);
}
/// # Safety:
///
/// - `SOME_GLOBAL` must point to a valid i32,
/// which must not be read or written to in parallel
unsafe fn bar ()
{
*SOME_GLOBAL.load(Ordering::SeqCst) = 10;
}
fn main ()
{
let mut x = 0;
{
let at_x = raw_mut_ref_from_unique(&mut x);
foo(at_x);
{
let z = at_x;
unsafe {
// # Safety: no parallel reads or writes
*z.get() = 20;
}
}
unsafe {
// # Safety:
//
// - the initial raw_mut_ref_from_unique has not been invalidated yet;
//
// - no parallel reads or writes;
bar();
}
} // SOME_GLOBAL is now invalid by the way
println!("{}", x);
}
@gnzlbg I agree. A memory model has two consumers: programmers that code against it, and compiler authors that implement it and optimize under it.
Arguing that making X UB allows users to not have to inspect the source code of your whole program when modifying some function is a stronger argument for most users IMO than saying that it allows some compiler optimization.
Interesting. That is not at all my experience from most discussions around UB, and I have gotten explicit pushback against arguing by "simplicity", but maybe that's not what you mean.
The argument is almost the same though: allowing compiler optimizations is about making program analyses easier, and that also makes it easier for humans to analyze. Though I suppose something like TBAA is rarely useful for a human analyzer, while Stacked Borrows provides some entirely intraprocedural reasoning principles that humans can learn just like compilers can. If I had infinite time, I'd write a blog post or two about those reasoning principles, but in lieu of that I can only point at this section of the Stacked Borrows 1.0 post.
@Lokathor
People are willing to set a lot of optimization on fire if it makes the universe easier to understand.
Not the kind of people that make languages like C and C++ though. Source: have you looked at those languages?^^ And people use them because they are fast, which they are because of all those complications.
See Also: all GC languages
Not really, those languages also statically prohibit many of the things you have to worry about in C/C++. I mean, safe Rust also is a no-brainer when it comes to Stacked Borrows.
FWIW, everything going on in these Stacked Borrows models and all the UCG discussions is several lightyears ahead of C/C++ in terms of "simplicity" (which for me means roughly "comprehensibility to non-wizards"). I've long since given up having any idea what C/C++ think UB really is, but for the stuff under discussion here, I'm fairly sure that the only reason I can't work through most of the examples myself is simply that all of this stuff is still in flux so I haven't tried to properly memorize the current model. Certainly the basic idea in the last several comments that "while there's a &mut x
in scope, using aliasing raw pointers to mutate x is UB" seems perfectly natural and intuitive to me (whether or not we end up accepting that rule).
I also feel like the past discussions on this have typically been taking simplicity / the ability for users to reason about their code very seriously, to the point that pretty much every Stacked Borrows update either gives a very specific reason why a new complexity had to be introduced, or proclaimed some significant simplification of the model, or both. Though perhaps I'm over-focusing on Ralf's blog posts there.
But maybe the point of bringing up simplicity/reasonability here was to imply something like "the complexity cost of introducing tree-shaped borrows is too high to be worth making retep's original case non-UB". In which case I tentatively agree; we'd need to see some big optimization wins to justify that.
FWIW I retract my suggestion on the basis of @comex's interprocedural example. :+1:
Another instance of "mutable references created but never used breaks stuff": https://github.com/SimonSapin/rust-typed-arena/pull/27.
I ran into this (or something similar) today when using https://github.com/Matthias247/futures-intrusive under miri. In particular, we have some data structure
struct SomeFuture {
wait_node: ListNode,
// ...
}
where ListNode
is an intrusive linked list node. We stash pointers to wait_node
in some external structure. But then if we ever create a mutable reference to SomeFuture
, those pointers are invalidated.
I was hoping that wrapping the ListNode in an UnsafeCell was enough, but it isn't - miri sees through the UnsafeCell.
It seems to me that this destroys any possibility of making "encapsulated" intrusive data structures, since we can't stop the user from taking mutable reference to the structure containing the list node - even if that mutable reference will never be used to mutate the node in question.
@goffrie I wonder if that situation isn't better covered by https://github.com/rust-lang/unsafe-code-guidelines/issues/148. Intrusive collections are not self-referential, but there is still a certain amount of deliberate aliasing, so it might make sense to make that type "opt out" of uniqueness guarantees.
I was hoping that wrapping the ListNode in an UnsafeCell was enough, but it isn't - miri sees through the UnsafeCell.
UnsafeCell has no effect whatsoever for mutable references, it only affects shared references.
https://github.com/rust-lang/rust/issues/73915 is another instance of this problem: there is more unfixed aliasing in btree
because shared references created by the iterator implementation overlap with mutable references handed out to the client.
@ssomers with your work on https://github.com/rust-lang/rust/pull/73971, you had to deal a lot with Stacked Borrows' aliasing rules when mixing references and raw pointers. I would be very interested in your feedback for how well-suited you think these rules are for Rust, in particular for the issue discussed here -- that just creating a reference already makes some very strong aliasing assumptions.
I do feel I had to deal a lot with the rules, because I didn't understand them. Now I think I do, but only in one situation (mutable iterators). I've tried to read this thread a few times and halfway I'm overwhelmed.
In my limited experience, relaxing the rule would merely hide fundamental problems. Then again, that is not always a bad thing to do, and worse the way we address those problems now is also to hide them by using raw pointers to stay under Miri's radar. So really, I have no idea what's best.
In my limited experience, relaxing the rule would merely hide fundamental problems. Then again, that is not always a bad thing to do, and worse the way we address those problems now is also to hide them by using raw pointers to stay under Miri's radar.
Well, using raw pointers is the intended solution for these problems. Raw pointers exist to be able to do stuff without worrying about aliasing.
The fact that Miri does not fully reliably check the intended rules around raw pointers is a problem though.
I'd say that raw pointers aren't hiding problems, they literally have less requirements, and so in some situations they actually do have less problems.
They don't fix everything of course, but sometimes they're the most appropriate solution.
Raw pointers exist to be able to do stuff without worrying about aliasing.
I thought they existed to be able to worry yourself about aliasing by the underlying compiler and CPU, instead of leaving it up to Miri by constraining yourself to its model.
I don't understand that statement @ssomers. Miri does check aliasing of references with raw pointers, though currently not yet as aggressively as I'd like to.
It's the borrow checker that entirely ignores raw pointers.
Here's (another) example of some code that we might want to accept, but that Stacked Borrows currently rejects:
fn main() {
let mut a = 0;
let mut r = &mut a;
unsafe { std::ptr::replace(&mut r as *mut _, r); }
}
See https://github.com/rust-lang/rust/issues/90055#issuecomment-946973699 for some more context.
Here's an example of exceedingly subtle code that runs into this pointer invalidation pattern: https://github.com/rust-lang/rust/pull/92092
The underlying cause here is some very smelly code: using &mut to get a *mut to do a read through it. This isn't the first time I've seen excessively-mutable code and it's probably not good to turn those into time bombs.
I've found a few examples of this pointer invalidation pattern which I think provide dubious if not provably zero optimization value. I've been pretty aggressively sending patches for code in the most-downloaded crates which trip up Miri, but I don't feel good about sending patches for these because it's not clear what the rules are for here.
slab
implements a two-item get
into an array, by calling get_mut_unchecked
twice
pub unsafe fn get2_unchecked_mut(&mut self, key1: usize, key2: usize) -> (&mut T, &mut T) {
let ptr1 = self.entries.get_unchecked_mut(key1) as *mut Entry<T>;
let ptr2 = self.entries.get_unchecked_mut(key2) as *mut Entry<T>;
match (&mut *ptr1, &mut *ptr2) {
(&mut Entry::Occupied(ref mut val1), &mut Entry::Occupied(ref mut val2)) => {
(val1, val2)
}
_ => unreachable!(),
}
}
arrayvec
does a ptr::copy
but with the as_ptr
+ as_mut_ptr
pattern: (1) (2) (3)
ptr::copy(
self.v.as_ptr().add(self.processed_len),
self.v.as_mut_ptr().add(self.processed_len - self.deleted_cnt),
self.original_len - self.processed_len
);
let src = source_vec.as_ptr().add(tail);
let dst = source_vec.as_mut_ptr().add(start);
ptr::copy(src, dst, self.tail_len);
ptr::copy(self.as_ptr().add(next),
self.as_mut_ptr().add(idx),
len - next);
The underlying cause here is some very smelly code: using &mut to get a *mut to do a read through it. This isn't the first time I've seen excessively-mutable code and it's probably not good to turn those into time bombs.
That smallvec code is actually just lifted directly from the original alloc::vec::Vec::drain(), for what it's worth.
I don't really understand what the comment above the original line was trying to say, personally, also: there is no slice::IterMut
involved there at any point. They just create an instance of &mut [T]
, and then call iter()
through it, which creates a slice::Iter
that the Drain
then continuously calls ptr::read
on the next()
results of.
So whether or not that slice that exists only to have iter()
immediately called through it is &mut [T]
or &[T]
does not actually matter at all at least as far as any impact on how the code functions overall in practice.
I don't want to get too hung up on smallvec
in particular. My general point was that there is code with unnecessary mutability in it, and currently, it works (mostly?). And if being sloppy with mutability is permitted today, it might be bad if we adopted Stacked Borrows 2.1 and made it UB.
That smallvec code is actually just lifted directly from the original alloc::vec::Vec::drain(), for what it's worth.
Exactly! Until yesterday, core
and alloc
had code in them that would fail stacked borrows with -Zmiri-tag-raw-pointers
under those crates own test suites. "I got this from core
" was not a well-founded argument for the code's validity, even in original context.
Additionally, under Stacked Borrows, smallvec
is actually wrong to copy this code. smallvec
's test suite doesn't pass Miri with -Zmiri-tag-raw-pointers
. I came up with a diff that made the existing test suite pass but it smelled wrong to me. Turns out the crate's test coverage was actually incomplete, so I sent a PR to improve its test coverage. When I tried 18 days ago, I could not figure out how to make it pass Miri with raw pointer tagging.
So whether or not that slice that exists only to have
iter()
immediately called through it is&mut [T]
or&[T]
does not actually matter at all at least as far as any impact on how the code functions overall in practice.
The topic of discussion in this issue is whether creating such ephemeral mutable references should have the impact that Stacked Borrows currently says it does. Whether code creates a &mut [T]
or &[T]
matters quite a lot.
The topic of discussion in this issue is whether creating such ephemeral mutable references should have the impact that Stacked Borrows currently says it does. Whether code creates a
&mut [T]
or&[T]
matters quite a lot.
Yeah, I don't disagree. What I meant was, I'm reasonably sure that in this particular case there's not actually a reason the code couldn't simply do this instead:
let range_slice = slice::from_raw_parts(self.as_ptr().add(start), end - start);
Yup, that's my point about unnecessary mutability. The code you're proposing works just fine. Same deal as the linked rustc PR, the whole diff is just deleting mut
.
I am once again running Miri on a lot of crates. It seems that breaking this rule specifically with Box
is rather common. A user on the community discord once pointed out that it really seems like Box
is the tool to get address stability, but under SB 2.1 that feels like a bait-and-switch. You get address stability until you try to move the Box
itself. Over time I'm coming to agree more and more with this perspective. I think I see the alternative perspective, that Box<T>
owns the T
so of course it gets Unique
permission. But I think if that's the case, we should provide a way for people to get the tool they want: An RAII heap allocation that can be aliased. And provide strong lints to force people to migrate.
string_cache
: https://github.com/servo/string-cache/blob/72d9737baef70e1c3afb684a77d09cbad2353065/src/dynamic_set.rs#L76-L83
let mut entry = Box::new(Entry {
next_in_bucket: self.buckets[bucket_index].take(),
hash,
ref_count: AtomicIsize::new(1),
string: string.into_boxed_str(),
});
let ptr = NonNull::from(&mut *entry);
self.buckets[bucket_index] = Some(entry);
let node = /* snip */
} else {
// if the cache is not full allocate a new LruEntry
Box::new(LruEntry::new(k, v))
};
let node_ptr: *mut LruEntry<K, V> = &mut *node;
self.attach(node_ptr);
let keyref = unsafe { (*node_ptr).key.as_ptr() };
self.map.insert(KeyRef { k: keyref }, node);
pub unsafe fn new_assert_stable_address(o: O) -> Self
where O: Deref<Target = T>,
{
OwningRef {
reference: &*o,
owner: o,
}
}
Also, it's unclear to me if the aforementioned code is already UB in practice due to the way rustc applies LLVM's noalias
. If someone knows better than I, or has some kind of way to demonstrate this, or has some kind of checking interpreter for LLVM IR, that would be awesome.
Such a pointer is already featured in the ::aliasable
crate; I think it currently "just" needs more publicity.
Related to it, and to owning-ref
's UB, I wrote a pretty lengthy post to teach and raise awareness about these things: https://users.rust-lang.org/t/how-can-we-teach-people-about-self-referential-types/65362/9?u=yandros#warning-careful-with-unsafe-warning-1
Also, it's unclear to me if the aforementioned code is already UB in practice due to the way rustc applies LLVM's
noalias
Yes, it is. This code succeeds in debug mode but panics in release mode:
#[inline(never)]
fn noalias_test(mut a: Box<i32>, b: *mut i32) -> i32 {
*a = 4;
unsafe { *b = 5; }
*a // <- LLVM assumes this must be 4
}
fn main() {
let mut a = Box::new(10);
let b = &mut *a as *mut i32;
assert_eq!(noalias_test(a, b), 5);
}
I haven't heard of a checking interpreter for LLVM IR, but this kind of "write one thing, write another thing, read back the first thing" pattern is a good way to test aliasing assumptions in general.
That's a slick demo, but I'm not totally sure it applies to the particular examples I linked. Particularly for a string interner or entries in an LRU cache, there shouldn't be any mutation. If there were mutation through the pointer I think that code might not apply to this issue in particular, because asserting uniqueness at mutation would just be UB through another route (I'd say later, but time travel).
I can't think up with any way that a surprising optimization would appear in the absence of mutation, but the LangRef's section on noalias
only mentions access instead of specifically calling out mutation so I'm not sure of much here.
The LangRef does call out mutation:
noalias
This indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function. [..]
Emphasis added. Also, noalias
was originally designed to implement C restrict
, which has a similar rule:
if some object that is accessible through P (directly or indirectly) is modified, by any means, then all accesses to that object (both reads and writes) in that block must occur through P (directly or indirectly), otherwise the behavior is undefined
So if there really is no mutation, then you're safe from LLVM's perspective; perhaps that's an indication that Stacked Borrows should be loosened in that case.
That said:
I was able to translate my example to use owning_ref
in combination with interior mutability:
use std::cell::Cell;
use owning_ref::BoxRef;
#[inline(never)]
fn noalias_test(x: BoxRef<Cell<i32>>) -> i32 {
x.as_owner().set(4);
x.set(5);
x.as_owner().get()
}
fn main() {
let x: BoxRef<Cell<i32>> = BoxRef::new(Box::new(Cell::new(10)));
assert_eq!(noalias_test(x), 5);
}
(Incidentally, owning_ref
unfortunately has multiple other soundness issues even without compiler optimizations... [1] [2])
As for lru
, I believe there is mutation involved. The Box
points to an LruEntry<K, V>
which contains prev
and next
pointers, forming a doubly linked list; these pointers will be mutated when adjacent entries are added or removed. It's also possible that the key or value could have interior mutability. Similarly, in string-cache
, the next_in_bucket
field is used to form a singly linked list, and an entry's next_in_bucket
field will be mutated if the entry following it is removed.
That said, I suspect it would be difficult to actually trigger a miscompilation in either of those two crates.
Big thanks for correcting me so thoroughly! I've run into a subtle case with interior mutability before, maybe sprinkling Cell around should just be part of looking into these findings.
Visiting for UCG. Issue is mostly fine, but editing some Box
-related questions out of the main issue in an attempt to keep things managable
FWIW, Tree Borrows does things different here -- mutable references are two-phase, and uniqueness is only asserted when they are written to. That seems to be a fairly elegant solution, though it also has its downsides.
And of course now people are asking for optimizations that are only correct if we do aggressively assert uniqueness early. ;)
We have seen a few cases in libstd where the rule that just creating a mutable reference already asserts uniqueness is a problem:
VecDeque
creating overlapping mutable references.BTreeMap
creating mutable references that overlap with shared references.LinkedList
creating overlapping mutable references.Vec::push
invalidating existing references into the vector. The typed-arena crate ran into the same problem.ptr::replace
are forbidden even though they should probably be allowed.In some cases, even just creating a shared ref is an issue, as it conflicts with a recently created mutable ref:
So maybe we assert uniqueness to aggressively here? Maybe mutable references should only become unique on first write, or so? I am not sure what that would look like though. (In principle this could also happen with shared references but I have not seen that.)
I'll use this issue to collect such cases.
However, there are also cases where people want the extra UB to get more optimizations: