Should `&mut`-derived pointers be permanently "separate" from their siblings?

rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do

https://rust-lang.github.io/unsafe-code-guidelines

Apache License 2.0

666 stars 58 forks source link

Should `&mut`-derived pointers be permanently "separate" from their siblings? #450

Open RalfJung opened 1 year ago

RalfJung commented 1 year ago

The following code is fine according to LLVM noalias, but rejected by both Stacked Borrows and Tree Borrows:

fn main() { unsafe {
    let mut array = [0i32; 8];
    let ptr1 = array.as_mut_ptr();
    let ptr2 = array.as_mut_ptr();
    ptr1.write(0);
    ptr2.write(0);
} }

The reason (in Tree Borrows terms) is that when ptr2 is created, it is considered as a "sibling" pointer to ptr1, and even though their noalias scope is over, the model still remembers that these pointers are not identical and actions on one can disable the other. Put differently, ptr1 and ptr2 are derived from mutable references that were created to call as_mut_ptr, and those references are considered to be live as long as any pointer derived from them is live.

This is probably going to be surprising. Is it a problem? Tree Borrows is deliberately stricter than LLVM noalias; we want to model Rust concepts, not just the LLVM attribute, and we are hoping to get benefits even inside functions where LLVM's function-scoped noalias cannot be used.

We could attempt to allow code like the above by somehow having a notion of "end of scope" for a mutable reference, and having its no-alias requirements end completely at that point. Is that worth it? And how exactly should that work? Each reference remembers the function it was created in, and when that function ends, it ceases to impose aliasing requirements? For functions with a signature like fn(&mut T) -> &mut U, we certainly want to keep the no-alias requirement for the return reference around even after the function returned -- but maybe that can rely entirely on the caller doing a retag. The bigger concern is that this makes the function boundary very special (even more so than protectors), and the following code would still be UB:

fn main() { unsafe {
    let mut array = [0i32; 8];
    let ptr1 = &mut array as *mut _ as *mut i32;
    let ptr2 = &mut array as *mut _ as *mut i32;
    ptr1.write(0);
    ptr2.write(0);
} }

And... that seems okay? In fact I think we want that code to be UB even if (ptr1, ptr2) is returned out of the function and used in the caller (example below). That code creates two overlapping &mut, I see no good reason to allow this code. But if we reject this code, then why would we accept the variant that uses as_mut_ptr? Is it only because the &mut is now implicit, an auto-ref? Should auto-refs have weaker semantics than explicit &mut? That is insufficient, we want noalias dereferenceable for &mut self arguments. A combination of "auto-ref is somehow very weak" and "function-entry retags have an 'end of scope'" would be sufficient, though the details on what auto-refs do are fuzzy. Maybe they just don't retag at all. (Function-entry retags also don't correspond to an &mut in the source, so it could be justified that they are somehow weaker.) However I think there are plenty of cases where we want aliasing assumptions even when there was no explicit &mut, like on the references returned by get_mut-style functions. Or should we devise something specific to methods like as_mut_ptr that suppresses the implicit reborrows (and also suppresses the noalias dereferenceable, which we really don't need for these methods)? That seems rather ad-hoc though.

// Example mentioned in the text above.
// If `&mut` loses its no-alias power at the end of the function it appears in,
// this code would be allowed.
fn helper(array: &mut [i32; 8]) -> (*mut i32, *mut i32) { unsafe {
    let ptr1 = &mut *array as *mut _ as *mut i32;
    let ptr2 = &mut *array as *mut _ as *mut i32;
    (ptr1, ptr2)
} }

fn main() { unsafe {
    let mut array = [0i32; 8];
    let (ptr1, ptr2) = helper(&mut array);
    ptr1.write(0);
    ptr2.write(0);
} }

I wonder what others think, in particular in terms of which of these examples should be allowed (if any) and which not.

saethlin commented 1 year ago

We could attempt to allow code like the above by somehow having a notion of "end of scope" for a mutable reference, and having its no-alias requirements end completely at that point.

I think this is what people naively expect. And if we had this property I think it would ease some of the scenarios where we fall back to advising that people not mix references and raw pointers; this is undoubtedly good advice for avoiding UB but it is unpleasant to give up references.

Is that worth it? And how exactly should that work? Each reference remembers the function it was created in, and when that function ends, it ceases to impose aliasing requirements?

That last sentence sounds tantalizing, but I'll admit it is really just because I would like the as_mut_ptr example to be permitted. Maybe some day I'll dig up examples of people running into this in the wild.

The bigger concern is that this makes the function boundary very special

Yeah, this issue. Should we have a whole separate issue for this? I don't know how much this can be its own discussion; the aliasing optimizations implemented in C/C++ compilers seem deeply tied to function boundaries and I don't know if that is fundamental or legacy from C.

I am generally concerned about tying UB too much to function boundaries; I think it would be deeply surprising if during refactoring I found I eliminated all but one caller to a helper function and decided to manually inline it into its caller so that I could apply some simplification... then found that I had added UB. But on the other hand, I think the very existence of protectors indicates we already have the opposite refactoring hazard; adding a helper function can introduce UB. That seems more dangerous, and yet I have only a handful of examples of deallocating against a protector: https://miri.saethlin.dev/logs/alloc-stdlib/0.2.2 https://miri.saethlin.dev/logs/tinyset/0.4.15 https://miri.saethlin.dev/logs/usync/0.2.1 https://miri.saethlin.dev/logs/hamt-rs/0.3.0

RalfJung commented 1 year ago

Each reference remembers the function it was created in, and when that function ends, it ceases to impose aliasing requirements?

Ah right, under that variant naive inlining becomes unsound. That's certainly not tenable. We would at least need some way to still have the "end of scope" happen after inlining.

That last sentence sounds tantalizing, but I'll admit it is really just because I would like the as_mut_ptr example to be permitted. Maybe some day I'll dig up examples of people running into this in the wild.

We're getting the first concrete requests for optimization that even come with concrete LLVM proposals that are incompatible with as_mut_ptr. So maybe the right answer is to start finding a way to separate functions like as_mut_ptr from the majority that is completely fine with "full" mutable reference uniqueness.

the aliasing optimizations implemented in C/C++ compilers seem deeply tied to function boundaries and I don't know if that is fundamental or legacy from C.

C has restrict, which is indeed tied to function boundaries. I don't think C++ has anything like this.

LLVM has noalias infrastructure that I haven't understood in detail yet. But I think it boils down to statically marking each load/store as being within some "noalias scope". The regular function-argument-level noalias is then simply equivalent to putting all loads/stores inside the function into its scope, but smaller scopes can be had within a function. In Rust this would correspond, for each &mut, to statically determining which loads/stores are "in the scope" of that borrow, and mark them appropriately. What SB and TB are doing is to dynamically determine that scope to last until "the last time that reference (or pointers derived from it) is used". But a noalias scope in LLVM cannot leave the function it starts in (I think).

However LLVM is also getting new noalias infrastructure and I know even less about how that works.^^

But no other language I know permanently remembers the full tree of how pointers get derived even as the control flow moves back up the call stack. When it's all safe code I think it makes perfect sense to keep that tree, but for raw pointers... well really it's just functions like as_mut_ptr where this makes me uncomfortable and really those functions should just tell the compiler "don't create fresh references here".

CAD97 commented 1 year ago

well really it's just functions like as_mut_ptr where this makes me uncomfortable and really those functions should just tell the compiler "don't create fresh references here".

How much would be solved by as_mut_ptr being autoref-compatibly defined as fn(*mut self) -> *mut T instead of fn(&mut self) -> *mut T, with autoref doing an addr_of_mut!?

If we can solve most of the unexpected UB issues with that sort of change to the language and stdlib, I think that's preferable to weakening &mut as posited in the OP.

thomcc commented 1 year ago

I think it would have to still be technically &mut, for backwards compatibility reasons. But perhaps it could be a weaker &mut somehow. I have long suspected there are not too many functions that would need this property, aside from slice as_ptr/as_mut_ptr/len

RalfJung commented 1 year ago

How much would be solved by as_mut_ptr being autoref-compatibly defined as fn(mut self) -> mut T instead of fn(&mut self) -> *mut T, with autoref doing an addr_of_mut!?

AFAIK those kind of functions are the main motivation for not having spurious writes on &mut.

However the function might look more like

fn as_mut_data_ptr(&mut self) -> *mut Data { addr_of_mut!(self.data) }

and annoyingly the version of that that takes a raw pointer would have to be unsafe.

And there is the question whether we want to force users to write such "raw pointer getters" in this particular style -- that does sound like quite the footgun.

I think it would have to still be technically &mut, for backwards compatibility reasons.

Yeah I guess let f: fn(&mut _) -> _ = <[i32]>::as_mut_ptr still has to work.

RalfJung commented 7 months ago

Another way to phrase this would be: under SB and TB, a pointer P is "live" as long as any pointer derived from it is live -- even if the function that created P has long since returned.

To me that seemed like the most obvious way to operationalize what the borrow checker is doing. But it is indeed not how noalias works in LLVM. OTOH, this definition is much less tied to having some idea of scope or function boundary, and thus may help us when dealing with inlining. So, it's not clear to me that going for a weaker model with some (statically determined) notion of "scope" is a good idea.

RalfJung commented 7 months ago

We could attempt to allow code like the above by somehow having a notion of "end of scope" for a mutable reference, and having its no-alias requirements end completely at that point.

I think this is what people naively expect.

Here's an example of an optimization that was just brought up in a discussion as "example of an optimization Rust should be able to do one day".

I am fairly sure that in a function like fn nocapture(&mut T), when that function returns we want to consider that reference dead. Or, if we can't use the function boundary (it's unclear how to use that in general), then we should use other clues that the lifetime is over, such as a write to the original referent.

Of course, unlike as_mut_ptr, this function doesn't return anything, so it "obviously" should not extend the lifetime of the pointer. But how obvious is that? If we say that fn(&mut T) -> *mut T returns a pointer that allows arbitrary aliasing with the original referent, then what about fn(&mut T, &mut T) -> *mut T -- are now both pointers potentially aliased, or only the one that actually gets returned? But what does "returned" mean? What if it is returned in a larger compound value, maybe even behind a pointer indirection? Maybe someone returns a pointer via destination-passing style (out: *mut Ret arguments), so the function actually has return type (). Is that where we draw the line? This is a rabbit hole that I am not sure has a bottom...

OTOH, the current model is very simple: we always track the ancestry of all pointers, and use that to determine conflicts. &mut creates child nodes, but raw pointers are "clones"/aliases of the references they come from. The only thing that makes this not simple is all the implicit references Rust creates... but I worry that fixing that with bespoke rules for when a function is nocapture vs when it is as_mut_ptr is overall going to be more complicated and harder to predict than dealing with these implicit references.

dfoxfranke commented 3 months ago

Here's the intuitive model that I held until yesterday when I started reading about stacked borrow semantics. Every reference to a T owns a lock of sorts on the T's memory, and the lock is released when the reference is dropped. Any time you read or write a through a pointer to a location that a mutable reference has a lock on, or write through a pointer to a location that a shared reference has a lock on, that's potential UB. The only exception is when you use the pointer in exactly some way that you could have just used a reference and are following all the same rules that the borrow checker enforces on references. Basically, any memory that's either owned on the stack or accessible through a reference, the compiler has expectations about, and it's responsibility of any access performed through a raw pointer not to violate those expectations. But, when memory is only reachable through (non-Unique) pointers, it's legal to use those pointers in any way that would be legal in C. A *const pointer can get invalidated if you write to a *mut pointer which aliases it, but any *mut pointer which was valid at birth remains valid for as long as the memory which it points to remains allocated.

The stacked/treed borrow rules screw up this intuitive model by making it possible for a *mut to get invalidated by writing through a &mut, or even by writing through another *mut which was derived from a &mut. I think any ruleset which has this property is far too dangerous, and whatever optimizations it might enable are not remotely worth the UB that they're going to cause.

RalfJung commented 3 months ago

The stacked/treed borrow rules screw up this intuitive model by making it possible for a *mut to get invalidated by writing through a &mut

That's unfortunately fundamentally necessary, so I'm afraid your mental model is a bit too optimistic here.

or even by writing through another *mut which was derived from a &mut

That, too, is fairly fundamental -- avoiding this means we can basically entirely ditch the idea of reference-based optimizations entirely. Consider:

fn foo(x: &mut i32, y: &mut i32) {
    let xraw = x as *mut i32;
    let yraw = y as *mut i32;
    // If the two pointers alias, this is UB: the write to xraw invalidates yraw.
    xraw.write(0);
    yraw.write(0);
}

This issue is about the fact that Stacked Borrows and Tree Borrows are more restrictive than LLVM noalias, but the points you rise apply even to LLVM noalias (and to C restrict). It is almost certain that code like the above will be UB.

dfoxfranke commented 3 months ago

I don't think I understand why the example you've written poses a problem. In C, it's UB if arestrict pointer is aliased by another restrict pointer, but it's legal for a non-restrict pointer to alias a restrict pointer. Rust has no need for a restrict keyword, because mutable references are in effect always restrict: two mutable references with overlapping lifetimes can never alias each other. So this code snippet is the moral equivalent of

void foo(int *restrict x, int *restrict y) {
    int *xraw = x;
    int *yraw = y;

   *xraw = 0;
   *yraw = 0;
}

Here if x aliases y, that's instant UB, even if the function body is empty. The write to xraw isn't what's invalidating yraw, because it was already invalid from birth. C programmers don't need consider the heredity of xraw and yraw to reason about a function like this. If yraw was valid when it was last assigned, it's still valid later, unless something deallocated the memory it points to.

C's rules about restrict pointers seem sufficient to permit plenty of optimization, even when they have non-restrict pointers aliasing them. For example, I wrote

void foo(int *restrict x, int *restrict y) {
    int *xraw = x;
    int *yraw = y;

   *xraw += 1;
   *yraw += 1;
   *xraw += 1;
}

and looked at how this is assembled on aarch64. I got just the result I expected:

        ldr     w8, [x1]
        ldr     w9, [x0]
        add     w8, w8, #1
        add     w9, w9, #2
        str     w8, [x1]
        str     w9, [x0]

Clearly clang or LLVM was able to deduce that xraw doesn't alias yraw even though they aren't directly declared as restrict, because otherwise the optimization you see was performed here wouldn't be sound. So I'm not clear on why Rust would need to have any more UB than C does in order to be able to exploit noalias.

On the other hand, here's a case where Rust could have more UB and I wouldn't find it perverse:

fn foo(x: &mut i32) -> i32 {
    let xraw = x as *mut i32;
    unsafe { xraw.write(0); }
    *x
}

The equivalent in C with int *restrict x would be fine, and indeed the restrict would be meaningless since there are no other restrict arguments. But I'm okay with this Rust program being UB, because it's writing to a pointer which aliases a mut reference while that reference is still alive, and in a way that the borrow checker wouldn't permit if the pointer were replaced by another mut reference. I expect UB out of that.

chorman0773 commented 3 months ago

I don't think I understand why the example you've written poses a problem. In C, it's UB if arestrict pointer is aliased by another restrict pointer, but it's legal for a non-restrict pointer to alias a restrict pointer. Rust has no need for a restrict keyword, because mutable references are in effect always restrict: two mutable references with overlapping lifetimes can never alias each other. So this code snippet is the moral equivalent of

This is an incorrect assessment of the semantics of restrict. restrict pointers exclude all other accesses to the objects its used to access (except insofar as read accesses do not exclude other pointers from performing read accesses).

Per N3301 (latest draft for C23),

`6.7.4.1 #9` An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.4.2, requires that all accesses to that object use, directly or indirectly, the value of that pointer.

6.7.4.2 2. If D appears inside a block and does not have storage class extern, let B denote the block. If D appears in the list of parameter declarations of a function definition, let B denote the associated block. Otherwise, let B denote the block of main (or the block of whatever function is called at program startup in a freestanding environment). 3. In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E. "based" is defined only for expressions with pointer types. 4. During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined. 5. Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B.

In respect to defined behaviour, TB mutable references are actually weaker in many (but not all) ways than restrict. restrict applies until the end of the lexical scope of the pointer it is applied to, mutable references aliasing rules apply until last use (except in a function parameter, where protectors get involved)

RalfJung commented 3 months ago

Here if x aliases y, that's instant UB, even if the function body is empty

No, that's not correct. restrict is about the accesses performed through the relevant pointers (and all pointers derived from it). The pointers may be equal as long as the accesses performed through them are disjoint. Furthermore, he accesses may even overlap if they are all read accesses.

I suggest you read the relevant parts of the C standard; it doesn't say what you seem to think it says.

it's legal for a non-restrict pointer to alias a restrict pointer.

This is wrong, too. This example has UB in C if the pointers alias:

void foo(int *restrict x, int *y) {
   *x = 1;
   *y = 1;
}

I don't think I understand why the example you've written poses a problem.

"problem" in which sense? You don't understand why it is UB, or you don't understand what it has to do with your argument? Your argument was that it is bad that "it possible for a *mut to get invalidated by writing through [...] another *mut which was derived from a &mut". My example shows some code where exactly that happens and yet we unquestionably want that code to be UB. Therefore, it shows that your intuition doesn't hold up even for the most basic aliasing rules.

If you have further questions about why certain examples are UB, I'd ask you to take that to a new thread, so that we can in this thread collect arguments for and against the question stated in the OP, without filling it with misunderstandings about the basic concepts being discussed here. We have a Zulip stream where we're always happy to answer questions, or you can open a new issue here with a concrete question if you prefer Github.

RalfJung commented 3 months ago

But I'm okay with this Rust program being UB

There is no world in which it is okay for this program to be UB. You can write basically the exact same thing in safe code:

fn foo(x: &mut i32) -> i32 {
    let x_notraw = &mut *x;
    *x_notraw = 0;
    *x
}

Obviously this code is fine, and therefore obviously the code you wrote should also be fine.

I am not sure what your intuition is based on, but it doesn't seem to be anywhere close to how Rust references work, so it is not surprising that your intuition would then clash with Stacked Borrows and Tree Borrows that are closely built on the way Rust references work.

JoJoDeveloping commented 2 months ago

That last sentence sounds tantalizing, but I'll admit it is really just because I would like the as_mut_ptr example to be permitted. Maybe some day I'll dig up examples of people running into this in the wild.

I have now come across a few such examples. For example, this code here uses it and trips up Tree Borrows (although not Stacked Borrows). Another one is this here (since fixed). Unfortunately, these issues in the wild are a bit more complicated, with there being several newtype wrappers and the proper way of fixing it being that in addition to exposing everything as a &mut [T] everywhere, you also need to preserve ways of getting to the raw pointer. (I guess such code is quite rare, but I just spend several days looking at people breaking TB rules in seemingly reasonable ways that are fixable, but the fixes make the code more ugly, which is a very disappointing state of affairs).

If we only look at slice::as_mut_ptr, then people learning Rust might look at the method, and see that it is simply a wrapper around a rather ugly cast. Unfortunately, this then introduces aliasing constraints into their code, which can be solved by removing the fancy abstraction and doing the raw casts manually. Unfortunately, that state of affairs is a bit unfortunate; the aliasing model and other restrictions of Rust force you to either make your code be nice and maintainable or make it correct, but not both, and that's not optimal.

Even further, I fear that in general, slice::as_mut_ptr is a massive footgun because it seems to simply "convert" to a pointer, and because it seems to suggest that one can build useful APIs based on slices. But this is not what it does, and you in general can not use slices anywhere raw pointers are used, because as_mut_ptr and the act of returning them imposes extra aliasing constraints. I feel like this should at least be mentioned somewhere in the documentation of that method, like so:

Note that the returned pointer is not allowed to alias with the mutable reference to the slice. 
As long as the pointer is in use, any other reference not derived from the *result* of
 as_mut_ptr should not be used.

saethlin commented 2 months ago

Unfortunately, this then introduces aliasing constraints into their code, which can be solved by removing the fancy abstraction and doing the raw casts manually. Unfortunately, that state of affairs is rather silly. I don't want to go around evangelizing people to use a language where the aliasing model combined with other language restrictions makes it impossible to write clean code since you're not allowed to factor things into methods.

I agree with your assessment. I've had the same thoughts for a while, but I think this ship already sailed long long ago. The idea that function parameters are magic and have extra UB is fundamental to deferenceable and noalias; if they can insert speculative reads/writes upon function entry, factoring out any code into a helper function can add UB.

JoJoDeveloping commented 2 months ago

What's so frustrating about slice::as_mut_ptr is that you would really want this method to be written like this:

pub fn as_mut_ptr(*mut self) -> *mut T [ //take a raw pointer
  self as *mut [T] as *mut T
}

But sadly, Rust does not let you do it without some unstable feature (and it would probably break some existing code). And in general, in other abstractions that want to give out raw pointers without unnecessarily modifying the provenance, this would not work as simply. Consider

struct DataWithHeader<T: ?Sized>{
  header: Foo,
  data: T
}

impl <T: ?Sized> DataWithHeader<T> {
  pub fn as_mut_ptr(&mut self) -> *mut T {
    &mut self.data
  }
}

That function is impossible to write when it takes a raw pointer, except if one makes the function unsafe, or if one uses ptr::wrapping_add (which loses optimizations..?).

A hacky solution to this is inventing an attribute #[no_retag] (which, I am sure, people had come up with before), that allows such a functions to be written in safe code, but without treating the references as references in the aliasing model, and without adding noalias and friends in LLVM. And then you can hopefully add this in enough places (like the Deref trait when also used from unsafe code, or most as_mut_ptr functions) that things just work.™ Arguably, for half the methods that currently cause headaches, this is the type people had in mind when they designed them.

So in the end, it all comes back to raw pointer ergonomics. If we want references to have very strict rules, the language should not box (no pun intended) you into using them all the time. What further causes problems is that many crates out there don't add enough "unsafe accessors" to give you raw pointers, instead all you get is a slice and then it becomes impossible / very tedious to use that dependency in a sound way for your unsafe code. What you would need is a raw pointer, but obtaining that is impossible / very fragile even if you know what you are doing. Which again comes back to raw pointer ergonomics, if there was a AsRawPtr trait widely used in Rust, this would remind people they need to add something like that.