rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0

Differences between `*const T` and `*mut T`. Initially `*const T` pointers are forever read-only? #257

Open thomcc opened 3 years ago

thomcc commented 3 years ago

I hadn't seen this before, but it's very surprising and should be documented better. Apparently, *mut T and *const T aren't equivalent: if the raw pointer starts out as a *const T, it will always be illegal to write to, even if nothing beyond the pointer's "memory" of its initial state is the reason for this.

See: https://github.com/rust-lang/rust-clippy/issues/4774#issuecomment-550008705

This is not entirely correct... something like &mut foo as *mut T as *const T as *mut T is entirely harmless. What is relevant is the initial cast, when a reference is turned into a raw pointer. I think of the pointer as "crossing into another domain", that of uncontrolled raw accesses. If that initial transition is a *const, then the entire "domain" gets marked as read-only (modulo UnsafeCell). The raw ptrs basically "remember" the way that the first raw ptr got created.
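
A minimal sketch of the harmless cast chain described in the quote. The function name `write_via_cast_chain` is hypothetical and used only for illustration. The first raw pointer is created as `*mut`, so the later casts to `*const` and back only change the static type:

```rust
/// Hypothetical helper: goes &mut -> *mut -> *const -> *mut, then writes.
fn write_via_cast_chain(target: &mut i32, new_value: i32) {
    // The initial reference-to-raw transition is a `*mut`.
    let p_mut = target as *mut i32;
    // Raw-to-raw casts: only the pointer's static type changes.
    let p_const = p_mut as *const i32;
    let p_again = p_const as *mut i32;
    // SAFETY: `p_again` is derived from a live `&mut i32`, and nothing else
    // accesses the value while we write.
    unsafe { *p_again = new_value };
}
```

Calling `write_via_cast_chain(&mut v, 7)` on a local `v` leaves `v` equal to 7.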

This is extremely surprising, as lots of documentation and common wisdom indicate that *const T and *mut T are identical except as a sort of lint, and that the variance is different.

In fact, having correct variance in your types often forces using *const even for mutable data (hence NonNull uses *const). The prevalence of this certainly contributes to programmers' belief that there's no meaningful difference, and that you just have to be sure the data you write to is legal for you to write to.

A common case where this happens is a helper method that returns a pointer: you might write it just once for the *const T case, and use it even if you have a &mut self and need a *mut T result. I wouldn't think twice about this, mostly because the myth that they're equivalent is so widespread.
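
One way to sidestep the question is to write the helper twice, so the writable pointer is derived from `&mut self` rather than cast from the `*const` helper. This is only a sketch; the `Buffer` type and its methods are hypothetical:

```rust
/// Hypothetical wrapper type, used only for illustration.
struct Buffer {
    data: [u8; 4],
}

impl Buffer {
    fn new(data: [u8; 4]) -> Self {
        Buffer { data }
    }

    /// Read-only view: derived from `&self`.
    fn as_ptr(&self) -> *const u8 {
        self.data.as_ptr()
    }

    /// Writable view: derived from `&mut self`, not by casting `as_ptr()`.
    fn as_mut_ptr(&mut self) -> *mut u8 {
        self.data.as_mut_ptr()
    }

    fn set_first(&mut self, value: u8) {
        let p = self.as_mut_ptr();
        // SAFETY: `p` points at `data[0]`, which is in bounds, and was
        // derived from `&mut self`.
        unsafe { *p = value };
    }

    fn first(&self) -> u8 {
        // SAFETY: `data[0]` is in bounds and we only read.
        unsafe { *self.as_ptr() }
    }
}
```

The cost is a small amount of duplication; the benefit is that no write ever goes through a pointer that started life as a shared borrow.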

Ralf's comment here even further propagates this myth, in a thread explicitly asking about the differences here... https://internals.rust-lang.org/t/what-is-the-real-difference-between-const-t-and-mut-t-raw-pointers/6127/18 :

I agree. *const T and *mut T are equivalent in terms of UB.

More broadly, nowhere in the thread does the 'once *const, always *const' behavior come up, just that you need to make sure that you maintain the normal rust rules (e.g. the rules which would apply had you started out with a *mut T).

I looked and in none of Rust's reference material could I find any mention of behavior like this. This is very surprising, and I had been under the impression that optimising accesses to raw pointers wasn't beneficial enough for Rust to care strongly about them.

I also think this breaks a lot of existing unsafe code given how widespread the belief that they are equivalent is, and makes non-const-correct C libraries much thornier to bind to :(

elichai commented 3 years ago

if the raw pointer starts out as a *const T it will always be illegal to write to

More broadly, nowhere in the thread does the 'once *const, always *const'

These statements aren't true. The only thing that matters is how you got the pointer. For example, this is 100% correct Rust code:

let mut v = 5u8;
let ptr: *const u8 = unsafe {std::mem::transmute(&mut v)};
unsafe {*(ptr as *mut u8) = 7;}
assert_eq!(v, 7); 

So even though it started as a *const u8 you're still allowed to write into it, because you got the pointer from a unique(mut) reference and not a shared reference

RalfJung commented 3 years ago

This is, I think, a duplicate of https://github.com/rust-lang/unsafe-code-guidelines/issues/227. I agree it is a problem. I just do not know a good solution.

So even though it started as a *const u8 you're still allowed to write into it, because you got the pointer from a unique(mut) reference and not a shared reference

The subtle aspect of this is that x as *const _ is basically the same as &*x as *const _, i.e., as *const _ always goes through a shared reference.

elichai commented 3 years ago

The subtle aspect of this is that x as *const _ is basically the same as &*x as *const _, i.e., as *const _ always goes through a shared reference.

Ohhh that's what he was talking about, I'm sorry I misunderstood you @thomcc

mversic commented 3 years ago

I hope I'm not off topic. I don't think it is, since NonNull is a wrapper over *const

NonNull documentation states:

Notice that NonNull has a From instance for &T. However, this does not change the fact that mutating through a (pointer derived from a) shared reference is undefined behavior unless the mutation happens inside an UnsafeCell. The same goes for creating a mutable reference from a shared reference. When using this From instance without an UnsafeCell, it is your responsibility to ensure that as_mut is never called, and as_ptr is never used for mutation.

I've also found this post which basically asserts that it is entirely ok to use NonNull in FFI. If the pointer is nullable then I see no special benefit in using Option<NonNull<T>>; I would just use *mut T. However, I'm interested in using NonNull<T> for non-nullable pointers (i.e. pointers for which the C documentation explicitly states that a null value must not be provided), as it would provide additional type safety. This is just for the scenario where Rust code is calling C code, not vice versa.

And, now I'm confused :)
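
For what it's worth, a sketch of the NonNull case where the pointer is derived from a mutable reference, which avoids the caveat quoted from the docs. `increment` is a hypothetical stand-in for a C function that requires a non-null pointer and writes through it; `bump` is likewise made up for illustration:

```rust
use std::ptr::NonNull;

/// Hypothetical stand-in for a C function that must not receive null
/// and that writes through its argument.
unsafe fn increment(ptr: NonNull<i32>) {
    // SAFETY: the caller guarantees `ptr` is valid for reads and writes.
    unsafe { *ptr.as_ptr() += 1 };
}

fn bump(value: &mut i32) {
    // `From<&mut T>` derives the pointer from a mutable reference,
    // so no shared reference is involved.
    let ptr = NonNull::from(value);
    // SAFETY: `ptr` comes from a live `&mut i32` that is not used elsewhere
    // during the call.
    unsafe { increment(ptr) };
}
```

The caveat in the NonNull docs applies to the `From<&T>` instance; starting from `&mut T` is the case that is not in question.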

mversic commented 3 years ago

hm, we could say that this isn't a NonNull related issue. We can still have the same issue in FFI in this scenario:

let x = [1, 2, 3];
let y = c_fun_which_takes_ptr_and_mutates_it(x.as_ptr() as *mut _)?;

This is said to be UB as well.

therefore I find that using NonNull<T> is as good as using *mut T considering the risk of UB. Maybe it's a little better since documentation states the risk of UB
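
A sketch of the variant that starts from a mutable binding and uses `as_mut_ptr()` instead of `as_ptr() as *mut _`. The "C" function here is a hypothetical stand-in defined in Rust so that the example is self-contained (a real binding would be declared in an `extern "C"` block), and the extra `len` parameter is an assumption:

```rust
/// Hypothetical stand-in for a C function that mutates through its pointer.
unsafe extern "C" fn c_fun_which_takes_ptr_and_mutates_it(ptr: *mut i32, len: usize) {
    for i in 0..len {
        // SAFETY: the caller guarantees `ptr` is valid for `len` elements.
        unsafe { *ptr.add(i) += 1 };
    }
}

fn bump_all(values: &mut [i32]) {
    // `as_mut_ptr` derives the pointer from `&mut [i32]`, so it is writable.
    let ptr = values.as_mut_ptr();
    // SAFETY: `ptr` is valid for `values.len()` elements, and `values` is
    // not accessed in any other way during the call.
    unsafe { c_fun_which_takes_ptr_and_mutates_it(ptr, values.len()) };
}
```

With `let mut x = [1, 2, 3];`, calling `bump_all(&mut x)` leaves `x` as `[2, 3, 4]`.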

Diggsey commented 3 years ago

AIUI, this actually has little to do with *const vs *mut and is about whether the pointer provenance is a & or a &mut (or no provenance).

The only tricky part is the point @RalfJung mentioned when casting directly from a &mut to a *const where a reference is implicitly created.

One option would be to warn on this direct cast (&mut -> *const) (in the next edition if that would be too noisy) and require that the &mut -> *mut -> *const vs &mut -> & -> *const path is explicitly distinguished.

Then you can be safe in treating *const and *mut the same.
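
A sketch of what explicitly distinguishing the two paths could look like in user code today. Both function names are hypothetical:

```rust
/// Path 1: &mut -> *mut -> *const. The raw pointer is created as `*mut`
/// first, so a later cast back to `*mut` may be used for writes.
fn const_ptr_via_mut(x: &mut i32) -> *const i32 {
    let p: *mut i32 = x;
    p as *const i32
}

/// Path 2: &mut -> & -> *const. The raw pointer is created from a shared
/// reference, so it should only ever be used for reads.
fn const_ptr_via_shared(x: &mut i32) -> *const i32 {
    let shared: &i32 = x;
    shared as *const i32
}
```

Both functions return the same address; what differs is how the pointer was derived, and that is the property the lint would make visible.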

RustyYato commented 3 years ago

Could we change &mut T as *const T to not go through a shared reference? Getting rid of the implicit footgun.

RalfJung commented 3 years ago

The tricky bit would be to keep this code working:

fn main() {
    let x = &mut 0;
    let shared = &*x;
    let y = x as *const i32; // if we use *mut here instead, this stops compiling
    let _val = *shared;
}

Currently this works because x as *const _ is considered a read-only access.

OTOH, we do reject the as *mut version of this. If we want to treat as *mut and as *const the same, accepting one and rejecting the other makes little sense.

GoldsteinE commented 2 years ago

How does addr_of!() affect this? This code:

let mut x = 0_i32;
let ptr_x: *const i32 = std::ptr::addr_of!(x);
let mut_ptr_x: *mut i32 = ptr_x as _;
unsafe { *mut_ptr_x = 2; }

creates a *const i32 without creating &i32 first and currently triggers Miri. addr_of!() documentation doesn’t mention that the resulting pointer can’t be cast to *mut T and used for writes though.

RalfJung commented 2 years ago

creates a *const i32 without creating &i32 first and currently triggers Miri

Indeed, that's how it currently affects addr_of.

addr_of!() documentation doesn’t mention that the resulting pointer can’t be cast to *mut T and used for writes though.

True. It also doesn't say that you can do that. The docs are not exhaustive for what you cannot do. (That would require infinitely large docs.)

There has not been a decision on what the semantics should be here, and that's why the docs basically don't talk about this. It's not great, but absent a decision it's also not clear what else to do. And making the decision without having an entire aliasing model for all the context isn't really a good idea either.

GoldsteinE commented 2 years ago

It also doesn't say that you can do that.

It’s true. The way “validity” is currently defined in the standard library docs doesn’t guarantee that any use of pointers from addr_of!() (or addr_of_mut!(), for that matter) is valid.

Is there a rationale for making addr_of!()-produced pointers invalid for writes? I think it’s kind of confusing and doesn’t match the general intuition that *const _ and *mut _ raw pointers are interchangeable.

bjorn3 commented 2 years ago

Is there a rationale for making addr_of!()-produced pointers invalid for writes? I think it’s kind of confusing and doesn’t match the general intuition that *const _ and *mut _ raw pointers are interchangeable.

If you want to write through the pointer you would use addr_of_mut!(), right? Otherwise what is the point of having two separate macros?
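
A sketch of the earlier snippet rewritten with `addr_of_mut!`, wrapped in a hypothetical function `write_two` so the result can be checked:

```rust
use std::ptr::addr_of_mut;

/// Hypothetical wrapper around the snippet, returning the final value.
fn write_two() -> i32 {
    let mut x = 0_i32;
    // The pointer is created as `*mut` from the start.
    let mut_ptr_x: *mut i32 = addr_of_mut!(x);
    // SAFETY: `x` is alive and not borrowed elsewhere.
    unsafe { *mut_ptr_x = 2 };
    x
}
```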

addr_of!() documentation doesn’t mention that the resulting pointer can’t be cast to *mut T and used for writes though.

It actually does under the examples section of the addr_of!() documentation:

See addr_of_mut for how to create a pointer to uninitialized data. Doing that with addr_of would not make much sense since one could only read the data, and that would be Undefined Behavior.

RalfJung commented 2 years ago

Ralf's comment here even further propagates this myth, in a thread explicitly asking about the differences here... https://internals.rust-lang.org/t/what-is-the-real-difference-between-const-t-and-mut-t-raw-pointers/6127/18 :

I agree. *const T and *mut T are equivalent in terms of UB.

I feel quoted out of context here -- for the question raised in that particular thread, my statement holds true. But specifically when converting a reference to a raw pointer, there is a difference.

Is there a rationale for making addr_of!()-produced pointers invalid for writes?

Basically, because it matches what the borrow checker does -- see here further up this thread.

RalfJung commented 1 year ago

Some updates on this:

CAD97 commented 1 year ago

To make sure it's remembered, there is some practical justification of as *const _/addr_of! and as *mut _/addr_of_mut! behaving differently — they're treated differently by the borrow checker. The *mut version is checked as a mutable access, and the *const version as immutable.

Example:

```rust
let mut x = &mut 5;
let r = &x;
let _ = x as *mut _;
//      ^ERROR: cannot borrow as mutable ... also borrowed as immutable
let _ = x as *const _; // allowed
let _ = addr_of_mut!(x);
//      ^ERROR: cannot borrow as mutable ... also borrowed as immutable
let _ = addr_of!(x); // allowed
dbg!(r);
```

This doesn't mean that the opsem has to match this and create a pointer with shared provenance for the *const constructions[^but], but it does provide a potential justification.

As long as providing derived mut provenance to the pointer doesn't impact the validity of extant provenance until the pointer is accessed, though, I agree that the more permissive model of giving the mut provenance when possible is desirable. (If the more permissive semantics are a pessimization to some code, it can probably be rewritten to introduce a shared reborrow and limit the provenance explicitly. Plus, managing two distinct simultaneously valid sibling raw provenances (one mut and one shr) seems like a nightmare.)

[^but]: At least at some point, the compiler interpreted type_ascribe!(x, &mut _) as *const _ as going through an intermediate coercion to &_ which does limit to shared provenance while that's still the case.

RalfJung commented 1 year ago

I opened https://github.com/rust-lang/unsafe-code-guidelines/issues/400 for the specific question of whether let-bound variables should be UB to mutate.

JakobDegen commented 1 year ago

@RalfJung the one thing I will point out here is that it does not a priori have to be the case that for r: &mut u8, r as *const _ and addr_of!(*r) do the same thing. Maybe they should, but I wouldn't be terribly shocked if the slightly different syntax led people to have different expectations.

RalfJung commented 1 year ago

I think it would be very surprising if those two ways of turning a mutable ref into a raw ptr would not do the same thing -- I feel fairly strongly they should be the same.

However I can see the question of mutation of let-bound variables being separate from that of mutating through &mut to *const-cast pointers. Hence the separate issue for the former.

saethlin commented 1 year ago

Users seem to have some kind of intuition that the expression inside addr_of{_mut}! is some kind of special context that provides *waves hands* simpler/less-UB semantics. I think this is a UI issue with it being a macro instead of what it expands to. I think it is quite important that we eventually deprecate the macro and have an operator that does the job (like &raw); it would be a great shame if we acquire baggage due to the way we got to a stabilized &raw.


In the indefinite future, we should have a stable #![no_core] and when that is stable, not having access to the addr_of! semantics in it may be acutely painful; addr_of! is exactly the flavor of low level operation I expect to be common in core-less code.

CAD97 commented 5 months ago

Two small potential arguments for addr_of! not providing write-capable provenance:

For full clarity, I am fully in support of preferring &mut place as *const _ being ptr::from_mut(&mut place).cast_const() and not ptr::from_ref(&mut place). (It is currently more accurately &raw const *&mut place.) This is only about addr_of!/&raw const.

And I still think I weakly favor &raw const getting write-capable provenance, because all else being equal, more things being DB and a simpler specification is better. I just think that these observations are interesting to consider.


Because OTPT is nowhere near, I think this is an argument for stabilizing &raw const and &raw mut. Once they're actually available we can see how people actually expect them to behave.

RustyYato commented 5 months ago

Closure capture rules mean that addr_of!(capture) still captures by-ref, which results in generating a write-incapable pointer as it gets derived from the ref-capture

This sounds like a foot gun. I would have expected it to be captured by raw pointer. But that seems off topic here. I'll open an issue in the main rust repo after investigating this.

chorman0773 commented 5 months ago

TBH, I wouldn't expect a whole new capture mode here.

Changing &T -> *const T would alter type checking rules in language-visible ways, not just operational semantics. It's almost certainly a breaking change (because of auto traits).

RalfJung commented 5 months ago

Closure capture is an interesting one. The consistent capture mode would be &mut, but that's probably also surprising.

But really the main point to me is that &raw const *raw_mut_pointer (and raw_mut_ptr as *const _, which compiles to the same MIR) should not lose an existing write permission -- I assume we have consensus on that? Having &raw const do different things to the permission depending on the shape of the place expression that follows is a non-compositional nightmare (and I've had to spend my share of time just dealing with that nightmare in Miri; it's particularly bad for Box).

celinval commented 4 months ago

I'm curious... what is the point of having two types *const T and *mut T if they behave the same way?

As a developer, if I call a function that takes *const T, I expect that function to never change the value of that variable, even if my original variable is mutable.

chorman0773 commented 4 months ago

Well, it's intended as an indicator to programmers mostly.

chorman0773 commented 4 months ago

Using a &mut capture would also alter well-formedness (degrade from Fn() to FnMut(), and also borrow the type mutably).
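
A sketch of the status quo this refers to, assuming the by-ref capture described earlier in the thread. `call_fn` is a hypothetical helper that only accepts `Fn` closures, so the example compiling shows the closure is not degraded to `FnMut`:

```rust
use std::ptr::addr_of;

/// Hypothetical helper that only accepts `Fn` closures.
fn call_fn<F: Fn() -> *const i32>(f: F) -> *const i32 {
    f()
}

fn read_through_closure() -> i32 {
    let value = 41;
    // `addr_of!(value)` does not require a mutable capture, so the closure
    // satisfies the `Fn` bound on `call_fn`.
    let get_ptr = || addr_of!(value);
    let p = call_fn(get_ptr);
    // SAFETY: `value` is still alive and we only read through `p`.
    unsafe { *p }
}
```

Only a read is shown here, because whether a write through such a pointer should be allowed is exactly what is under discussion.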

RalfJung commented 4 months ago

We also have *const i32 and *const u32 even though they behave in the same way -- or rather, opsem doesn't make a difference between them. Both the pointee type and mutability are hints for the intended use of this pointer, but not hard guarantees/constraints.
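
A small sketch of the pointee-type half of that comparison. The function name is hypothetical. Since `i32` and `u32` have the same size and alignment and every bit pattern is valid for both, reading through the recast pointer is fine:

```rust
/// Hypothetical helper: reads an `i32` through a `*const u32`.
fn reinterpret_as_u32(x: &i32) -> u32 {
    let p_i32 = x as *const i32;
    // Changing the pointee type is just a cast, like changing mutability.
    let p_u32 = p_i32 as *const u32;
    // SAFETY: same size and alignment; every bit pattern is a valid `u32`.
    unsafe { *p_u32 }
}
```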