rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0
664 stars 57 forks source link

Do aliasing requirements carry over from const-time to run-time? #424

Open RalfJung opened 1 year ago

RalfJung commented 1 year ago

If a pointer is created as read-only during one program execution stage, is that restrictions inherited by future stages? IOW, is this code UB?

struct Foo(*mut i32);
unsafe impl Sync for Foo {}

static mut FOO: i32 = 0;
static PTR: Foo = Foo(unsafe { &FOO as *const i32 as *mut i32 });

fn main() {
    unsafe {
        let _r = &mut *PTR.0;
    }
}

If this would be running in a single instance of the AM then clearly this is UB. But does the alias information carry over from the compiletime AM to the initial state of the runtime AM?

Maybe thr answer is "obviously yes", but it doesn't seem so obvious for me. First of all, we'd need to set up extra machinery to even be able to put such alias information into the initial state of the runtime AM (i.e., even once Stacked/Tree Borrows are implemented for MiniRust, this won't be UB in MiniRust since the initial value of a "global" can't express that PTR is read-only). We could of course add that machinery. Then the question becomes: what is the benefit of doing so? We need aliasing restrictions to perform analyses and reasoning on references that alias with unknown code; with the results of const-eval being completely known to later stages, there's no clear benefit. Sure we can construct artificial examples, but is it worth it?

digama0 commented 1 year ago

My initial reaction is indeed "obviously yes". I think it is less about the read-only-ness itself but rather aliasing information in general: what kind of provenance do pointers carry over across the const-time -> run-time barrier? The natural answer is "all of it", in which case we get this behavior (the code is UB). The alternate natural answer, "none of it", means that these pointers must go through some kind of exposing process to be usable at all, which might cause other undesirable consequences.

Just to gauge where MiniRust and Miri sit on this spectrum, what happens if you use instead:

static PTR: Foo = Foo(std::ptr::invalid_mut(&FOO as *const i32 as usize));

(To answer my own question: it's currently disallowed because as usize fails.)

Lokathor commented 1 year ago

Given that const values in rust are often described as / taught as being "substitute the const expression into each place you use it", I would say that the program would be assumed by a normal user to have UB. If you copy the expression that makes PTR into where PTR is used within main then it becomes clear that a shared reference is being turned into a mutable reference via pointer cast shenanigans, which is a classic form of UB in rust.

That said, I've heard some experts push back against the "textually replace" model for consts, so grain of salt and such. Still, my line of thinking would also point to "obviously yes" being the answer.

RalfJung commented 1 year ago

Sure, for an easy explanation is seems best to make this UB.

What I am wondering about is whether that should be just an "explanation model" or whether it is worth going through the effort of actually making this UB.

(In Miri it will be architecturally hard to make this UB, but that is not my concern. It will be architecturally hard to even detect aliasing violations within a const eval execution.)

digama0 commented 1 year ago

Without putting too fine a point on it, while I appreciate having Miri there to help check for UB, I don't like the implication that implementation details of Miri should drive the definition of UB. Unless there is a reason that this is not implementable, I think we should be allowed to keep the focus on the theoretical model and explaining that (or simplified versions of it). Put another way, I would not expect an implementation in Miri to be a blocking concern in FCP for aspects of the formal model. (Note that one could very well say the same thing about stacked borrows in the first place: it is architecturally hard to make SB aliasing violations detectable, but we did it anyway.)

RalfJung commented 1 year ago

I explicitly said Miri is not my concern. How do you read that as implying that Miri should drive the def.n if UB? I mentioned this because people might be inclined to probe Miri and I didn't want anyone to reach false conclusions.

digama0 commented 1 year ago

Maybe you could clarify this sentence then?

It will be architecturally hard to even detect aliasing violations within a const eval execution.

Detect in what? Minirust, Miri or rustc const eval? (Or on paper.)

chorman0773 commented 1 year ago

Turning a promoted &T at const time to &mut T at runtime definitely cannot be allowed (even if indirectly doing so, say through a raw pointer). Given

pub const FOO: &i32 = &0;

I'd full expect to be allowed to place *FOO in a read-only segment, and I don't see a difference with *const i32 (or even *mut i32) starting from a &i32. The same is true of a reference to a non-mutable impl Freeze statics.

In the case of starting with a static mut, I think there is some model simplicity benefits to &FOO in a const-expr having one implication at runtime, regardles of what FOO itself is. I don't know is there is necessarily any significantly useful optimizations to prohibiting the case when starting from a mutable static or a impl !Freeze static, but it seems to me like that would involve special-casing them in the model.

CAD97 commented 1 year ago

@chorman0773 — your example is different, as it's purely using consts as opposed to statics. I believe everyone agrees that consts (including promoteds) are restricted to being immutable (until copied out ofc).


I apparently was mistaken, since this example compiles on stable, but I previously thought that reference-to-mutable was necessarily forbidden from escaping const evaluation and making it to a static/const item. It is the case that const items can't reference static items currently, but it's fine for statics to reference mutable statics, apparently.

(Because of this, it's possible that the ordering of evaluation of statics' constant initializers could impact validity, since it could change the end borrow tree structure. Or even worse, as we enable nonescaping reference-to-mutable within const, potentially impact the results; we'll need to be very careful there.)

From an opsem specification perspective, I would expect that "yes, this causes UB" is the easier answer, in addition to it likely being more intuitive. For it to be UB, all you need to say is that the borrow tree state carries over from const evaluation to runtime.

For the answer to be "no, this isn't UB," there needs to be some sort of "serialization" stage between const evaluation and runtime evaluation, where pointers/references in statics are laundered to all be addr_of_mut! style "same tag" provenance. This is an extra step compared to letting it be UB.

Mitigating the extra step, though, is that we do necessarily already have an observable split between const evaluation time and runtime, since const panics block compilation up front instead of happening at runtime when encountered.

chorman0773 commented 1 year ago

your example is different, as it's purely using consts as opposed to statics.

I was building up the "obvious" example. By my intuition, there isn't any difference between the two - both generate a Frozen tag at const time, and yield a pointer with that tag to runtime. I don't see a difference in how that tag comes about w/o introducing special semantics, which I'm unsure are necessary.

Or rather, I believe the best way to ensure that a promoted const/immutable static can't be mutated through a pointer is to simply carry the TB/SB tag from const eval into the value of the const/static initialized with it.

RalfJung commented 1 year ago

Maybe you could clarify this sentence then?

It will be architecturally hard to even detect aliasing violations within a const eval execution.

Detect in what? Minirust, Miri or rustc const eval? (Or on paper.)

Detect in Miri. But as I said that is not my concern for this issue.

For MiniRust we don't have a plan yet for how to evaluate constants ourselves (rather than have rustc eval them), but that's what we will have to do and then non-aliasing within a const eval instance is easily enforced. The trouble is the next step: we produce a 'global' to be used by runtime code (and even other const eval), where we can only symbolically refer to other globals. Currently this happens via "globalid-offset" pairs. To enforce aliasing across the boundary between different interpreter instances we'd have to also put stack/tree data into this. We'll need "symbolic pointer tags" (to be turned into concrete tags when the AM is initialized) and a bunch of surrounding infrastructure. It's not impossible but it's a lot of machinery, for questionable benefit.


@chorman0773 I view your example as very different. We will have a notion of when these "globals" are read-only (meaning they can be put into read-only memory and any write access is UB); we "just" need to figure out the exact rules for when they are marked as read-only, but for promoteds it seems pretty clear.


From an opsem specification perspective, I would expect that "yes, this causes UB" is the easier answer, in addition to it likely being more intuitive. For it to be UB, all you need to say is that the borrow tree state carries over from const evaluation to runtime.

This is very much not the easier answer. You cannot just carry over that state: imagine 2 constants being evaluated independently, and now at runtime we use both of them -- they might have used the same borrow tag, so we need to re-map all borrows tags to ensure they remain distinct! We do this for allocation IDs via symbolic "global IDs" in MiniRust; we'd need all the same infrastructure for borrow tags as well.

So carrying over the tag is definitely a non-starter. The only option is to build complicated machinery that can reconstruct an isomorphic (but not identical) state when the runtime AM starts. We should have good motivation for going through this. Just specifying when a piece of global memory is read-only is a lot simpler.

digama0 commented 1 year ago

I think the answer to this question is blocked on some bigger design questions:

Trying to answer the question about how aliasing data is reinterpreted at runtime seems quite far along in the tech tree compared to these more basic questions that we still don't have a solid model for, so I would prefer to shelve the question until then, so that we can have a framework in which to consider what the options even are.

RalfJung commented 1 year ago

Agreed, it probably makes sense to start with the broader framework here. I have thoughts on that but haven't ever really written them down, and no idea for what part of that we have consensus.