rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0
669 stars 58 forks source link

Should validity of a reference depend on the *contents* of memory in any way? #414

Open RalfJung opened 1 year ago

RalfJung commented 1 year ago

This discussion presupposes that "Do we have full recursive validity for references" (https://github.com/rust-lang/unsafe-code-guidelines/issues/412) is answered "no".

As @digama0 points out, lots of weaker variants of "validity depending on memory contents" are conceivable. Probably the most strict one is "full recursive validity except inside shared UnsafeCell". We could also imagine "only follow one indirection" and all sorts of other variants.

RalfJung commented 1 year ago

My own opinion is that no, validity should not depend on the contents of memory in any way. It might (at least effectively) depend on whether there is dereferenceable memory at a given location at all (see the issues tagged A-dereferenceable), but that is all we should have.

My main argument is that I want to keep reasoning about whether code has UB as local as possible, so other threads changing the contents of memory shouldn't make a reference suddenly be invalid. Deallocation being able to invalidate references is bad enough, but that falls out of the aliasing model fairly naturally. (I view dangling references as causing UB on the next retag, not the next typed copy.) I don't see a strong enough reason to also let any write possibly invalidate references, and I don't see an elegant way that this requirement would even arise (considering a MiniRust-like framework).

(Some of) the arguments in https://github.com/rust-lang/unsafe-code-guidelines/issues/346 also apply. In particular, even 1-level validity behind &mut would make a lot of existing uses of Read UB; we should have a really good reason to break such code and a good plan to helping people find and fix such code. We did something like that with mem::uninitialized and it was not pleasant.

digama0 commented 1 year ago

How reliably will users be able to predict when retags are inserted? I think there is a conceptual / teachability advantage to having retags effectively happen "all the time", which is to say that adding a retag doesn't change the set of UB, because all UB caused by retags is already effectively implied by validity for the relevant types. This would effectively make retags irrelevant, which IMO would be a good thing since they have no concrete syntax and people will not have a good sense for when they happen.

CAD97 commented 1 year ago

AIUI, retags happen at exactly the points that references are reborrowed. Reborrows happen when performed explicitly (e.g. &[mut] *r) as well as being implicitly inserted whenever autoderef occurs (using any reference as a method receiver, passing any reference as a function parameter, or copying a shared[^1] reference).

[^1]: I'm unsure whether moving &mut results in either a reborrow (i.e. can result in a reference with a different syntactic lifetime). A quick test showed the old binding being moved from given a simple let new = old;, but being reborrowed when new is annotated as having a type requiring deref coercion from old.

To a useful first approximation, retags happen any time a (place containing a) reference is accessed. I believe it should be accurate to say that references get retagged exactly and only when they undergo a typed copy[^2]. Because of this, the borrow validity is in fact very similar to byte validity; the place it differs is that it (actually performs a memory read and) also has the concept of function borrow barriers influencing borrow validity, which doesn't have an equivalent for byte validity[^4].

[^2]: The complicated part is if/how reference fields get retagged when using a containing structure, due to this currently being ABI-dependant. IIRC, Miri currently only retags fields when an aggregate has the Scalar or ScalarPair ABI, matching when rustc emits (and is even able to[^3]) the noalias annotation to LLVM and actual LLVM UB would go undetected if the retag is skipped. This isn't super nice, since ABI of repr(Rust) is supposed to be an implementation detail and not impact semantics, but it does allow many owning_ref style objects to avoid throwing UB while there's still no language endorsed way to suppress retagging (see: the MaybeDangling RFC).

[^3]: Larger aggregates are passed by reference, and noalias is an LLVM parameter attribute.

[^4]: The "invalid not used again" examples do provide a potential motivator for byte validity having something analogous to function borrow barriers, to ensure borrows don't leave values outside the "intended" value domain that encroach on niche-populated otherwise-invalid values for a place, but that's at best a topic for that other thread.

RalfJung commented 1 year ago

We have an issue for that: https://github.com/rust-lang/unsafe-code-guidelines/issues/371

I think there is a conceptual / teachability advantage to having retags effectively happen "all the time", which is to say that adding a retag doesn't change the set of UB, because all UB caused by retags is already effectively implied by validity for the relevant types. This would effectively make retags irrelevant,

No it would not, retagging is still very relevant for aliasing requirements.

I'm unsure whether moving &mut results in either a reborrow (i.e. can result in a reference with a different syntactic lifetime). A quick test showed the old binding being moved from given a simple let new = old;, but being reborrowed when new is annotated as having a type requiring deref coercion from old.

The current status is that moving into a local will incur a retag, even if there is no reborrow. But moving into an indirection (*something = old) does not incur a reborrow. Basically the idea is, the compiler can track locals easily so we want reborrowing all the time, but we don't have to bother with it when it's behind a pointer indirection.

Miri currently only retags fields when an aggregate has the Scalar or ScalarPair ABI,

That's a temporary situation, clearly we want retagging for all fields.

RalfJung commented 1 year ago

Anyway this issue is not about retags! This issue is about whether we want to require any kind of validity of in-memory data beyond what is imposed by retags (which is just dereferencability).

chorman0773 commented 1 year ago

Anyway this issue is not about retags! This issue is about whether we want to require any kind of validity of in-memory data beyond what is imposed by retags (which is just dereferencability).

I'd argue that the only place it makes sense to assert validity is on retags and activation (except maybe #84 stuff at invalidation of certain references), so that would make retags relevant to the discussion.

RalfJung commented 1 year ago

Are you saying that retagging should do more than what is necessary for the alias model (which amounts to checking dereferencability)?

chorman0773 commented 1 year ago

Are you saying that retagging should do more than what is necessary for the alias model (which amounts to checking dereferencability)?

I'm saying that if we are requiring the validty of referents, then retags are the most logical place for them. Otherwise, if you project: &Cell<Option<T>> into Option<&Cell<T>>, which I could see as being a perfectly valid unsafe operation, we have to ask what happens to the &Cell<T> when you do orig.set(None). Under TB IIRC, It's still Reserved after the other Cell becomes Active, but if that Reserved still actively requires validity, we've introduced UB into the code by keeping the reference valid. So if that projection is sound provided the static lifetime is enforced, validity of references only can be be a finite number of times per - IMO, retags are the place that most makes sense to happen. This could just as easily be an argument against recursive validity, as well.

RalfJung commented 1 year ago

which I could see as being a perfectly valid unsafe operation,

(In general this is wildly unsafe because Cell blocks niches, so the layout could be entirely different.)

t if that Reserved still actively requires validity, we've introduced UB into the code by keeping the reference valid

I have no idea what you mean by "actively requiring validity" here or how there is UB here.

Again, this issue is about contents being valid, so the aliasing state shouldn't even come in. Is it insta-UB to have a &bool that points to 3? That's the kind of question we are considering here. Such insta-UB usually arises (in my view) from doing a typed load (converting the in-memory representation into a higher-level "value" representation). I don't think we want to have that value representation being recursive through references though.

I guess we could have retagging create such a high-level value representation of the pointee. Is that what you mean? With interior mutability however we'd have to "leave a hole" in that representation at any place where there is an UnsafeCell, so it'd not be a proper value either.

python-ast-person commented 11 months ago

Is it insta-UB to have a &bool that points to 3?

I would say yes because my definition of the validity requirement of a shared reference is that, at any point while the reference is live:

This definition would require UnsafeCell to downgrade all validity requirements to safety requirements to allow Mutex<bool> to be sound.

Nadrieril commented 8 months ago

Something I have not seen mentioned: any kind of recursive validity implies that a self-referential struct would have a coinductive validity invariant. That sounds unpleasant to model.

RalfJung commented 8 months ago

Yeah, one could use interior mutability to set up a cycle of shared references and then transmute the Cell away -- that should be sound and validity would then be cyclic.