rust-lang / reference

The Rust Reference
https://doc.rust-lang.org/nightly/reference/
Apache License 2.0
1.24k stars 488 forks source link

Document pattern evaluation order #1665

Open zachs18 opened 2 weeks ago

zachs18 commented 2 weeks ago

For most patterns and in safe code, "evaluation"(/matching?) order of subpatterns does not matter, but there is (that I can think of) one instance on stable where pattern evaluation order matters: matching on a struct with a tag and a union field (and similar situations).

The example in the section on struct patterns explicitly states that field order does not matter, but it also does not have any patterns where the evaluation order of subpatterns would matter.

```rs match s { Point {x: 10, y: 20} => (), Point {y: 10, x: 20} => (), // order doesn't matter Point {x: 10, ..} => (), Point {..} => (), } ```

The section on unions does mention pattern matching, but does not say anything about pattern evaluation order. It gives an example of pattern-matching on a manual tagged union, though pattern evaluation order does not matter for the example given[^2]. In a slightly different example, however, the field order does matter:

the example ```rs #[derive(Clone, Copy)] enum Tag { A, B, } #[derive(Clone, Copy)] #[repr(C)] union Value { a: u32, b: u8, // note that b is smaller than a } /// Assume that if tag == Tag::A, then val.a is valid, and if tag == Tag::B, then tag.b is valid. #[derive(Clone, Copy)] struct Tagged { tag: Tag, val: Value, } unsafe fn tag_first(v: Tagged) -> bool { match v { // fine under miri with tag == B, sees that `tag != A` and skips the arm Tagged { tag: Tag::A, val: Value { a: 0 } } => true, _ => false, } } unsafe fn val_first(v: Tagged) -> bool { match v { // error under miri with tag == B, since it reads the padding bytes after `Value::b` Tagged { val: Value { a: 0 }, tag: Tag::A } => true, _ => false, } } fn main() { let v = Tagged { tag: Tag::B, val: Value { b: 0 }, }; unsafe { tag_first(v); val_first(v); } } ```

For unstable code, I suppose deref_patterns might also make it important to document pattern evaluation order, or maybe that feature is/will be restricted enough for it not to matter. Depending on the resolution of https://github.com/rust-lang/unsafe-code-guidelines/issues/412 pattern evaluation order might be important if matching on references-to-references-to-invalid-data (miri example)?

I'm not sure if this is fully the intended behavior[^1], or if it is intended, how best to document it.

[^1]: Alternately, instead of documenting pattern evaluation order, it could be specified that if any (union) field used in a pattern match is invalid/uninitialized, then the whole arm is UB, regardless of the order the fields were written in the pattern.

[^2]: in that example, the union field is fully-initailized either way, or UB happens regardless of pattern evaluation order

workingjubilee commented 2 weeks ago

cc @Nadrieril I suspect this question can by answered by the curse blessing of your knowledge.

Nadrieril commented 2 weeks ago

Hehe let me propagate the curse. Fun fact, this trips Miri (because we sort or-patterns last for perf reasons):

unsafe fn tag_first(v: Tagged) -> bool {
    match v {
        Tagged { tag: Tag::A | Tag::A, val: Value { a: 0 } } => true,
        _ => false,
    }
}

and this doesn't (because once we match a variant, we add its subpatterns at the end of the current list of things-to-test):

#[derive(Clone, Copy)]
struct Tagged {
    tag: Tag,
    val: Option<Value>,
}

unsafe fn val_first(v: Tagged) -> bool {
    match v {
        Tagged {
            val: Some(Value { a: 0 }),
            tag: Tag::A,
        } => true,
        _ => false,
    }
}

Today's implementation is rougly top-to-bottom & left-to-right for simple patterns, except nested patterns make the whole thing complicated, and we establish bindings only after the whole arm successfully matches, and we often test the same place many times in a way that's tricky to specify, and we sort or-patterns after other patterns for efficiency.

In short: current status would be a mess to specify. As you correctly point out, deref patterns make that more observable i.e. worse.

My understanding of the current guarantees is: there are none. We don't even specify what memory a pattern accesses. I would like that to change.

Here would be my starting point:

For your case about unions, this is the maximally-UB option.

From this starting point:

(thanks for raising this btw, it'll be good to get this ironed out)

RalfJung commented 2 weeks ago

Cc @rust-lang/opsem

RalfJung commented 2 weeks ago

Deref patterns with user-provided types make the order even more observable, and in that case top-to-bottom & left-to-right is not something we can reasonably guarantee.

I would have said in that case top-to-bottom & left-to-right is not something we have a reasonable alternative to. Rust has systematically avoided unspecified or non-deterministic evaluation order as those are a terrible footgun in C/C++. So I would argue that we should specify top-to-bottom & left-to-right, and it is up to the pattern compilation engine to detect cases where the order is not observable and then exploit that for performance.