Document pattern evaluation order

zachs18 commented 2 weeks ago

For most patterns and in safe code, "evaluation"(/matching?) order of subpatterns does not matter, but there is (that I can think of) one instance on stable where pattern evaluation order matters: matching on a struct with a tag and a union field (and similar situations).

The example in the section on struct patterns explicitly states that field order does not matter, but it also does not have any patterns where the evaluation order of subpatterns would matter.

```rs match s { Point {x: 10, y: 20} => (), Point {y: 10, x: 20} => (), // order doesn't matter Point {x: 10, ..} => (), Point {..} => (), } ```

The section on unions does mention pattern matching, but does not say anything about pattern evaluation order. It gives an example of pattern-matching on a manual tagged union, though pattern evaluation order does not matter for the example given[^2]. In a slightly different example, however, the field order does matter:

the example

```rs #[derive(Clone, Copy)] enum Tag { A, B, } #[derive(Clone, Copy)] #[repr(C)] union Value { a: u32, b: u8, // note that b is smaller than a } /// Assume that if tag == Tag::A, then val.a is valid, and if tag == Tag::B, then tag.b is valid. #[derive(Clone, Copy)] struct Tagged { tag: Tag, val: Value, } unsafe fn tag_first(v: Tagged) -> bool { match v { // fine under miri with tag == B, sees that `tag != A` and skips the arm Tagged { tag: Tag::A, val: Value { a: 0 } } => true, _ => false, } } unsafe fn val_first(v: Tagged) -> bool { match v { // error under miri with tag == B, since it reads the padding bytes after `Value::b` Tagged { val: Value { a: 0 }, tag: Tag::A } => true, _ => false, } } fn main() { let v = Tagged { tag: Tag::B, val: Value { b: 0 }, }; unsafe { tag_first(v); val_first(v); } } ```

For unstable code, I suppose deref_patterns might also make it important to document pattern evaluation order, or maybe that feature is/will be restricted enough for it not to matter. Depending on the resolution of https://github.com/rust-lang/unsafe-code-guidelines/issues/412 pattern evaluation order might be important if matching on references-to-references-to-invalid-data (miri example)?

I'm not sure if this is fully the intended behavior[^1], or if it is intended, how best to document it.

[^1]: Alternately, instead of documenting pattern evaluation order, it could be specified that if any (union) field used in a pattern match is invalid/uninitialized, then the whole arm is UB, regardless of the order the fields were written in the pattern.

[^2]: in that example, the union field is fully-initailized either way, or UB happens regardless of pattern evaluation order

workingjubilee commented 2 weeks ago

cc @Nadrieril I suspect this question can by answered by the ~~curse~~ blessing of your knowledge.

Nadrieril commented 2 weeks ago

Hehe let me propagate the curse. Fun fact, this trips Miri (because we sort or-patterns last for perf reasons):

unsafe fn tag_first(v: Tagged) -> bool {
    match v {
        Tagged { tag: Tag::A | Tag::A, val: Value { a: 0 } } => true,
        _ => false,
    }
}

and this doesn't (because once we match a variant, we add its subpatterns at the end of the current list of things-to-test):

#[derive(Clone, Copy)]
struct Tagged {
    tag: Tag,
    val: Option<Value>,
}

unsafe fn val_first(v: Tagged) -> bool {
    match v {
        Tagged {
            val: Some(Value { a: 0 }),
            tag: Tag::A,
        } => true,
        _ => false,
    }
}

Today's implementation is rougly top-to-bottom & left-to-right for simple patterns, except nested patterns make the whole thing complicated, and we establish bindings only after the whole arm successfully matches, and we often test the same place many times in a way that's tricky to specify, and we sort or-patterns after other patterns for efficiency.

In short: current status would be a mess to specify. As you correctly point out, deref patterns make that more observable i.e. worse.

My understanding of the current guarantees is: there are none. We don't even specify what memory a pattern accesses. I would like that to change.

Here would be my starting point:

a pattern can be decomposed into a list of tests and a list of bindings. each test/binding applies to a specific place. a test involves a SwitchInt on the place or its discriminant (depending on the type). E.g. Some(Value { a: 0 }) has a test on the discriminant at place scrutinee and a test on the value at place (scrutinee as Some).0.a.
in the execution of a match expression (other matching constructs work the same), any and all of the tests in the patterns may be executed in any order.
the one constraint on order is that tests on a place inside an enum variant will only be executed if the corresponding discriminant test succeeded (e.g. a test at place (scrutinee as Some).0 requires that a test on the discriminant of scrutinee returned Some).
an arm is said to match if all its tests were successful. we guarantee to execute the first arm that matches. once the first arm matches, we establish its bindings in any order (by copying or referencing the relevant place for each binding), then execute the code of the arm. if no arm matches, this is UB.
this ignores arm guards: they also have non-obvious guarantees on the order and number of times each guard is executed.

For your case about unions, this is the maximally-UB option.

From this starting point:

we could specify the order of bindings, e.g. that a @ (b, Some(c)) establishes the bindings in left-to-right order. That sounds feasible even though I've tried and don't yet know how to implement that well when or-patterns are involved.
to get tagged unions, I think we can say "if a pattern contains tests at places inside an union, these tests are guaranteed to happen after any other tests that don't involve unions in that same pattern" (while respecting the enum ordering constraint above ofc). This isn't enough if we want nested tagged unions though, not sure what would be a good ordering for that.
forcing tests to go top-to-bottom & left-to-right (as can be observed from UB, i.e. assuming that two reads are as UB as one) sounds implementable, but I'm worried it would leave performance gains on the table. Deref patterns with user-provided types make the order even more observable, and in that case top-to-bottom & left-to-right is not something we can reasonably guarantee.

(thanks for raising this btw, it'll be good to get this ironed out)

RalfJung commented 2 weeks ago

Cc @rust-lang/opsem

RalfJung commented 2 weeks ago

Deref patterns with user-provided types make the order even more observable, and in that case top-to-bottom & left-to-right is not something we can reasonably guarantee.

I would have said in that case top-to-bottom & left-to-right is not something we have a reasonable alternative to. Rust has systematically avoided unspecified or non-deterministic evaluation order as those are a terrible footgun in C/C++. So I would argue that we should specify top-to-bottom & left-to-right, and it is up to the pattern compilation engine to detect cases where the order is not observable and then exploit that for performance.

rust-lang / reference

Document pattern evaluation order #1665