Open zachs18 opened 2 weeks ago
cc @Nadrieril I suspect this question can by answered by the curse blessing of your knowledge.
Hehe let me propagate the curse. Fun fact, this trips Miri (because we sort or-patterns last for perf reasons):
unsafe fn tag_first(v: Tagged) -> bool {
match v {
Tagged { tag: Tag::A | Tag::A, val: Value { a: 0 } } => true,
_ => false,
}
}
and this doesn't (because once we match a variant, we add its subpatterns at the end of the current list of things-to-test):
#[derive(Clone, Copy)]
struct Tagged {
tag: Tag,
val: Option<Value>,
}
unsafe fn val_first(v: Tagged) -> bool {
match v {
Tagged {
val: Some(Value { a: 0 }),
tag: Tag::A,
} => true,
_ => false,
}
}
Today's implementation is rougly top-to-bottom & left-to-right for simple patterns, except nested patterns make the whole thing complicated, and we establish bindings only after the whole arm successfully matches, and we often test the same place many times in a way that's tricky to specify, and we sort or-patterns after other patterns for efficiency.
In short: current status would be a mess to specify. As you correctly point out, deref patterns make that more observable i.e. worse.
My understanding of the current guarantees is: there are none. We don't even specify what memory a pattern accesses. I would like that to change.
Here would be my starting point:
SwitchInt
on the place or its discriminant (depending on the type). E.g. Some(Value { a: 0 })
has a test on the discriminant at place scrutinee
and a test on the value at place (scrutinee as Some).0.a
.match
expression (other matching constructs work the same), any and all of the tests in the patterns may be executed in any order.(scrutinee as Some).0
requires that a test on the discriminant of scrutinee
returned Some
).For your case about unions, this is the maximally-UB option.
From this starting point:
a @ (b, Some(c))
establishes the bindings in left-to-right order. That sounds feasible even though I've tried and don't yet know how to implement that well when or-patterns are involved.(thanks for raising this btw, it'll be good to get this ironed out)
Cc @rust-lang/opsem
Deref patterns with user-provided types make the order even more observable, and in that case top-to-bottom & left-to-right is not something we can reasonably guarantee.
I would have said in that case top-to-bottom & left-to-right is not something we have a reasonable alternative to. Rust has systematically avoided unspecified or non-deterministic evaluation order as those are a terrible footgun in C/C++. So I would argue that we should specify top-to-bottom & left-to-right, and it is up to the pattern compilation engine to detect cases where the order is not observable and then exploit that for performance.
For most patterns and in safe code, "evaluation"(/matching?) order of subpatterns does not matter, but there is (that I can think of) one instance on stable where pattern evaluation order matters: matching on a struct with a tag and a
union
field (and similar situations).The example in the section on struct patterns explicitly states that field order does not matter, but it also does not have any patterns where the evaluation order of subpatterns would matter.
The section on
union
s does mention pattern matching, but does not say anything about pattern evaluation order. It gives an example of pattern-matching on a manual tagged union, though pattern evaluation order does not matter for the example given[^2]. In a slightly different example, however, the field order does matter:the example
```rs #[derive(Clone, Copy)] enum Tag { A, B, } #[derive(Clone, Copy)] #[repr(C)] union Value { a: u32, b: u8, // note that b is smaller than a } /// Assume that if tag == Tag::A, then val.a is valid, and if tag == Tag::B, then tag.b is valid. #[derive(Clone, Copy)] struct Tagged { tag: Tag, val: Value, } unsafe fn tag_first(v: Tagged) -> bool { match v { // fine under miri with tag == B, sees that `tag != A` and skips the arm Tagged { tag: Tag::A, val: Value { a: 0 } } => true, _ => false, } } unsafe fn val_first(v: Tagged) -> bool { match v { // error under miri with tag == B, since it reads the padding bytes after `Value::b` Tagged { val: Value { a: 0 }, tag: Tag::A } => true, _ => false, } } fn main() { let v = Tagged { tag: Tag::B, val: Value { b: 0 }, }; unsafe { tag_first(v); val_first(v); } } ```For unstable code, I suppose
deref_patterns
might also make it important to document pattern evaluation order, or maybe that feature is/will be restricted enough for it not to matter. Depending on the resolution of https://github.com/rust-lang/unsafe-code-guidelines/issues/412 pattern evaluation order might be important if matching on references-to-references-to-invalid-data (miri example)?I'm not sure if this is fully the intended behavior[^1], or if it is intended, how best to document it.
[^1]: Alternately, instead of documenting pattern evaluation order, it could be specified that if any (union) field used in a pattern match is invalid/uninitialized, then the whole arm is UB, regardless of the order the fields were written in the pattern.
[^2]: in that example, the union field is fully-initailized either way, or UB happens regardless of pattern evaluation order