rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.44k stars 12.73k forks source link

Tracking issue for RFC 2535, 2530, 2175, "Or patterns, i.e `Foo(Bar(x) | Baz(x))`" #54883

Closed Centril closed 3 years ago

Centril commented 6 years ago

This is a tracking issue for the RFC "Or patterns, i.e Foo(Bar(x) | Baz(x))" (rust-lang/rfcs#2535).

This issue also tracks the changes in rust-lang/rfcs#2175 and rust-lang/rfcs#2530 since RFC 2535 subsume those.

Status:

Steps:

Unresolved questions:

The benefit of avoiding pat<no_top_alt> in as many places as possible would both be grammatical consistency and fewer surprises for uses. The drawbacks would be possible ambiguity or backtracking for closures and breakage for macros.

Implementation history:

Centril commented 6 years ago

I've closed https://github.com/rust-lang/rust/issues/48215 in favor of this issue to track all related changes since https://github.com/rust-lang/rfcs/pull/2535 subsumes RFC 2175.

I would suggest that #![feature(or_patterns)] should imply #![feature(if_while_or_patterns)] and then we will eventually stabilize both gates if/once we stabilize or_patterns.

CAD97 commented 6 years ago

Minor thing that I just noticed: anonymous enums with the syntax of "tuple-but-|-instead-of-,", (A|B), is syntactically ambiguous with this feature, since we have parenthesized patterns.

Centril commented 6 years ago

@CAD97 example of a snippet that would show the ambiguity?

CAD97 commented 6 years ago

I confused things between patterns and types:

match it {
    it: (A|B) => (), // it with type ascription (A|B)
    it @ (A|B) => (), // it binding to pattern (A|B)
}

So long as patterns and types aren't valid in the same position, this is ok. (I confused myself since tuples are valid patterns.)

varkor commented 5 years ago

I've got a branch for this that I think is close, but as I'm quite busy at the moment, I'm not sure how soon I'll finish it; if anyone else would like to take this over in the meantime, just leave a comment.

alexreg commented 5 years ago

@varkor Where's your branch? I'm happy to take over if you like... especially if it gets const generics to land any sooner. ;-)

varkor commented 5 years ago

@alexreg: it's at https://github.com/varkor/rust/tree/or-patterns. I think where I left it, there was some crash when you tried using or-patterns, but most of the boilerplate is in. Feel free to play around with it to see if you can get it working. (Though I'm not prioritising this over const generics, don't worry 😉)

alexreg commented 5 years ago

@dlrobertson BTW, have you taken into account the unresolved questions whilst working on this?

dlrobertson commented 5 years ago

@alexreg thanks for the ping. Haven't really thought about it too much. I've mostly been experimenting with the code, but at this point I am not keen on changing more than the pattern matching code for the following reasons:

That being said, I'm still new, so I might be missing something.

Centril commented 5 years ago

@dlrobertson Thanks for working on this. The more important part to test with crater is whether top_pat can be used for pat macro fragments or if it has regressions so it would be good to start with top_pat and maybe switch to pat<no_top_alt> based on the data.

I think it's also worth considering fn foo(Ok(x) | Err(x): Result<A, B>) { .. } without parens since that doesn't look ambiguous like with closures (the part about type ascription is also irrelevant since both p and q must have the same type in p | q so it shouldn't matter what the ascription applies to). But we can leave that for later.

alexreg commented 5 years ago

@dlrobertson That's fair enough. What @Centril said is good advice.

dlrobertson commented 5 years ago

Back to looking at this. An update for anyone following this (and a note to self so I don't forget things).

TODO:

dlrobertson commented 5 years ago

I have or-patterns in a usable state on a local branch. I'm sure there are bugs, but the examples I have work. The change is much larger than I expected. As a result, I plan to submit support for a subset of the syntax proposed in the RFC over 4 PRs:

After these are submitted and merged I will continue to work on implementing the full syntax proposed in the RFC and fix bugs found along the way :smile:

Centril commented 5 years ago

Basic parsing PR https://github.com/rust-lang/rust/pull/61708 landed on 2019-08-18. It was written by @dlrobertson and reviewed by @varkor, @Centril, @petrochenkov, @matthewjasper, and @alexreg.

jamesmunns commented 5 years ago

Would implementation of this RFC also apply to patterns in if conditionals?

For example, could I do something like this?

if some_str == "this" | "that" | "whatever" {
// ...
}

EDIT: Reading the guide level section of the RFC, I would suppose not, as the contents of an if conditional is not really a pattern in the same way that it is in if let or match, and I suppose today you can already do something like this with match:

match some_str {
    "this" | "that" | "whatever" => {},
    _ => {}
}
Centril commented 5 years ago

@jamesmunns Or with this if let:

fn discovered_universal_model_of_computation(name: &str) {
    if let "Alan" | "Alonzo" | "Kurt" = name {
        println!("Yes, {} did discover a turing equivalent model of computation.", name);
    } else {
        println!("Nope, but {} was pretty cool nonetheless.", name);
    }
}

fn main() {
    for name in &["Alan", "Alonzo", "Kurt", "Voevodsky"] {
        discovered_universal_model_of_computation(name);
    }
}
Centril commented 5 years ago
Centril commented 5 years ago

In https://github.com/rust-lang/rust/pull/64128, written by @Centril and reviewed by @davidtwco, issue https://github.com/rust-lang/rust/issues/64106 was fixed where the compiler would spuriously emit unused_parens warnings on:

Centril commented 5 years ago

In https://github.com/rust-lang/rust/pull/64111, written by @Centril and reviewed by @petrochenkov, the late-resolution behavior of or-patterns in nested positions was fixed as was the already-bound check and binding-consistency check. Moreover, the AST now uniformly uses ast::PatKind::Or and has no special means of representing or-patterns at the top level.

Nadrieril commented 5 years ago

I'd be interested in implementing the exhaustiveness-checking part of this feature, since I've become familiar with the exhaustiveness algorithm

Centril commented 5 years ago

@Nadrieril Take a look at https://github.com/rust-lang/rust/pull/63688/files#diff-11dbfaea8934544f878f807e6285a221 -- it may be that @dlrobertson has already solved the issue there, but we could expedite this part. Also, we should include some tests for some non-exhaustive cases (the exhaustive cases may need further work in MIR to not ICE). It might be best to fix this once https://github.com/rust-lang/rust/pull/64508 has landed tho.

dlrobertson commented 5 years ago

@Nadrieril great to see that you're interested in helping out! Due to the shift in the style of implementation ("expanding" the or-pattern at the Candidate level instead of manually manipulating the CFG) the exhaustiveness changes will be made in the MIR generation PR. That being said, as @Centril noted tests etc will definitely be needed/appreciated.

Centril commented 5 years ago

On 2019-09-25, https://github.com/rust-lang/rust/pull/64508, written by @Centril and reviewed by @matthewjasper, @varkor, and @nikomatsakis, landed. The PR adjusted the HIR and HAIR to consistently use or-patterns at the top & nested levels. Moreover, liveness, typeck, and dead_code analyses were adjusted also. Notably, The PR did however not adjust exhaustiveness checking. Tangentially, https://github.com/rust-lang/rust/pull/64271, landed 2019-09-12, written by @Centril and reviewed by @estebank, adjusted check_match diagnostics logic to be better prepared for or-patterns.

Centril commented 4 years ago

Updates since last time:

Centril commented 4 years ago

On 2019-12-03, #66967, written by @Nadrieril and reviewed by @varkor and @Centril, landed. In this follow-up to #66612, the top-level hack for or-patterns in exhaustiveness checking was removed, making or-patterns truly first-class in match checking.

Centril commented 4 years ago

On 2019-12-21, https://github.com/rust-lang/rust/pull/67428, written by @Centril and reviewed by @matthewjasper, landed. This PR adjusted regionck's resolve_local::is_binding_pat such that the P& grammar includes or-patterns.

Centril commented 4 years ago

On 2019-12-22, https://github.com/rust-lang/rust/pull/67439, written by @Centril and reviewed by @matthewjasper, landed. Among other things, this PR simplified HAIR lowering of or-patterns a smidgen.

Centril commented 4 years ago

On 2019-12-26, https://github.com/rust-lang/rust/pull/67592, written by @matthewjasper and reviewed by @Centril, landed. This PR did some preparatory cleanups in MIR lowering of match expressions before the MIR level support for or-patterns is added. Moreover, the PR gated or-patterns in const contexts under const_if_match.

Centril commented 4 years ago

What remains now is primarily hardening of tests, fixing smaller bugs where they show up, as well as dogfooding the implementation in the compiler and standard library.

Centril commented 4 years ago
Centril commented 4 years ago
Centril commented 4 years ago
Centril commented 4 years ago
Centril commented 4 years ago
mark-i-m commented 4 years ago

Is there anything blocking a subset of this from being stabilized? It seems like the most basic form of this feature is working and used throughout the compiler and std.

varkor commented 4 years ago

The most pressing blocking issue is the remaining ICE: https://github.com/rust-lang/rust/issues/72680, though there appear to be a couple of outstanding design questions. I would have thought it would be close to ready for stabilisation apart from that.

mark-i-m commented 4 years ago

@varkor I've opened a PR for #72680, and I'm wondering if the design questions can be resolved after stabilization. It seems like both are backwards-compatible changes we can do later, right?

EDIT: I mean #78856

varkor commented 4 years ago

The former design question (about allowing | in closure arguments) does not need to be resolved immediately (alternatively, I think it should just be resolved in the negative now, because allowing them would be very confusing). But the latter seems like a backwards-incompatible change, if macro behaviour depends on it? But it seems like this was already considered for an edition change instead of being implemented immediately, anyway, so perhaps it can be deferred: the potential for increasing ambiguity for macros doesn't sound promising.

mark-i-m commented 4 years ago

Personally, I'm inclined to wait for it to be a problem and address it in a future edition if needed. What would the process be for resolving the question? An MCP? An FCP? Or should I just open a stabilization PR?

varkor commented 4 years ago

One would need to make sure that we have extensive tests for the feature, and also that sufficient documentation exists. Then you could open up a stabilisation PR, and the lang team could FCP on the PR.

nikomatsakis commented 4 years ago

I'll nominate this for the @rust-lang/lang meeting, also to discuss the two unresolved questions. I think I agree with @varkor about the first one (closure arguments) -- I would definitely find such notation quite confusing. The second one is less clear to me (modifications to macro pattern matching), but it is almost certainly backwards incompatible. We've definitely been bitten by changes like that in the past breaking macros in the wild.

joshtriplett commented 4 years ago

:+1: for not allowing this notation in closure arguments.

joshtriplett commented 4 years ago

We discussed this in the 2020-11-10 @rust-lang/lang meeting. Our conclusion:

mark-i-m commented 4 years ago

I created https://github.com/rust-lang/rust/pull/78935 for a crater run.

nikomatsakis commented 3 years ago

Removing nomination and requesting that @mark-i-m you re-nominate once crater data is available. =)

mark-i-m commented 3 years ago

So a crater run is queued. I expect it will be ~1 week before we see results... In the meantime, I thought I would write up a quick summary of the remaining unresolved question:

Should the pat macro fragment specifier match top_pat in different Rust editions or should it match pat as currently specified? We defer such decisions to stabilization because it depends on the outcome of crater runs to see what the extent of the breakage would be.

There seem to be 3 options:

  1. Change :pat to match top_pat. AFAICT, this is the option of maximum breakage. We would be adding | to the FIRST set of a pattern (leading vert).
    • As joshtriplett says above, if people are using | merely to emulate or-patterns in the most obvious way $($p:pat)|+, then there should be no breakage, since the first repetition of the matcher would just match the whole input. However, if they use | in other ways, their macro would break.
    • This option is currently implemented by https://github.com/rust-lang/rust/pull/78935 and is queued for a crater run.
    • This is fairly simple implementation-wise.
    • This option is the most intuitive to users IMHO.
  2. Change :pat to match top_pat, but fall back to the current behavior when :pat is followed by | in a matcher (but not as a repetition separator). For example, $p:pat | $x:ident would use the fallback path, but $($p:pat)|+ would still use the new path.
    • This option was proposed by petrochenkov in https://github.com/rust-lang/rust/pull/78935#issuecomment-729876500.
    • I think it would eliminate the breakage.
    • It could make the MBE code more complex than it already is. I would like to avoid it if possible.
    • It's not clear to me if there are weird/confusing edge cases that might be confusing to users. For example, matching Some(x) | None | against $(p:pat)|+ | might parse or it might fail depending on whether the main Rust parser (used for parsing non-terminals) realizes that the pattern ends at None (I think currently it would fail). We could remove | from the FOLLOW set of :pat, but then we have breakage again, so that defeats the point of this whole option.
  3. Change :pat to match pat<no_top_alt>. Basically, this option requires that top-level or-patterns must be surrounded by parens. For example, $p:pat would match (Some(x) | None) but it would not match Some(x) | None.

    • This is the most conservative option, and I believe there would be no breakage here.
    • It might be confusing to users who are expecting :pat to match anything that could be a pattern in other parts of Rust. For example, the following would fail:
      
      macro_rules! mymatch {
      ($p:pat) => {{
      match bar {
        $p => { 0 }
        _ => { 1 }
      }
      }}
      }

    mymatch! { | Some(3) | None // Parser error! No rules expected |. Wat? }

    mymatch! { (Some(3) | None) } // But this parses?

    
    * I think this option is forward-compatible with (1), even though (1) is not backwards-compatible with (3)....

My personal opinion is that if the breakage is small, we should go for option (1). Otherwise, we should go for option (3) unless someone has really strong feelings about why we should have top-level or-patterns in :pat.

joshtriplett commented 3 years ago

On Sat, Nov 21, 2020 at 11:23:28AM -0800, Who? Me?! wrote:

So a crater run is queued. I expect it will be ~1 week before we see results... In the meantime, I thought I would write up a quick summary of the remaining unresolved question:

Should the pat macro fragment specifier match top_pat in different Rust editions or should it match pat as currently specified? We defer such decisions to stabilization because it depends on the outcome of crater runs to see what the extent of the breakage would be.

There seem to be 3 options:

  1. Change :pat to match top_pat. AFAICT, this is the option of maximum breakage. We would be adding | to the FIRST set of a pattern (leading vert).
    • As joshtriplett says above, if people are using | merely to emulate or-patterns in the most obvious way $($p:pat)|+, then there should be no breakage, since the first repetition of the matcher would just match the whole input. However, if they use | in other ways, their macro would break.
    • This option is currently implemented by https://github.com/rust-lang/rust/pull/78935 and is queued for a crater run.
    • This is fairly simple implementation-wise.
    • This option is the most intuitive to users IMHO.
  2. Change :pat to match top_pat, but fall back to the current behavior when :pat is followed by | in a matcher (but not as a repetition separator). For example, $p:pat | $x:ident would use the fallback path, but $($p:pat)|+ would still use the new path.
    • This option was proposed by petrochenkov in https://github.com/rust-lang/rust/pull/78935#issuecomment-729876500.
    • I think it would eliminate the breakage.
    • It could make the MBE code more complex than it already is. I would like to avoid it if possible.
    • It's not clear to me if there are weird/confusing edge cases that might be confusing to users. For example, matching Some(x) | None | against $(p:pat)|+ | might parse or it might fail depending on whether the main Rust parser (used for parsing non-terminals) realizes that the pattern ends at None (I think currently it would fail). We could remove | from the FOLLOW set of :pat, but then we have breakage again, so that defeats the point of this whole option.
  3. Change :pat to match pat<no_top_alt>. Basically, this option requires that top-level or-patterns must be surrounded by parens. For example, $p:pat would match (Some(x) | None) but it would not match Some(x) | None.

    • This is the most conservative option, and I believe there would be no breakage here.
    • It might be confusing to users who are expecting :pat to match anything that could be a pattern in other parts of Rust. For example, the following would fail:
      
      macro_rules! mymatch {
      ($p:pat) => {{
      match bar {
        $p => { 0 }
        _ => { 1 }
      }
      }}
      }

    mymatch! { | Some(3) | None // Parser error! No rules expected |. Wat? }

    mymatch! { (Some(3) | None) } // But this parses?

My personal opinion is that if the breakage is small, we should go for option (1). Otherwise, we should go for option (3) unless someone has really strong feelings about why we should have top-level or-patterns in :pat.

One more option would be to go for option 1 if we can do that transition smoothly, and if not, go for option 1 in Rust 2021 and option 3 in Rust 2018/2015.

mark-i-m commented 3 years ago

The crater run is complete: https://github.com/rust-lang/rust/pull/78935#issuecomment-735050394

10993 regressed and 5 fixed (131254 total)

I don't actually know how to read the report, though. IIUC, there are 43 broken crates that are dependencies of all the others, but I don't know how to go deeper.

joshtriplett commented 3 years ago

Reading the regressions gives me an idea:

How hard would it be (in 2018/2015 only) to check if the next token after the :pat matcher is a |, and in that case only, interpret pat with no_top_alt?

mark-i-m commented 3 years ago

@joshtriplett Yes, that's similar to @petrochenkov 's suggestion (2 above). Let me try implementing it and see.