`is` operator for pattern-matching and binding

joshtriplett commented 4 months ago

Introduce an is operator in Rust 2024, to test if an expression matches a pattern and bind the variables in the pattern. This is in addition to let-chaining; this RFC proposes that we allow both let-chaining and the is operator.

Previous discussions around let-chains have treated the is operator as an alternative on the basis that they serve similar functions, rather than proposing that they can and should coexist. This RFC proposes that we allow let-chaining and add the is operator.

The is operator allows developers to chain multiple matching-and-binding operations and simplify what would otherwise require complex nested conditionals. The is operator allows writing and reading a pattern match from left-to-right, which reads more naturally in many circumstances. For instance, consider an expression like x is Some(y) && y > 5; that boolean expression reads more naturally from left-to-right than let Some(y) = x && y > 5.

This is even more true at the end of a longer expression chain, such as x.method()?.another_method().await? is Some(y). Rust method chaining and ? and .await all encourage writing code that reads in operation order from left to right, and is fits naturally at the end of such a sequence.

Having an is operator would also help to reduce the demand for methods on types such as Option and Result (e.g. Option::is_some_and and Result::is_ok_and and Result::is_err_and), by allowing prospective users of those methods to write a natural-looking condition using is instead.
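For comparison, here is a minimal sketch of how the condition `x is Some(y) && y > 5` can be spelled in stable Rust today. The function names are hypothetical and chosen only for illustration; `matches!` and `Option::is_some_and` are the existing standard-library facilities.

```rust
/// Uses the `matches!` macro with a guard; `y` is only visible inside the guard.
fn big_via_matches(x: Option<i32>) -> bool {
    matches!(x, Some(y) if y > 5)
}

/// Uses the `Option::is_some_and` helper method with a closure.
fn big_via_method(x: Option<i32>) -> bool {
    x.is_some_and(|y| y > 5)
}

/// Uses `if let`, where the pattern is written before the scrutinee.
fn big_via_if_let(x: Option<i32>) -> bool {
    if let Some(y) = x { y > 5 } else { false }
}

fn main() {
    for x in [Some(6), Some(5), None] {
        println!(
            "{x:?}: {} {} {}",
            big_via_matches(x),
            big_via_method(x),
            big_via_if_let(x)
        );
    }
}
```

All three forms compute the same boolean; they differ in reading order and in where the binding `y` is visible.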

Rendered

joshtriplett commented 4 months ago

Nominating because this is making a proposal for the 2024 edition.

fbstj commented 4 months ago

I see there is no mention of pattern types though it seems they would be similar but distinct use of is as an operator?

is this a pre-requisite of pattern types (to get the keyword in the language?) or does it conflict with the types usage?

programmerjake commented 4 months ago

when combined with pattern types, what way does the precedence go? so, does v as i32 is 5 parse as (v as i32) is 5 or v as (i32 is 5)? or is it ambiguous and errors, requiring parentheses?

joshtriplett commented 4 months ago

@fbstj wrote:

I see there is no mention of pattern types though it seems they would be similar but distinct use of is as an operator?

is this a pre-requisite of pattern types (to get the keyword in the language?) or does it conflict with the types usage?

This is not related to pattern types. I believe we can do both without conflict. I added some text to the "unresolved questions" section to confirm that we can do both without conflicts.

@programmerjake wrote:

when combined with pattern types, what way does the precedence go? so, does v as i32 is 5 parse as (v as i32) is 5 or v as (i32 is 5)? or is it ambiguous and errors, requiring parentheses?

I've added some text to the RFC, stating that this should require parentheses (assuming pattern types work with as).

dev-ardi commented 4 months ago

What patterns does is enable that aren't covered by matches!?

joshtriplett commented 4 months ago

@dev-ardi One example:

if expr is Some(x) && x > 3 {
    println!("value is {x}");
}
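To make the contrast with `matches!` concrete, here is a sketch in today's stable Rust. A `matches!` guard can test `x > 3`, but `x` is not available afterwards, so using it in the body needs `if let` with a nested `if`. The helper name `describe` is hypothetical.

```rust
/// Returns a message only when the option holds a value greater than 3.
fn describe(expr: Option<u32>) -> Option<String> {
    // `matches!` can check the condition, but `x` does not escape the macro.
    let passes = matches!(expr, Some(x) if x > 3);

    // To use `x` in the body, stable Rust needs `if let` plus a nested `if`.
    if let Some(x) = expr {
        if x > 3 {
            debug_assert!(passes);
            return Some(format!("value is {x}"));
        }
    }
    None
}

fn main() {
    println!("{:?}", describe(Some(4)));
    println!("{:?}", describe(Some(2)));
    println!("{:?}", describe(None));
}
```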

Veykril commented 4 months ago

I find it a bit odd that we would want both is expressions and let chains. They serve exactly the same purpose, the only difference being their reading order. I can understand the argument that we would want to have let chains due to people expecting them to work given we already have if let and the like but this feels like the wrong way to address that. I would instead expect us to deprecate if let and while let in favor of is and dropping let chains.

I feel like that should be added to the alternatives and/or pad out the feature duplication drawbacks paragraph.

joshtriplett commented 4 months ago

@Veykril wrote:

I would instead expect us to deprecate if let and while let in favor of is

That would be a massive amount of churn for very little benefit.

Nonetheless, you're right that I should add this to the alternatives section.

flip1995 commented 4 months ago

Adding multiple ways to do the same thing also makes teaching Rust harder: let in Rust is everywhere: if let, while let, let-chains, let ... else, ... So you have to teach pattern matching with let anyway. Meaning, this "right-to-left" reading order will become natural to Rust users quickly. Introducing a different way, while easy and intuitive to understand, won't help much in code clarity IMO, as people are already used to reading let patterns.

burdges commented 4 months ago

I'd expect is to be a pretty common variable name, so maybe worth exploring less common words, like Some(y) binds x && y > 5 or x matches Some(y) && y > 5.

I do think larger expressions make the left vs right swap interesting, but remember Perl created chaos with its left vs right trickery, so one should really be careful here. matches maybe works both ways.

Yes both let Some(y) = x && y > 5 and let .. else become extremely confusing, but humans could parse some sensibly bracketed flavors, like { let super Some(x) = foo } && y > 5 ala https://blog.m-ou.se/super-let/

VitWW commented 4 months ago

If we add is as a keyword, we should also reserve isnot as a keyword for future NOT-patterns

if expr isnot Some(x) {
    println!("error");
}

Edited: I'm sorry for some impoliteness with "must"

VitWW commented 4 months ago

Author mentioned just one alternative name for is: ~. But I think we should add other alternative names to the RFC, like equal or identic:

if expr identic Some(x) && x > 3 {
    println!("value = {x}");
}

workingjubilee commented 4 months ago

@VitWW I don't think so, Vit.

workingjubilee commented 4 months ago

@VitWW

If we add is as a keyword, we must also reserve isnot as a keyword for future NOT-patterns

You speak in a commanding way ("we must"), without justification. So, having considered my own thoughts: ...I disagree! Please offer a justification for your reasoning, and especially, why it should be addressed now, not "we do this for future expansion opportunities". It seems it will simply run up against all the concerns we're already facing, and we can wait until then.

Author mentioned just one alternative name for is: ~. But I think we should add other alternative names to the RFC, like equal or identic:

To say we should do something is better than to command, but I don't think you have explained the prior art or other reasoning why it must be addressed in the RFC. Perhaps you were building off the point that burdges made? But unfortunately equal is also a common function name in Rust, used in e.g. polars (as public API) and the stdlib (as private), and also seems to be a reasonably popular variable name. So it at least doesn't feel obvious as to why we would go with that.

For everyone else suggesting alternative keywords, I do really recommend everyone at least check using grep.app or something similar if their recommendation is in Rust public API somewhere, and how many cases, and be forthcoming on how many examples they find. You will likely pull hundreds of pages, so you may wish to do extrapolation or more exact queries using other tools after downloading the crates.io index.

Of course, we do have our system of keyword reservation, the k# and r# stropping, and edition-sensitive keyword parsing, so I think this is not the only thing to consider, and we can in fact simply pick the nicest-looking syntax if it doesn't seem an overwhelming problem. But it is best if we keep in mind any induced complexity in the lexer and parser, and the community reaction, while we rummage through our collection of Pantone chips for this shed.

workingjubilee commented 4 months ago

@joshtriplett While I think is, er, is a fine choice, I wish to (gently) refute ~ as lacking a history as a "pattern-matching operator", and provide some background that might be worth at least reviewing. First, SQL does have bare ~ but I think it is reasonable to mostly omit considering SQL's language features, as it is deliberately unlike most other PLs for reasons beyond this discussion. However, ~= and =~ do have prominent histories as a pattern-matching operator!

Swift does use ~= as a pattern-matching operator, and even uses it as part of case evaluation: https://developer.apple.com/documentation/swift/range/~=(_:_:)
Ruby offers the inverse, =~, for a regex-centric pattern-matching: https://ruby-doc.org/core-2.6.3/Regexp.html#class-Regexp-label-3D~+and+Regexp-23match
And part of why it does so is because Bash does it: https://www.gnu.org/software/bash/manual/html_node/Conditional-Constructs.html#index-_005b_005b

Obviously, the regexp-centric examples don't exactly match to the Rust pattern language, but it's clearly a popular choice if three exceptionally common procedural PLs use it. Other examples like Vimscript and PromQL also use them, but obviously that gets increasingly niche. Wiktionary even asserts ~= is used in mathematics... but also mentions ~= is also used as an equivalent to Rust's !=, e.g. Lua and MATLAB.

It seems to me when ~ is included in an operator's symbol, either it means negation, or it does imply something akin to saying "roughly like...", an approximate match, which may be why Dart uses ~/ for divide-to-integer (as opposed to dividing to a double, which more accurately represents the result of 3 / 2). Of course, that very page I just cited also mentions Dart has is, so I only consider this to be interesting context!

davidhewitt commented 4 months ago

Some reactions I had while walking and thinking on this earlier:

I like is for legibility and I think it will probably read nicer than let chains in almost all cases
Python has is operator as object identity which is almost only used for x is None, which the operator here would support. A possible addition to the prior art.
I strongly agree with the concern that's been repeated a few times here that we already have forms like if let and also let - else, and the distinction here is currently proposed to just be a style choice.

Especially as the recent language survey seemed to highlight language bloat as one of the largest risks to the language, having this purely be stylistic seems to be in direct opposition to the data.

If we were to move forward with this I'd hope that this RFC takes a stronger stance on when to use let forms and when to use is forms, and strongly considers the deprecation alternative.

davidhewitt commented 4 months ago

Possible observation: by allowing expr is PAT && condition here, users may be more likely to try PAT && condition as match arms instead of the current PAT if condition. We may want to allow that:

match color {
    (RGB(r, g, b) | RGBA(r, g, b, _)) && r == b && g < 1 => /* ... */,
                                      ^^ - this is currently a compile error, should be `if`
    _ => /* ... */
}

... I think it'd ease refactoring and papercuts when converting code between x is PAT && y { ... } to match x { PAT && y => ... }
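For reference, this is a sketch of the form that compiles today, with the extra conditions placed in an `if` guard after the or-pattern. The `Color` enum is a hypothetical stand-in for the `RGB` / `RGBA` names used in the comment above.

```rust
/// Hypothetical color type mirroring the `RGB` / `RGBA` names above.
#[derive(Debug, Clone, Copy)]
enum Color {
    Rgb(u8, u8, u8),
    Rgba(u8, u8, u8, u8),
}

fn classify(color: Color) -> &'static str {
    use Color::{Rgb, Rgba};
    match color {
        // Today the extra conditions must follow `if`, not `&&`.
        // The guard applies to both alternatives of the or-pattern.
        Rgb(r, g, b) | Rgba(r, g, b, _) if r == b && g < 1 => "matched",
        _ => "other",
    }
}

fn main() {
    println!("{}", classify(Color::Rgb(7, 0, 7)));
    println!("{}", classify(Color::Rgba(7, 0, 7, 255)));
    println!("{}", classify(Color::Rgb(7, 1, 7)));
}
```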

clarfonthey commented 4 months ago

While I think is, er, is a fine choice, I wish to (gently) refute ~ as lacking a history as a "pattern-matching operator", and provide some background that might be worth at least reviewing. First, SQL does have bare ~ but I think it is reasonable to mostly omit considering SQL's language features, as it is deliberately unlike most other PLs for reasons beyond this discussion. However, ~= and =~ do have prominent histories as a pattern-matching operator!
* Swift does use `~=` as a pattern-matching operator, and even uses it as part of `case` evaluation: https://developer.apple.com/documentation/swift/range/~=(_:_:)

* Ruby offers the inverse, `=~`, for a regex-centric pattern-matching: https://ruby-doc.org/core-2.6.3/Regexp.html#class-Regexp-label-3D~+and+Regexp-23match

* And part of why it does so is because Bash does it: https://www.gnu.org/software/bash/manual/html_node/Conditional-Constructs.html#index-_005b_005b
Obviously, the regexp-centric examples don't exactly match to the Rust pattern language, but it's clearly a popular choice if three exceptionally common procedural PLs use it. Other examples like Vimscript and PromQL also use them, but obviously that gets increasingly niche. Wiktionary even asserts ~= is used in mathematics... but also mentions ~= is also used as an equivalent to Rust's !=, e.g. Lua and MATLAB.

It seems to me when ~ is included in an operator's symbol, either it means negation, or it does imply something akin to saying "roughly like...", an approximate match, which may be why Dart uses ~/ for divide-to-integer (as opposed to dividing to a double, which more accurately represents the result of 3 / 2). Of course, that very page I just cited also mentions Dart has is, so I only consider this to be interesting context!

Just to follow up on this a bit, particularly from a mathematical perspective. Yes, you're right that ~ has some similarity to ≈, which means "approximately equal to," and thus it makes sense as a pattern-matching operator.

However, ~= and =~, from a programming perspective, are far too loaded to really work well as that kind of operator. Like, I've been writing a lot of Lua lately and ~= is just straight-up != in Lua.

Plus, with the way Rust tends to organise its operators, the existence of ~= implies that there should be a standalone ~, which wouldn't be the case here. So, I would advocate against that regardless.

Drawing to the bigger point of what this operator should be: I genuinely don't think that there's something better than is. It's two characters, which is as long as many existing operators. People say that it's a common variable name, but I think that it's only common as a pluralisation of i, where i_s could easily serve that purpose. And the only other reasonable alternative that I can think of is ~, which is shorter and less clear. Any other keywords are going to be longer, more awkward, and more likely to cause name conflicts.

I point out some of the alternatives in the RFC because I think that we should definitely include the best arguments in favour of is in the RFC, but I genuinely do think that it's the best choice.

programmerjake commented 4 months ago

If we add is as a keyword, we must also reserve isnot as a keyword for future NOT-patterns

I disagree, IMO not patterns can be written as !Some(_) (!-patterns can be used everywhere a fallible pattern is (match, if let, is, let ... else), so this isn't an is! operator). This means there are two ways to write it: with the not operator, !(a is Some(v)) || v == 0, or with a not pattern, a is !Some(v) || v == 0 or a is (!Some(_) | Some(0))
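Not-patterns do not exist in Rust today. As a rough sketch, the predicate `a is (!Some(_) | Some(0))` can be approximated for an `Option` either with an or-pattern or by negating a positive match. Both function names are hypothetical.

```rust
/// True when `a` is not `Some(_)`, or is `Some(0)`.
fn none_or_zero(a: Option<i32>) -> bool {
    // For `Option`, "not `Some(_)`" is just `None`, so an or-pattern suffices.
    matches!(a, None | Some(0))
}

/// The same predicate, written by negating a positive match with a guard.
fn none_or_zero_negated(a: Option<i32>) -> bool {
    !matches!(a, Some(v) if v != 0)
}

fn main() {
    for a in [None, Some(0), Some(3)] {
        println!("{a:?}: {} {}", none_or_zero(a), none_or_zero_negated(a));
    }
}
```

This only works because `Option` has a single other variant to name; a general not-pattern would not need the complement spelled out.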

VitWW commented 4 months ago

You speak in a commanding way ("we must"), without justification. .... Please offer a justification for your reasoning, and especially, why it should be addressed now

@workingjubilee I'm sorry for some impoliteness with "must". Not-patterns also weren't added because in today's Rust syntax they are ugly to write: NOT(Some(x)) = expr, and it becomes almost pretty with an isnot keyword. It should be reserved as a keyword now, because it is dual to is, just like >=/<= and ==/!=, and it is strange to add just one of a dual pair.

But unfortunately equal is also a common function name in Rust

Uups

e2-71828 commented 4 months ago

People say that it's a common variable name, but I think that it's only common as a pluralisation of i, where i_s could easily serve that purpose.

I won’t claim it’s common, but it’s probably worth noting that is is the country code for Iceland, and so is a natural variable name for strings containing Icelandic-language text.

petrochenkov commented 4 months ago

I prototyped this feature back in 2018 and converted rustfmt to this style, but later dropped the corresponding rustfmt branch, accidentally and unfortunately. But the experience report is preserved at least - https://github.com/rust-lang/rfcs/pull/2260#issuecomment-367158854.

I still think this is the right thing to do, and something that should have been added instead of if-let chains from the start. It would be unfortunate if the scenario I predicted in https://github.com/rust-lang/rfcs/pull/2497#issuecomment-404860099 plays out and EXPR is PAT is not added for social reasons because if-let chains already exist.

joshtriplett commented 4 months ago

@petrochenkov Agreed. I think let chains have value because if-let already exists and people expect let chains to work, but I don't think that should prevent us from adding is. That would feel like a suboptimal path caused by path dependence.

ChayimFriedman2 commented 4 months ago

Considering the multiple bugs around temporaries that were found with let chains, perhaps we should just reserve the is keyword in edition 2024 and give the implementation more time to mature?

clarfonthey commented 4 months ago

Just because I haven't seen anyone comment on it yet, I would like to know if my intuition that is should have higher precedence than == (but still recommend parentheses, similar to mixing && and ||) matches others' intuition as well. I could just be an outlier here and would love if others pitched in how they feel as well.

Particularly this thread: https://github.com/rust-lang/rfcs/pull/3573#discussion_r1492740859

Feel free to just thumbs up/thumbs down to express support if you don't have much else to add.

joshtriplett commented 4 months ago

@clarfonthey wrote:

Just because I haven't seen anyone comment on it yet, I would like to know if my intuition that is should have higher precedence than == (but still recommend parentheses, similar to mixing && and ||) matches others' intuition as well. I could just be an outlier here and would love if others pitched in how they feel as well.

My intuition tells me "there is no possible circumstance in which I would ever want to see these combined without parentheses", which makes me feel that it's irrelevant what their relative precedence is.

(I think that's true for a few other cases in the existing precedence table as well.)

workingjubilee commented 4 months ago

That is disappointing to hear. People tend to eschew parentheses where they are unnecessary because the language already has many cases where some kind of parenthetical or brace or bracket is already either mandated by the syntactic form or is mandated by expressing the desired result, and it does not actually make the code significantly less clear to imitate Lisp slightly less.

kennytm commented 4 months ago

why would anyone need either boolean == (x is Some(z)) or (value == y) is true so frequently that one or two pairs of parentheses are going to bother them :confused:

joshtriplett commented 4 months ago

People tend to eschew parentheses where they are unnecessary

That's my preference as well, for cases that are widely parsed correctly by people who don't have the precedence table memorized. But for instance, the lint against using && and || together without parentheses is a good example where we suggest that they are more necessary than the precedence table would otherwise indicate. I think there are some cases that are intuitively obvious to people, and others where if you don't have the precedence table memorized you're likely to find them confusing. And I've regularly seen confusion about (for instance) the parsing of as.

I do personally think mixing == and is without parentheses seems more likely to lead to confusion than clarity. If many people feel strongly in the other direction, I could imagine changing that from "parentheses are always required" to "warning lint for not using parentheses", like && and ||. In any case, I will include it in the alternatives section.
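The `&&` / `||` case mentioned above can be checked directly. This is a small sketch (function names are hypothetical) showing that the unparenthesized form groups as `a || (b && c)`, which is the reading a person without the precedence table in mind might get wrong.

```rust
/// Written without parentheses: `&&` binds tighter than `||`.
fn implicit(a: bool, b: bool, c: bool) -> bool {
    a || b && c
}

/// The grouping the parser actually uses.
fn and_first(a: bool, b: bool, c: bool) -> bool {
    a || (b && c)
}

/// The other plausible reading, which gives a different answer.
fn or_first(a: bool, b: bool, c: bool) -> bool {
    (a || b) && c
}

fn main() {
    let (a, b, c) = (true, false, false);
    println!("implicit  = {}", implicit(a, b, c));
    println!("and_first = {}", and_first(a, b, c));
    println!("or_first  = {}", or_first(a, b, c));
}
```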

scottmcm commented 4 months ago

On parens:

The safe thing to do is start out always requiring them, since then we could look at how the code comes out with them, and remove the requirement as a non-breaking change later once we have evidence.

withoutboats commented 4 months ago

(NOT A CONTRIBUTION)

I feel confused by this proposal in general, I just don't ever experience the problem it alleges to be solving (and I never want let chaining either, for that matter). I don't like the idea of introducing more syntax and reserving more keywords just for the purpose of reducing nested blocks or making code read from left to right. This seems to me like it increases cognitive load on all users, especially new users, to advantage a certain coding style.

I also feel surprised by this proposal appearing now for 2024. It feels like very short notice for adding a new operator in the 2024 edition, but maybe I just haven't been following the relevant conversations closely enough and to many other people this idea is common knowledge.

The safe thing to do is start out always requiring them, since then we could look at how the code comes out with them, and remove the requirement as a non-breaking change later once we have evidence.

Please especially don't do this. You already decided to do something like this once with trait bounds and Fn traits and the weird arbitrary paren errors lingered for years as a result of how that was implemented. Please figure out the correct precedence before you stabilize the feature.

flip1995 commented 4 months ago

I just don't ever experience the problem it alleges

@withoutboats Take a look at the Clippy code base. Writing a lint is in large parts writing let-chains. You really don't want to nest all the if-expressions or early return for all of them. We've been using the if_chain crate for years, until let-chains were stable-enough (they are still not stabilized).

I still think this is the right thing to do, and something that should have been added instead of if-let chains from the start.

As let-chains are not stabilized yet, and iff there is consensus that the is approach is better, I think we should go with the is approach and remove let-chains again. I just think having both can cause problems and confusion, as I argued above.
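The two workarounds named above (nesting every `if`, or returning early for each check) can be sketched in stable Rust as follows. The "key=value" parsing task and both function names are hypothetical, and this is not taken from the Clippy code base.

```rust
/// Nested form: each additional check adds one level of indentation.
fn port_nested(line: &str) -> Option<u16> {
    if let Some((key, value)) = line.split_once('=') {
        if key.trim() == "port" {
            if let Ok(port) = value.trim().parse::<u16>() {
                if port != 0 {
                    return Some(port);
                }
            }
        }
    }
    None
}

/// Early-return form: flat, but every check needs its own exit.
fn port_early_return(line: &str) -> Option<u16> {
    let Some((key, value)) = line.split_once('=') else {
        return None;
    };
    if key.trim() != "port" {
        return None;
    }
    let Ok(port) = value.trim().parse::<u16>() else {
        return None;
    };
    if port == 0 {
        return None;
    }
    Some(port)
}

fn main() {
    for line in ["port = 8080", "host=example", "port=0", "port=abc"] {
        println!("{line:?}: {:?} {:?}", port_nested(line), port_early_return(line));
    }
}
```

Either chaining syntax under discussion would let the four checks sit in a single condition joined by `&&`.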

dev-ardi commented 4 months ago

As let-chains are not stabilized yet, and iff there is consensus that the is approach is better, I think we should go with the is approach and remove let-chains again. I just think having both can cause problems and confusion, as I argued above.

I agree that having syntax for the two is confusing and we should choose only one because both are semantically the same. We should choose the one that is more straightforward for the users.

Users are already used to the "backwardness" of if let so while I personally like is a bit more I prefer let chains because they are easier to learn [citation needed] and focus the effort on stabilizing those.

VitWW commented 4 months ago

Right now, "Reference-level explanation" does not tell us how to use "IsExpression", except "Detect is appearing as a top-level statement and produce an error, with a rustfix suggestion to use let instead" (but in "Future possibilities" and the comments we see many more limitations). So I wrote the rules more explicitly:

IsExpression :
    Expression is PatternNoTopAlt

BoolExpression :
    LazyAndExpr | IsExpression | Expression

LazyAndExpr :
    BoolExpression && BoolExpression

PredicateLoopExpression :
   while BoolExpression /*except struct expression*/ BlockExpression

IfExpression :
   if BoolExpression /*except struct expression*/ BlockExpression
   (else ( BlockExpression | IfExpression  ) )?

These rules could also be extended to match

MatchArmGuard :
   if BoolExpression

These rules could be unified together with let-chains if we just rewrite BoolExpression a bit:

LetExpr :
    let Pattern = Scrutinee

BoolExpression :
    LazyAndExpr | LetExpr | IsExpression | Expression

kennytm commented 4 months ago

For the binding question I think we only need to answer whether the following are going to be well-defined or not. I think this covers all kinds of expressions, including the unstable ones.

Block expression

(preferred: binding won't escape the block; all of the below are ill-formed because every w > 0 expression will cause an E0425 "cannot find value `w`" error.)

// 01. Block expression
{ x is Some(w) } && w > 0;

// 02. Break
('a: {
    if cond { 
        break 'a val1 is Some(w); 
    } 
    val2 is Some(w) 
}) && w > 0;

// 03. Unsafe block expression
unsafe { x is Some(w) } && w > 0;

// 04. Const block expression
const { X is Some(w) } && w > 0;

// 05. If expression
(if cond { 
    val1 is Some(w) 
} else { 
    val2 is Some(w)
}) && w > 0;

// 06. Match expression
(match val {
    Ok(a) => a is Some(w),
    Err(b) => b is Some(w),
}) && w > 0;

// 07. Inline const pattern
#![feature(inline_const_pat)]
match true {
    const { Some(1) is Some(w) } if w > 0 => { w },
    _ => unreachable!(),
}

Unifying `||` expressions

(11 must be well-formed, while 12 and 13 are preferred to be well-formed too)

// 11. Distinct variables but unused
if val1 is Some(x) || val2 is Some(y) {
    // not using x and y here.
    println!("good");
}

// 12. Same variables and used
if val3 is Some(w) || val4 is Some(w) {
    println!("w = {w}");
}

// 13. Overlapping variable set
if val5 is Some((a, b)) || val6 is Some((b, c)) {
    println!("b = {b}");
}
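For cases 12 and 13, stable Rust can already express the shared binding with an or-pattern over a tuple. The sketch below uses hypothetical function names; note that, unlike `||`, building the tuple evaluates both operands before matching.

```rust
/// Case 12: both sides bind the same `w`.
fn first_some(val3: Option<i32>, val4: Option<i32>) -> Option<i32> {
    // Alternatives are tried left to right; each must bind `w` with the same type.
    if let (Some(w), _) | (_, Some(w)) = (val3, val4) {
        Some(w)
    } else {
        None
    }
}

/// Case 13: only `b` is common to both sides, so only `b` is bound.
fn shared_b(val5: Option<(i32, i32)>, val6: Option<(i32, i32)>) -> Option<i32> {
    if let (Some((_, b)), _) | (_, Some((b, _))) = (val5, val6) {
        Some(b)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", first_some(None, Some(2)));
    println!("{:?}", shared_b(Some((1, 2)), Some((3, 4))));
}
```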

Type casting

(preferred: well-formed, seems harmless)

// 21. Cast
(x is Some(w)) as bool && w > 0;

I don't know type theories so sorry for non-standard notations.

Define $\braket{B|e|X}$, where *e* is a stream of token-trees (expression / pattern / type / etc), and *B*, *X* are sets of variables, to mean "the (local) variable names in *B* are bound during *e*'s execution, and *e* defines (probably refutable) new variables *X* after its successful execution". Here, *B* is a property of the environment and *X* is an intrinsic property of *e*.

If *e* uses any variable name not bound by *B*, the compiler should raise the [E0425](https://doc.rust-lang.org/error_codes/E0425.html) "cannot find value in this scope" error. Note that *e* being able to see *B* does not mean *e* can actually "use" it, as additional constraints may apply, e.g.

```rust
// ill formed because `y` is not const.
// however, this shall result in E0435 "non-constant value in a constant" error
// rather than E0425.
x is Some(y) && const { y > 0 };

// ill formed because `b` is unbound in this scope.
// this shall result in E0425.
a is Some(b) || const { b > 0 };
```

We will introduce rules that

1. moving top-down, given the bound variables $\bra{B}$ usable by a node, determine those $\bra{\color{green}B_i}$ of all child nodes.
2. moving bottom-up, given the newly defined variables $\ket{X_i}$ of each child node, determine the overall set of variables $\ket{\color{red}X}$ of the parent node.

For instance, when we say the rule for `x && y` is $\Braket{B| \braket{{\color{green}B}|x|X} \mathtt{\&\&} \braket{{\color{green}B\cup X}|y|Y} | {\color{red}X \cup Y} }$, it means (reading from left to right)

1. suppose the set of variables bound before `x && y` is $\bra{B}$
2. the rule tells us that for `x` alone, the set of bound variables is also $\bra{\color{green}B}$ (colored green for inferred input)
3. `x` is going to define some new variables $\ket{X}$
4. the operator is `&&`
5. the rule also tells us that for `y` alone, the set of bound variables is now $\bra{\color{green}B\cup X}$
6. `y` is going to define another set of new variables $\ket{Y}$
7. the overall expression `x && y` together is going to define the new variables set $\ket{\color{red}X \cup Y}$ (colored red for inferred output)

Considering all [statements and expressions](https://doc.rust-lang.org/reference/statements-and-expressions.html), the set of rules is:

1. **Default case**, applicable to all token-tree streams not explicitly mentioned. They just inherit the surrounding environment and do not lift any new variables to their siblings.

   $$\Braket{B|f\left(\dotsc,\braket{{\color{green}B}|e_i|X_i},\dotsc\right)|{\color{red}\varnothing}}$$

   * The default case is applicable to the following.
     * Literal expressions.
     * Path expressions.
     * Async block.
     * Composite expressions `&x`, `*x`, `x?`, `!x`, `x+y`, `x&y`, `x=y`, `x.n`, `f(x,y,z)`, `x[y]`, `S{x,y,z}`, `[x,y,z]`, `[x;n]`, `(x,y,z)`, `return x`, `x.await` etc.

2. **Grouped expression** propagates any new variables.

   $$\Braket{B| ( \braket{{\color{green}B}|e|X} ) |{\color{red}X}}$$

3. **Block expression** traps new variables. Applicable to (labeled) `unsafe`, `const` and `loop` block as well.

   $$\Braket{B| \{ \dots ; \braket{{\color{green}B \cup \text{preceding \texttt{let}s}}|\mathit{expr}|X} \} |{\color{red}\varnothing}}$$

   * The output being $\ket{\color{red}\varnothing}$ means that `{x is Some(w)} && w > 0` is ill-formed. The rationale is that all variables do not escape the scope defined by the block.
   * An alternative is to allow the final expression to propagate the variable definition out, i.e. $\ket{\color{red}X}$. But a labeled block expression can be `break`ed and therefore the output should intersect with them as well, and that IMO is going to be a huge mess.

4. **Closure expression** (nothing special, just to clarify the closure body can see the parameters).

   $$\Braket{B| \mathtt{move} \left| \braket{{\color{green}B}|\mathit{params}|P} \right| \braket{{\color{green}B \cup P}|\mathit{closure}|X} |{\color{red}\varnothing}}$$

5. **`is` expression**, inheriting all valid bindings created by the pattern.

   $$\Braket{B| \braket{{\color{green}B}|e|X} \mathtt{is} \braket{{\color{green}B}|p|P} |{\color{red}P}}$$

6. **`&&` expression**, explained previously.

   $$\Braket{B| \braket{{\color{green}B}|x|X} \mathtt{\&\&} \braket{{\color{green}B\cup X}|y|Y} | {\color{red}X \cup Y} }$$

7. **`if` expression**, variables defined in the condition are bound in the then branch and ignored in the else branch. Also applicable to `while` expression. For simplicity we consider `let pat = expr` being equivalent to `expr is pat`.

   $$\Braket{B|\mathtt{if} \braket{{\color{green}B}|cond|X} \braket{{\color{green}B\cup X}|then|Y} \mathtt{else} \braket{{\color{green}B}|else|Z} |{\color{red}\varnothing}}$$

   * Again the output is set to $\ket{\color{red}\varnothing}$ rather than $\ket{\color{red}X\cap Y}$ with a similar rationale to the block expression.

8. **`match` expression**

   $$\Braket{B|\mathtt{match} \braket{{\color{green}B}|q|Q} \{ \braket{{\color{green}B}|p_i|P_i} \mathtt{if} \braket{{\color{green}B \cup P_i}|c_i|C_i} \Rightarrow \braket{{\color{green}B \cup P_i \cup C_i}|e_i|X_i} \}|{\color{red}\varnothing}}$$

   * Again the output is set to $\ket{\color{red}\varnothing}$ rather than $\ket{\color{red}\bigcap_i X_i}$ with a similar rationale to the block expression.

9. **`||` expression**.

   $$\Braket{B| \braket{{\color{green}B}|x|X} \mid\mid \braket{{\color{green}B}|y|Y} | {\color{red}X \cap Y} }$$

   * This rule is debatable. There are at least 2 alternatives on the output side.
   * (A) We can follow the [static semantics of Or-patterns](https://doc.rust-lang.org/reference/patterns.html#static-semantics) and raise [E0408](https://doc.rust-lang.org/stable/error_codes/E0408.html) if *X* ≠ *Y*. However this will make `x is Some(a) || y is Some(b)` fail to compile and I think this is unacceptable.
   * (B) We can change the output to $\ket{\color{red}\varnothing}$, i.e. not giving `||` any special treatment. But that means in the following example the variable `client` can't be bound

     ```rust
     if target_1 is Some(client) || target_2 is Some(client) {
         client.send(1);
     }
     ```

     Nevertheless this can be easily worked around by rewriting as an Or-pattern over a tuple (and at this point it's no more readable than the `if let` equivalent)

     ```rust
     if (target_1, target_2) is ((Some(client), _) | (_, Some(client))) {
         client.send(1);
     }
     ```

     This option is forward-compatible with the $\ket{\color{red}X \cap Y}$ behavior so maybe we should use this as the conservative starting point similar to RFC 2497.

10. **Cast expression**, applicable to type ascription expression (`x.`) as well, if re-implemented. Treated to be the same as grouped expression, because the only useful allowed conversion is `bool as bool` which is identity.

    $$\Braket{B| \braket{{\color{green}B}|\mathit{expr}|X} \mathtt{as} \braket{{\color{green}B}|\mathit{type}|T} |{\color{red}X}}$$

TimNN commented 4 months ago

After thinking about this for some time, my conclusions are:

- The is syntax is superior to let-chains, because it works in more contexts.
- We should avoid having multiple ways to do one thing. So because is can be used in more situations than let-chains, and can do everything that let-chains can, we should not have let-chains.
  - Even if we had both, one could argue that is should always be preferred over let-chains, see the next item.
- With the correct framing, is can interact reasonably well with the existing if let and while let:
  - I think of if let as a "conditional let", i.e. the focus is on introducing a new binding.
  - Whereas for is, the focus is on matching a pattern.
  - The distinction isn't quite as clear for the "chained" case, though I think one can argue that as soon as you need an &&, you're doing more pattern matching than binding creation, so is would be more appropriate.
  - This could be enforced by a lint. (if let ... = ... {} with no binding on the LHS should be is, and if ... is ... {} with a binding on the RHS should be if let).
- The fact that is introduces bindings into the body of a conditional should be considered a nice side-effect, not its primary purpose.

Me trying to reason this all out:

_Note: May not be entirely coherent, I haven't fully revised / edited this section._

> I've been thinking about when I would prefer which of the different syntaxes:
>
> `let ... = ... else { ... }`: This is the only construct that allows introducing a binding into the current scope (if the `else` block diverges), so that's definitely useful.
>
> `if let ... = ... {}`: It's easy to think of this as a "conditional `let`" or the opposite of a `let ... else`. Personally, I also sometimes prefer this form if the LHS is simple, and the RHS is complex, because it tells me up front what I can expect.
>
> `if let ... = ... && ... {}`: This seems like a reasonable extension of the previous item, if you think of it as a "special form of `if`". If you think of the previous item as a "conditional `let`", then I think this syntax makes less sense.
>
> `if let ... = ... && let ... = ...`: Having a construct to express this kind of chaining seems useful, but this particular syntax doesn't feel ideal. I think what bothers me most about this is that it only works in the context of an `if` (i.e., to mentally parse the second `let` I have to refer back to the initial `if`).
>
> The great thing about `is` as proposed by this RFC is that it works everywhere. Just by looking at the keyword, I know that we have a conditional binding, the context I'm in doesn't matter. I think that's great for consistency.
>
> ---
>
> My intuition about the distinction between the two syntaxes is this:
>
> * Use `if let` if the focus should be on introducing a new binding.
> * Use `is` if the focus should be on matching a pattern.
>
> This distinction obviously only works for the simplest form of the syntax (`if let ... = ... { ... }` and `if ... is ... { ... }`). So let's consider other cases.
>
> In non-control-flow expressions, I think that `is` (i.e. `some_fn(.. is ...)`) is clearly the better syntax (because this will never introduce a new binding into the current scope, so `let` doesn't make sense).
>
> Because it can be used in non-control-flow expressions, `... is ...` obviously evaluates to a boolean, which means that `... is ... && ...` must be valid. And if that form is valid, then it's only reasonable to expect that any bindings introduced by the `is` are usable on the RHS of the `&&`, I believe.
>
> And if `is` can introduce bindings into the expression of a conditional, it's only natural to expect that those bindings are also usable in the conditional's body.
>
> Thus, `is` must be at least as powerful as `let`-chains.

Btw, I think it would be good for the RFC to be explicit about the following (i.e., include a snippet showing it working):

- Using is in the condition of a while loop (and the fact that any bindings can be used inside the loop).
- Using is inside a match guard (and whether or not any bindings from the guard can be used inside the match arm body).
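To make the two cases concrete, here is a rough sketch of what they look like in stable Rust today, with the `is` spelling from the RFC shown only in comments. The function names `drain_sum` and `classify` are made up for illustration, and the commented `is` forms are my reading of the proposal, not confirmed syntax.

```rust
// While-loop case. With the proposal this might read:
//     while stack.pop() is Some(top) { total += top; }
fn drain_sum(stack: &mut Vec<u32>) -> u32 {
    let mut total = 0;
    // `top` is bound by the condition and usable inside the loop body.
    while let Some(top) = stack.pop() {
        total += top;
    }
    total
}

// Match-guard case. With the proposal this might read:
//     Some(v) if limit is Some(l) && v > l => ...
fn classify(value: Option<i32>, limit: Option<i32>) -> &'static str {
    match value {
        // `matches!` can test the nested pattern, but `l` stays local to
        // the macro and is NOT visible in the arm body.
        Some(v) if matches!(limit, Some(l) if v > l) => "above limit",
        Some(_) => "within limit",
        None => "missing",
    }
}
```

The second function shows why the question matters: with `matches!` the guard's binding cannot reach the arm body, so the RFC would need to say whether `is` behaves differently there.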

VitWW commented 4 months ago

One difference exists between let-chains and is-expressions: operator precedence

kennytm commented 4 months ago

> lack of = (or/and let) in the example confuses them

not to say their experience is invalid, but a match arm and a for loop also created bindings without = or let.

ssokolow commented 4 months ago

> Like, I've been writing a lot of Lua lately and ~= is just straight-up != in Lua.

Likely because, in C, ~ means bitwise complement and various languages (Rust included) have merged logical and bitwise NOT into a single type-switched operator.

workingjubilee commented 4 months ago

> why would any need either boolean == (x is Some(z)) or (value == y) is true so frequently that one or two pairs of parenthesis are going to bother them 😕

I don't really see the use in requiring people to write stuff like

iter.filter(|(i, item)| (i % 2 == 0) == (item is Some((a, b))))

because we're afraid of defining a precedence.
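For reference, the closest thing that compiles today uses `matches!` in place of `is`, and it needs the same parentheses around the left-hand comparison. This is a minimal sketch; the wrapper `kept_indices` and the tuple element type are assumptions added so the snippet stands alone.

```rust
// Keeps the indices where "index is even" agrees with "item is Some((_, _))".
fn kept_indices(items: &[Option<(i32, i32)>]) -> Vec<usize> {
    items
        .iter()
        .enumerate()
        // The parentheses around `*i % 2 == 0` are required: the parser
        // rejects chained comparison operators such as `a == b == c`.
        .filter(|(i, item)| (*i % 2 == 0) == matches!(item, Some((_, _))))
        .map(|(i, _)| i)
        .collect()
}
```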

kennytm commented 4 months ago

@workingjubilee

You realize you already can't write x == y == z without parentheses? (same for other comparison operators, <, <=, >, >=, !=)

error: comparison operators cannot be chained
 --> src/main.rs:2:18
  |
2 |     let a = true == true == true;
  |                  ^^      ^^
  |

withoutboats commented 4 months ago

(NOT A CONTRIBUTION)

> not to say their experience is invalid, but a match arm and a for loop also created bindings without = or let.

There's a connection between this user feedback and the binding question.

let is the only construct which creates bindings in subsequent sibling nodes of the AST, these other bindings are only in scope in child nodes of the AST. (This is also a difference of Rust's let from the let in ML, a difference that has precedent in a lot of imperative languages and ultimately derives from the original 50s-era imperative languages without block structure at all).

I think this user is expressing an intuition that this distinguishes let bindings from other bindings; recognizing this as a property of let would push toward the let chaining rather than new syntax that doesn't involve let.

This also can give guidance about the binding question: surely the binding rules for this or let chaining should never allow you to bind something in a parent node, which for example the block structures and type casts would all do.
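A small sketch of the sibling-versus-child distinction in today's Rust may help here. The function `first_word_len` is hypothetical and only exists to put both kinds of binding side by side.

```rust
fn first_word_len(input: Option<&str>) -> usize {
    // `let ... else` binds `text` for the *following sibling* statements
    // of this block.
    let Some(text) = input else {
        return 0;
    };

    // A `match` arm binding (`word`) is only in scope in a *child* node:
    // the arm's own expression.
    match text.split_whitespace().next() {
        Some(word) => word.len(),
        None => 0,
    }
    // `word` is out of scope here, while `text` is still usable.
}
```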

PatchMixolydic commented 4 months ago

As a passerby, if I could recreate Rust from scratch, I'd definitely go with $expr is $pat over if let/while let. As a matter of fact, I've already implemented a similar syntax in a hobby language. However, as it stands, it seems like it's much too late to make this change. If the introduction of is expressions leads to the deprecation of {if,while} let expressions in Rust 2024+ (and I don't see why it wouldn't[^1]), it would cause a disproportionate amount of churn for a tiny increase in flexibility and learnability (and, if memory serves, my struggle in learning if let was less "I don't understand this" and more just "why is it backwards?").

I believe the Language Design Team has put it better than I ever could [^2]:

> The established Rust community with knowledge of existing Rust syntax has a great deal of value, and to be considered, a syntax change proposal would have to be not just better, but so wildly better as to overcome the massive downside of switching.

It isn't clear to me that the ability to use binding patterns in more places is wildly better than relying on/extending Rust's existing pattern matching constructs. As far as I can tell, let chains would give 80% of the power of is expressions while causing significantly less churn (largely from collapsing if let _ = _ { if _ {} } and migrating if_chain! invocations).

[^1]: The RFC motions against this, but something about having both {if,while} let expressions and is expressions feels out of place with respect to Rust's design principles. Even if this RFC is merged as-is, I could easily see people pushing for the deprecation of {if,while} let in Rust 2027.

[^2]: I know the author of this RFC is part of the lang team and likely kept this principle in mind as he wrote it; I'm just using this to explain my point.

kennytm commented 4 months ago

> I think this user is expressing an intuition that this distinguishes let bindings from other bindings; recognizing this as a property of let would push toward the let chaining rather than new syntax that doesn't involve let.

So I suppose if the construct is spelled $expr is let $pat (or $expr = let $pat? 🙃) it would be easier to teach that a binding is introduced.

if expr_producing_option() is let Some(v) && condition(v) { use(v); }

if color is let (RGB(r, g, b) | RGBA(r, g, b, _)) && r == b && g < 10 {
    println!("condition met")
}

func(x is let Some(y) && y > 3);

OTOH, the is let here is only constrained within the current expression statement and not its siblings, which might be confusing from the other direction (though if let / while let's bindings also terminate within their own blocks).
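For readers who want to try the conditions above on a current compiler, the boolean part can be approximated with `matches!`. This is a sketch under assumptions: the `Color` enum with `u8` channels and the function names are invented here, and unlike the `is let` spelling, the bindings do not escape the macro.

```rust
// Hypothetical colour type matching the shape used in the example above.
enum Color {
    RGB(u8, u8, u8),
    RGBA(u8, u8, u8, u8),
}

// Approximates `color is let (RGB(r, g, b) | RGBA(r, g, b, _)) && r == b && g < 10`.
fn condition_met(color: &Color) -> bool {
    matches!(
        color,
        // Matching through `&Color` binds `r`, `g`, `b` as `&u8`.
        Color::RGB(r, g, b) | Color::RGBA(r, g, b, _) if r == b && *g < 10
    )
}

// Approximates the argument in `func(x is let Some(y) && y > 3)`.
fn above_three(x: Option<i32>) -> bool {
    matches!(x, Some(y) if y > 3)
}
```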

jdahlstrom commented 4 months ago

@joshtriplett

> This is not related to pattern types. I believe we can do both without conflict. I added some text to the "unresolved questions" section to confirm that we can do both without conflicts.

Could be worth mentioning that there would be natural interaction between x is PAT and pattern types, if x would/could be flow typed to T is PAT in the scope where the test evaluates to true. So there would be a pleasant syntactic parallel if is can be used for both.

withoutboats commented 4 months ago

(NOT A CONTRIBUTION)

> So I suppose if the construct is spelled $expr is let $pat (or $expr = let $pat? 🙃) it would be easier to teach that a binding is introduced.

I think if the community really decides that the pat/expr ordering of let bindings is a "problem" worth solving with more syntax, the solution that comes to mind is to allow all let bindings to be written something like let $expr is $pat. I think this is a bad idea (for the same reason I think #3295 is a bad idea) and you should just go with let chaining without changing the order on the basis of this motivation, but that would at least be consistent with Rust's binding rules and not introduce questions about binding in || patterns or into the parent scope like you previously identified.

> OTOH, the is let here is only constrained within the current expression statement and not its siblings, which might be confusing from the other direction (though if let / while let's bindings also terminate within their own blocks).

The way to parse this is that the siblings are the other conditionals and the parent is the if, not the block the if is in. People who expect let chaining to work seem to be operating off the same intuition. Of course this runs into weirdness with the || joining, so you can only && them, revealing the way in which this thought process is sort of fuzzy (but if people have enough problem with deeply nesting their if lets then I guess let chaining is the answer to that problem.)

Yokinman commented 4 months ago

I feel like the is operator could have its own purpose unique from if-let and while-let expressions, just to do with the scope of the binding. For a start, the variable binding defined by an is expression isn't super explicit since it appears deeper into the expression, especially if it's deeply nested in parentheses; whereas if let makes it super obvious that a variable is being defined.

Essentially, this example wouldn't compile and might instead tell you to use an if-let binding if you want to use the variable inside the block:

if an_option is Some(x) && x > 3 {
    println!("{x}");
}

Maybe parentheses (and certain other operators like ||?) would define the scope of the binding within the condition, so you could reuse the same variable multiple times without shadowing.

if (an_option is Some(x) && x > 3) || (another_option is Some(x) && x < 3) {
    println!("awesome, I just can't use x in here!");
}

If the point of this is to improve readability, I think this would serve the job better.

Although, one thing that could be confusing is shadowing an existing binding with the is expression. I guess the block would be able to refer to the original binding while ignoring the one in the condition, which might seem ambiguous/confusing?

Also, at this point the is expression would basically just be syntax sugar for the matches! macro or Option::is_some_and style methods, which I don't think is necessarily a bad thing (besides the possibly confusing shadowing, which you don't really get with the macro or method).
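To illustrate that last point, here is roughly what the second condition above looks like with the two existing tools. It is a sketch only; the function names and the `i32` payload are assumptions. In both versions `x` is scoped to its own sub-condition and never reaches the block, which is the behavior described above.

```rust
// Using the `matches!` macro: each `x` lives only inside its own macro call.
fn in_range_matches(an_option: Option<i32>, another_option: Option<i32>) -> bool {
    matches!(an_option, Some(x) if x > 3) || matches!(another_option, Some(x) if x < 3)
}

// Using `Option::is_some_and`: each `x` is a closure parameter.
fn in_range_methods(an_option: Option<i32>, another_option: Option<i32>) -> bool {
    an_option.is_some_and(|x| x > 3) || another_option.is_some_and(|x| x < 3)
}
```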

workingjubilee commented 4 months ago

> @workingjubilee
>
> You realize you already can't write x == y == z without parentheses? (same for other comparison operators, <, <=, >, >=, !=)
>
> error: comparison operators cannot be chained
>  --> src/main.rs:2:18
>   |
> 2 |     let a = true == true == true;
>   |                  ^^      ^^
>   |

When I chose my example, I spent a while carefully choosing an expression someone might actually write, and that closely resembles code I have written before, that I also feel illustrates the problem reasonably well. I have a problem tracking all the parentheses, you see, and I often carve apart expressions when moving around code using the wrong pair of braces or parens. It leads to me often rewriting code from scratch instead of the simpler operation of cut and paste. It's a problem with how I read the code, and it leads to bigger, more "explicit" expressions being even worse, because it gets harder for me to find the splitting points.

So who, exactly, cares about let a = true == true == true;? I would like to meet them.

kennytm commented 4 months ago

@workingjubilee

> So who, exactly, cares about let a = true == true == true;? I would like to meet them.

You wrote iter.filter(|(i, item)| (i % 2 == 0) == (item is Some((a, b)))), which is already impossible today without parentheses, even without considering the is part:

error: comparison operators cannot be chained
 --> src/main.rs:2:35
  |
2 |     iter.filter(|(i, item)| i % 2 == 0 == item.is_some())
  |                                   ^^   ^^
  |

matthieu-m commented 4 months ago

> Maybe parentheses (and certain other operators like ||?) would define the scope of the binding within the condition, so you could reuse the same variable multiple times without shadowing.

I would argue against the idea of "artificial" limitations, and instead argue towards greater scope, if possible.

In your very example, I would argue that x should be usable:

if (an_option is Some(x) && x > 3) || (another_option is Some(x) && x < 3) {
    println!("Here's the {x} I got");
}

In a pattern it's possible to bind x in multiple alternative patterns (Either(x) | Or(x)), and this regularly allows simplifying code: the part using the x downstream need not be repeated!

In a sense, is offers this on steroids: you can now unify patterns & additional arbitrary conditions.

> Although, one thing that could be confusing is shadowing an existing binding with the is expression. I guess the block would be able to refer to the original binding while ignoring the one in the condition, which might seem ambiguous/confusing?

Shadowing wouldn't be a problem, as long as it's not possible to refer to the outer x once an x has been defined in a "header" scope: condition of if or while, maybe a few other situations?

I would expect a deny-by-default lint forbidding the use of the outer x in such a situation would prevent any ambiguity.
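For comparison, getting a single unified `x` out of both alternatives today means routing the conditions through a helper. The sketch below is one possible way to do it, not the only one; `pick` and `describe` are hypothetical names and the payload is assumed to be `i32`.

```rust
// Hypothetical helper: yields the value from whichever alternative matches,
// trying `an_option` (with `x > 3`) before `another_option` (with `x < 3`).
fn pick(an_option: Option<i32>, another_option: Option<i32>) -> Option<i32> {
    an_option
        .filter(|&x| x > 3)
        .or_else(|| another_option.filter(|&x| x < 3))
}

fn describe(an_option: Option<i32>, another_option: Option<i32>) -> String {
    // One binding site for `x`, so the body is written only once.
    if let Some(x) = pick(an_option, another_option) {
        format!("Here's the {x} I got")
    } else {
        String::from("nothing matched")
    }
}
```

With `is` and the $X \cap Y$ rule for `||`, the helper would not be needed, because the condition itself could carry `x` into the block.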
