Lower range patterns from AST to HIR

Veykril commented 2 years ago

Hm, to me it seems correct to use pat..pat in the syntax, but expr..expr in the hir.

Syntactically, they are patterns, as .. is an infix binary operator. We could add some LR-style re-structuring of the tree, where we don't create a pat until we haven't seen .., but that seems complicated. Feels more natural to parse a pat, than notice an .., and parse another pat.

But yeah, in semantics we want to keep those as expressions, so during lowering we need to figure out that, what was a pattern syntactically, actually is an expression semantically.

This feel dual to recently stabilized destructive assignment, where we parse stuff like (a, b) = (b, a) as expr = expr, but, during lowering, lower the LHS as a pattern.

Though, seeing the two example side-by-side, maybe we should bite the bullet and just don't distinguish patterns and expressions at the level of syntax? That is, we'd use ast::Expr for both patterns and expressions. We'd then have an API like

impl ast::Expr {
    fn classify(&self) -> IsPattern | IsExpression | Ambiguous
}

impl Semnatics {
    fn classify_expr(&self, expr: &ast::Expr) -> IsPattern | IsExpression
}

_Originally posted by @matklad in https://github.com/rust-lang/rust-analyzer/pull/12158#discussion_r867470092_

HKalbasi commented 1 year ago

How go to definition and semantic highlighting is working in those pattern currently? In this code:

pub const L: i32 = 6;
mod x {
    pub const R: i32 = 100;
}
const fn f(x: i32) -> i32 {
    match x {
        -1..=5 => x * 10,
        L..=x::R => x * 100,
        _ => x,
    }
}

Everything works for R, but L is detected as an ident pattern.

Veykril commented 1 year ago

https://github.com/rust-lang/rust-analyzer/blob/c06f08896835d0162a1b328726578dbf886d888f/crates/hir/src/semantics.rs#L450-L452 https://github.com/rust-lang/rust-analyzer/blob/c06f08896835d0162a1b328726578dbf886d888f/crates/hir/src/source_analyzer.rs#L493-L509

As that function fails there, R resolves because we fall back to heuristic path resolution due to it being part of a multi segment path I think, while L is single segment so we assume its an ident pattern.

HKalbasi commented 1 year ago

L is parsed as IdentPat

RANGE_PAT@617..625
  IDENT_PAT@617..618
    NAME@617..618
      IDENT@617..618 "L"
  DOT2EQ@618..621 "..="
  PATH_PAT@621..625
    PATH@621..625
      PATH@621..622
        PATH_SEGMENT@621..622
          NAME_REF@621..622
            IDENT@621..622 "x"
      COLON2@622..624 "::"
      PATH_SEGMENT@624..625
        NAME_REF@624..625
          IDENT@624..625 "R"

Isn't this wrong? The reference says that only path is allowed here. I think it would work correctly if it was parsed as PathPat, but it might become irrelevant once we lower them to hir.

Veykril commented 1 year ago

Our grammar differs from the reference in a few parts and that's fine (and in parts deliberate), so I think we want to keep the parse tree as is.

matklad commented 1 year ago

yup, on the level of concrete syntax, .. is a binary operator on patterns:

let S {} .. _ = ();

LET_STMT@3116..3135
  LET_KW@3116..3119 "let"
  WHITESPACE@3119..3120 " "
  RANGE_PAT@3120..3129
    RECORD_PAT@3120..3124
      PATH@3120..3121
        PATH_SEGMENT@3120..3121
          NAME_REF@3120..3121
            IDENT@3120..3121 "S"
      WHITESPACE@3121..3122 " "
      RECORD_PAT_FIELD_LIST@3122..3124
        L_CURLY@3122..3123 "{"
        R_CURLY@3123..3124 "}"
    WHITESPACE@3124..3125 " "
    DOT2@3125..3127 ".."
    WHITESPACE@3127..3128 " "
    WILDCARD_PAT@3128..3129
      UNDERSCORE@3128..3129 "_"
  WHITESPACE@3129..3130 " "
  EQ@3130..3131 "="
  WHITESPACE@3131..3132 " "
  TUPLE_EXPR@3132..3134
    L_PAREN@3132..3133 "("
    R_PAREN@3133..3134 ")"
  SEMICOLON@3134..3135 ";"

The general philosophy is to parse what's natural on purely syntactic basis, and then place additional restrictions when checking ast/lowering to hir.

rust-lang / rust-analyzer

Lower range patterns from AST to HIR #12210