lezer-parser / lezer

Dev utils and issues for the Lezer core packages

Token parsing behavior change in generator between 1.2.0 and 1.2.1 #35

Closed r3c closed 1 year ago

r3c commented 1 year ago

Hello!

I'm trying to bump @lezer/generator from 1.2.0 to the latest version, 1.2.2, and am hitting a regression in our unit tests that seems to come from a behavior change introduced in 1.2.1.

Here is a reproduction grammar:

```
@top Root {
  ConflictingToken |
  SymbolToken
}

@tokens {
  @precedence {
    ConflictingToken,
    SymbolToken
  }

  ConflictingToken {
    'conflict'
  }

  SymbolToken {
    $[a-zA-Z_]+
  }
}
```

I'm not sure the latter behavior is intentional, since it interferes with parsing most language keywords. Inverting the precedence of the two rules won't work either, since the inputs "conf", "conflict" and "conflicting" would then all be matched as "SymbolToken". I wonder whether the change was introduced in https://github.com/lezer-parser/generator/commit/b38d018fcf01f6909b47c4dd0639639e704522b6 ; would you mind sharing your thoughts on this?

Regards, Rémi

marijnh commented 1 year ago

You appear to have been relying on a bug in the way precedences were applied. Since you explicitly say ConflictingToken has higher precedence than SymbolToken, the new behavior is what the system is supposed to do.

It is almost always preferable to use @specialize to recognize keywords, rather than including them as separate tokens.
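As a rough illustration of that suggestion, here is how the reproduction grammar might look with @specialize instead of a separate keyword token. This is only a sketch, and it assumes the goal is for the exact input "conflict" to produce ConflictingToken while "conf" and "conflicting" stay SymbolToken. The keyword moves out of the @tokens block into an ordinary rule, and the @precedence declaration is dropped on the assumption that nothing else needs it.

```
@top Root {
  ConflictingToken |
  SymbolToken
}

// Assumed rewrite: the keyword is a specialization of SymbolToken.
// It only applies when the whole SymbolToken text is exactly "conflict".
ConflictingToken {
  @specialize<SymbolToken, "conflict">
}

@tokens {
  // The only real token left; no @precedence is needed because
  // there is no longer a second token overlapping with it.
  SymbolToken {
    $[a-zA-Z_]+
  }
}
```

Because the tokenizer first reads a complete SymbolToken and only then checks its text against the specialized string, a longer identifier such as "conflicting" should never be cut short at the keyword prefix.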

r3c commented 1 year ago

Hey @marijnh, you were quite right: our grammar file was misusing precedence to work around overlapping-symbol issues. Thanks a lot for pointing that out!