leanprover / lean4

Lean 4 programming language and theorem prover
https://lean-lang.org
Apache License 2.0
4.66k stars 418 forks source link

Declaring literal syntax kind as keyword breaks precedence parsing #2395

Open Kha opened 1 year ago

Kha commented 1 year ago

When indexing the parser tables with the next token (ignoring extended identifier behavior for this issue), we do one of two things:

These two categories are reasonably disjoint in Lean code, but that is unfortunately not true for embedded languages!

import Lean

declare_syntax_cat cat
syntax char : cat
syntax "char" : cat

elab "test" c:cat : command =>
  IO.println c

-- (choice (catChar "char") (cat_ "char"))
test char

Here the keyword char and the literal kind charKind = `char overlap

Kha commented 1 year ago

I would propose to consistently prefix atoms when interpreted as kinds with token. (or atom.?), which in fact we already do in TSyntax: https://github.com/leanprover/lean4/blob/8de1c0786c5b2f77f94b19224f4b08a054f935d4/tests/lean/1275.lean.expected.out#L9 There will be a small cost in indexing overhead which we should measure but I expect to be insignificant.

tydeu commented 1 year ago

@Kha For complete safety, we could instead make SyntaxNodeKind and inductive (which we use to index the tables and parameterize TSyntax). For example:

inductive SyntaxNodeKind
| missing
| ident
| atom (s : String)
| node (kind : Name)