Declaring literal syntax kind as keyword breaks precedence parsing - Githubissues

leanprover / lean4

Lean 4 programming language and theorem prover

https://lean-lang.org

Apache License 2.0

4.66k stars 418 forks source link

Declaring literal syntax kind as keyword breaks precedence parsing #2395

Open Kha opened 1 year ago

Kha commented 1 year ago

When indexing the parser tables with the next token (ignoring extended identifier behavior for this issue), we do one of two things:

If the next token is an naked atom s (i.e. a symbol or keyword), we use s as the index
If the next token is a node (i.e. a literal) of kind k, we use k as the index

These two categories are reasonably disjoint in Lean code, but that is unfortunately not true for embedded languages!

import Lean

declare_syntax_cat cat
syntax char : cat
syntax "char" : cat

elab "test" c:cat : command =>
  IO.println c

-- (choice (catChar "char") (cat_ "char"))
test char

Here the keyword char and the literal kind charKind = `char overlap

Kha commented 1 year ago

I would propose to consistently prefix atoms when interpreted as kinds with token. (or atom.?), which in fact we already do in TSyntax: https://github.com/leanprover/lean4/blob/8de1c0786c5b2f77f94b19224f4b08a054f935d4/tests/lean/1275.lean.expected.out#L9 There will be a small cost in indexing overhead which we should measure but I expect to be insignificant.

tydeu commented 1 year ago

@Kha For complete safety, we could instead make SyntaxNodeKind and inductive (which we use to index the tables and parameterize TSyntax). For example:

inductive SyntaxNodeKind
| missing
| ident
| atom (s : String)
| node (kind : Name)