Reduce the issue to a minimal, self-contained, reproducible test case. Avoid dependencies to mathlib4 or std4.
Description
For a syntax category with behavior := symbol, an identifier using the name of token parser parses as that token rather than as an identifier.
Context
I discovered that str identifiers weirdly broke in Alloy a long while back, but could not figure out why back then with the time I dedicated to the issue. This came up again and now I figured out the cause.
Steps to Reproduce
import Lean.Parser.Extra
declare_syntax_cat bug (behavior := symbol)
syntax ident : bug
#check `(bug|ident) -- works
section
#check `(bug|num) -- works
local syntax num : bug
#check `(bug|num) -- expected numeral
end
section
#check `(bug|scientific) -- works
local syntax scientific : bug
#check `(bug|scientific) -- expected scientific number
end
section
#check `(bug|str) -- works
local syntax str : bug
#check `(bug|str) -- expected string literal
end
section
#check `(bug|char) -- works
local syntax char : bug
#check `(bug|char) -- expected character literal
end
section
#check `(bug|name) -- works
local syntax name : bug
#check `(bug|name) -- expected Name literal
end
section
#check `(bug|interpolatedStr) -- works
local syntax interpolatedStr(bug) : bug
#check `(bug|interpolatedStr) -- expected interpolated string
end
open Lean.Parser
attribute [run_parser_attribute_hooks] fieldIdx
section
#check `(bug|fieldIdx) -- works
local syntax fieldIdx : bug
#check `(bug|fieldIdx) -- expected field index
end
Expected behavior:
For these identifiers to parse as identifiers
Actual behavior:
They parse as the token they name.
Versions
Lean (version 4.1.0, commit a832f398b80a, Release)
Windows 10 22H2
Additional Information
This behavior is due to the following:
With behavior := symbol, the pratt parser use an identifier's val to index into the token map and narrow the parsers to test
Parsers are added to the token map based on their parser's firstTokens.
The token parsers are using their parser name as their first token.
The most straightforward solution is to make the parsers' first tokens not clash with valid identifiers. However, since users can specify almost any string as an identifier via guillemets, this is non-trivial. But, guillemets do not nest and do not carry over into the val, so I believe a solution would be to enclose the first token names themselves in guillemets (e.g. info := mkAtomicInfo "«str»") .
@Kha Oh, interesting I had not made the connection between these two issues. I guess a prefix for symbol token names would solve both this problem and that one.
Prerequisites
Description
For a syntax category with
behavior := symbol
, an identifier using the name of token parser parses as that token rather than as an identifier.Context
I discovered that
str
identifiers weirdly broke in Alloy a long while back, but could not figure out why back then with the time I dedicated to the issue. This came up again and now I figured out the cause.Steps to Reproduce
Expected behavior:
For these identifiers to parse as identifiers
Actual behavior:
They parse as the token they name.
Versions
Lean (version 4.1.0, commit a832f398b80a, Release)
Windows 10 22H2Additional Information
This behavior is due to the following:
behavior := symbol
, the pratt parser use an identifier'sval
to index into the token map and narrow the parsers to testfirstTokens
.The most straightforward solution is to make the parsers' first tokens not clash with valid identifiers. However, since users can specify almost any string as an identifier via guillemets, this is non-trivial. But, guillemets do not nest and do not carry over into the
val
, so I believe a solution would be to enclose the first token names themselves in guillemets (e.g.info := mkAtomicInfo "«str»"
) .Impact
Add :+1: to issues you consider important. If others are impacted by this issue, please ask them to add :+1: to it.