leanprover / lean4

Lean 4 programming language and theorem prover
https://lean-lang.org
Apache License 2.0
4.74k stars 427 forks source link

ident treated as token (e.g. `str` as a string) with `behavior := symbol` #2608

Open tydeu opened 1 year ago

tydeu commented 1 year ago

Prerequisites

Description

For a syntax category with behavior := symbol, an identifier using the name of token parser parses as that token rather than as an identifier.

Context

I discovered that str identifiers weirdly broke in Alloy a long while back, but could not figure out why back then with the time I dedicated to the issue. This came up again and now I figured out the cause.

Steps to Reproduce

import Lean.Parser.Extra

declare_syntax_cat bug (behavior := symbol)
syntax ident : bug

#check `(bug|ident) -- works

section
#check `(bug|num) -- works
local syntax num : bug
#check `(bug|num) -- expected numeral
end

section
#check `(bug|scientific) -- works
local syntax scientific : bug
#check `(bug|scientific) -- expected scientific number
end

section
#check `(bug|str) -- works
local syntax str : bug
#check `(bug|str) -- expected string literal
end

section
#check `(bug|char) -- works
local syntax char : bug
#check `(bug|char) -- expected character literal
end

section
#check `(bug|name) -- works
local syntax name : bug
#check `(bug|name) -- expected Name literal
end

section
#check `(bug|interpolatedStr) -- works
local syntax interpolatedStr(bug) : bug
#check `(bug|interpolatedStr) -- expected interpolated string
end

open Lean.Parser
attribute [run_parser_attribute_hooks] fieldIdx

section
#check `(bug|fieldIdx) -- works
local syntax fieldIdx : bug
#check `(bug|fieldIdx) -- expected field index
end

Expected behavior:

For these identifiers to parse as identifiers

Actual behavior:

They parse as the token they name.

Versions

Lean (version 4.1.0, commit a832f398b80a, Release) Windows 10 22H2

Additional Information

This behavior is due to the following:

The most straightforward solution is to make the parsers' first tokens not clash with valid identifiers. However, since users can specify almost any string as an identifier via guillemets, this is non-trivial. But, guillemets do not nest and do not carry over into the val, so I believe a solution would be to enclose the first token names themselves in guillemets (e.g. info := mkAtomicInfo "«str»") .

Impact

Add :+1: to issues you consider important. If others are impacted by this issue, please ask them to add :+1: to it.

Kha commented 1 year ago

A variant of #2395, in some sense

tydeu commented 1 year ago

@Kha Oh, interesting I had not made the connection between these two issues. I guess a prefix for symbol token names would solve both this problem and that one.