Closed (susliko closed this 2 years ago)
Interesting (but common!) case. Seems like a tricky lexical precedence issue. I'll have to think about how to solve it.
In the formal grammar this is solved by saying that identifiers cannot start with WF_ or SF_, but that is a bit more difficult to implement in tree-sitter grammars.
Took a first crack at this by changing the identifier rule to implement a DFA that excludes anything starting with SF_ or WF_, with:
identifier: $ => regexOr(
  '[SW]',
  '[SW][^F]\w*',
  '[SW]F',
  '[SW]F[^_]\w*',
  '[A-RT-VX-Za-z]\w*',
  '[^A-Za-z]+[A-Za-z]\w*'
),
Unfortunately this bloats the size of the generated parser.c file to 54 MB.
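For anyone who wants to check what that alternation accepts, here is a minimal sketch in plain JavaScript, run outside tree-sitter. The regexOr function below is a hypothetical stand-in, since the real helper is not shown in this thread. The character classes are also tightened to word characters only ([^F\W] instead of [^F], and [0-9_]+ for the leading run), which is an assumption about the intent rather than the grammar's actual text. The negative-lookahead regex is there purely as a reference to compare against; as far as I know tree-sitter regexes do not support lookahead, which is why the exclusion has to be spelled out branch by branch.

```javascript
// Hypothetical stand-in for the regexOr helper used in the grammar;
// the real helper is not shown in this thread.
function regexOr(...alternatives) {
  // Anchor the whole alternation so a string must match in full.
  return new RegExp('^(?:' + alternatives.join('|') + ')$');
}

// Same six branches as above, but restricted to word characters
// (an assumption about the intent, not the grammar's actual text).
const identifier = regexOr(
  '[SW]',                 // just "S" or "W"
  '[SW][^F\\W]\\w*',      // S/W followed by a word char other than F
  '[SW]F',                // exactly "SF" or "WF"
  '[SW]F[^_\\W]\\w*',     // SF/WF followed by a word char other than _
  '[A-RT-VX-Za-z]\\w*',   // starts with a letter other than S or W
  '[0-9_]+[A-Za-z]\\w*'   // leading digits/underscores, then a letter
);

// Reference definition using a negative lookahead, for comparison only.
const reference = /^(?![SW]F_)\w*[A-Za-z]\w*$/;

const samples = ['x', 'vars', 'S', 'W', 'WF', 'SFx', 'WFF_x', 'SWF_x',
                 '_WF_x', '1a', 'WF_vars', 'SF_vars', 'WF_', '123', ''];

for (const s of samples) {
  const a = identifier.test(s);
  const b = reference.test(s);
  console.log(JSON.stringify(s), a, a === b ? 'agrees' : 'DIFFERS');
}
```

Only names that begin with SF_ or WF_ should be rejected; something like SWF_x or _WF_x still counts as an identifier because the prefix is not at the start.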
For some reason I can't seem to solve this using lexical precedence inside the fairness rule. I can just move the identifier rule into the external scanner without any bloating of the grammar, but that seems annoying. Here's the test that should pass:
=====================|||
Weak Fairness Ambiguity
=====================|||
---- MODULE Test ----
op == WF_vars(x)
====
---------------------|||
(source_file (module (header_line) (identifier) (header_line)
(operator_definition (identifier) (def_eq)
(fairness (identifier_ref) (identifier_ref))
)
(double_line)))
I can fix this issue using explicit lexical precedence as long as I get rid of keyword extraction; however, this bloats the parser size from 33 MB to 38 MB. Will see whether the linked bug is actually a bug.
Consider an example: WF_vars is parsed as $.bound_op instead of $.fairness: