tlaplus-community / tree-sitter-tlaplus

A tree-sitter grammar for TLA⁺ and PlusCal
MIT License
57 stars 10 forks source link

Fairness token is not parsed #48

Closed susliko closed 2 years ago

susliko commented 2 years ago

Consider an example:

------ MODULE Test ------

VARIABLES num
vars == << num >>
Spec == \A self \in {}: WF_vars(TRUE)

==========================

WF_vars is parsed as $.bound_op instead of $.fairness:

module [0, 0] - [6, 26]
  header_line [0, 0] - [0, 6]
  name: identifier [0, 14] - [0, 18]
  header_line [0, 19] - [0, 25]
  variable_declaration [2, 0] - [2, 13]
    identifier [2, 10] - [2, 13]
  operator_definition [3, 0] - [3, 17]
    name: identifier [3, 0] - [3, 4]
    def_eq [3, 5] - [3, 7]
    definition: tuple_literal [3, 8] - [3, 17]
      langle_bracket [3, 8] - [3, 10]
      identifier_ref [3, 11] - [3, 14]
      rangle_bracket [3, 15] - [3, 17]
  operator_definition [4, 0] - [4, 37]
    name: identifier [4, 0] - [4, 4]
    def_eq [4, 5] - [4, 7]
    definition: bounded_quantification [4, 8] - [4, 37]
      quantifier: forall [4, 8] - [4, 10]
      bound: quantifier_bound [4, 11] - [4, 22]
        identifier [4, 11] - [4, 15]
        set_in [4, 16] - [4, 19]
        set: finite_set_literal [4, 20] - [4, 22]
      expression: bound_op [4, 24] - [4, 37]
        name: identifier_ref [4, 24] - [4, 31]
        boolean [4, 32] - [4, 36]
  double_line [6, 0] - [6, 26]
ahelwer commented 2 years ago

Interesting (but common!) case. Seems like a tricky lexical precedence issue. I'll have to think about how to solve it.

In the formal grammar this is solved by saying that identifiers cannot start with WF_ or SF_ but that is a bit more difficult to implement in tree-sitter grammars.

ahelwer commented 2 years ago

Took a first crack at this by changing the identifier rule to implement this DFA that excludes starting with SF_ or WF_:

IMG_0080

with

identifier: $ => regexOr(
  '[SW]',
  '[SW][^F]\w*',
  '[SW]F',
  '[SW]F[^_]\w*',
  '[A-RT-VX-Za-z]\w*',
  '[^A-Za-z]+[A-Za-z]\w*'
),

Unfortunately this bloats the size of the generated parser.c file to 54 MB.

ahelwer commented 2 years ago

For some reason I can't seem to solve this using lexical precedence inside the fairness rule. I can just move the identifier rule into the external scanner without any bloating of the grammar but that seems annoying. Here's the test that should pass:

=====================|||
Weak Fairness Ambiguity
=====================|||

---- MODULE Test ----
op == WF_vars(x)
====

---------------------|||

(source_file (module (header_line) (identifier) (header_line)
  (operator_definition (identifier) (def_eq)
    (fairness (identifier_ref) (identifier_ref))
  )
(double_line)))
ahelwer commented 2 years ago

I can fix this issue using explicit lexical precedence as long as I get rid of keyword extraction; however, this bloats the parser size from 33 MB to 38 MB. Will see whether linked bug is actually a bug.