tlaplus-community / tree-sitter-tlaplus

A tree-sitter grammar for TLA⁺ and PlusCal
MIT License
57 stars 10 forks source link

Junctlists should terminate on definition of new module-level unit #3

Closed ahelwer closed 9 months ago

ahelwer commented 3 years ago

Implement logic in function https://github.com/tlaplus-community/tree-sitter-tlaplus/blob/a74a18b4e74ff5144ec288dbb0aab3e232df633d/src/scanner.cc#L111

This will handle the strange case where someone begins defining a new module-level unit definition as follows, instead of at the start of the next line:

op ==
    /\ 1
    /\ 2
         op2 == 3

A half-assed first attempt at this lives in the jlists-unit-termination branch. Shelved until I learn more about how to write a parser.

This would probably be easy to accomplish for constructs that are unambiguously new module units (ASSUME, LOCAL, etc.) but more difficult for things like a \oplus b == expr.

ahelwer commented 3 years ago

This functionality is now implemented in the easy cases, where a keyword like ASSUME or VARIABLE or ---- or LOCAL or whatever is encountered - it is unambiguously the beginning of a unit definition so the jlist is ended. See the implementation here and tests here (and duplicates in junctlist.txt). Note the interesting case that has to be handled to disambiguate between the -> and ---- tokens here (caught with a test failure). The remaining cases, things like op(x, y) == e and a \oplus b == e, will take more work and are put off for now.

ahelwer commented 3 years ago

Need to be careful since some unit keywords can also occur inside LET/IN constructs. For example, RECURSIVE and INSTANCE.

ahelwer commented 3 years ago

There are possibly some additional keywords that require jlists to be terminated; see https://github.com/tlaplus-community/tree-sitter-tlaplus/issues/24#issuecomment-891305885

ahelwer commented 3 years ago

The state of this issue is that it is basically tractable, but extremely annoying due to all the C++ code that would have to be written in the external scanner to handle it:

This could possibly be accomplished using a notify feature and making DEDENT tokens optional. It is on hold for now, as like https://github.com/tlaplus-community/tree-sitter-tlaplus/issues/9 it basically is only useful to handle the case of users indenting their spec in a really weird way.

ahelwer commented 9 months ago

Basically too much trouble and it's fine because 99% of people start new module-level definitions at the start of the line.