tlaplus-community / tree-sitter-tlaplus

A tree-sitter grammar for TLA⁺ and PlusCal
MIT License
57 stars 10 forks source link

Possible feature: parse Cartesian product as list #11

Closed ahelwer closed 3 years ago

ahelwer commented 3 years ago

On p284 of Specifying Systems Leslie writes:

In the Cartesian product construct, × (typed as \X or \times) acts somewhat like an associative infix operator with precedence range 10–13. Thus, A×B ⊆ C is interpreted as (A × B) ⊆ C , rather than as A × (B ⊆ C). However, × is part of a special construct, not an infix operator. For example, the three sets A × B × C , (A × B) × C , and A × (B × C) are all different:

  • A × B × C = {ha, b, c i : a ∈ A, b ∈ B, c ∈ C}
  • (A × B) × C = {hha, b i, c i : a ∈ A, b ∈ B, c ∈ C}
  • A × (B × C) = {ha,hb, c ii : a ∈ A, b ∈ B, c ∈ C}

The first is a set of triples; the last two are sets of pairs.

Indeed, this is how SANY parses it. Given:

---- MODULE Test ----
op = A \X (B \X C) \X D
====

SANY outputs:

N_Module          #heirs: 4       kind:   382
| N_BeginModule   #heirs: 3       kind:   333
| | ---- MODULE   #heirs: 0       kind:   3
| | Test          #heirs: 0       kind:   231
| | ----          #heirs: 0       kind:   35
| N_Extends       #heirs: 0       kind:   350
| N_Body          #heirs: 1       kind:   334
| | N_OperatorDefinition        *op       #heirs: 3       kind:   389
| | | N_IdentLHS          #heirs: 1       kind:   366
| | | | op        #heirs: 0       kind:   231
| | | ==          #heirs: 0       kind:   92
| | | N_Times     #heirs: 5       kind:   349
| | | | N_GeneralId       #heirs: 2       kind:   358
| | | | | N_IdPrefix      #heirs: 0       kind:   367
| | | | | A       #heirs: 0       kind:   231
| | | | N_GenInfixOp      #heirs: 2       kind:   359
| | | | | N_IdPrefix      #heirs: 0       kind:   367
| | | | | \X      #heirs: 0       kind:   373
| | | | N_ParenExpr       #heirs: 3       kind:   393
| | | | | (       #heirs: 0       kind:   93
| | | | | N_Times         #heirs: 3       kind:   349
| | | | | | N_GeneralId   #heirs: 2       kind:   358
| | | | | | | N_IdPrefix          #heirs: 0       kind:   367
| | | | | | | B   #heirs: 0       kind:   231
| | | | | | N_GenInfixOp          #heirs: 2       kind:   359
| | | | | | | N_IdPrefix          #heirs: 0       kind:   367
| | | | | | | \X          #heirs: 0       kind:   373
| | | | | | N_GeneralId   #heirs: 2       kind:   358
| | | | | | | N_IdPrefix          #heirs: 0       kind:   367
| | | | | | | C   #heirs: 0       kind:   231
| | | | | )       #heirs: 0       kind:   94
| | | | N_GenInfixOp      #heirs: 2       kind:   359
| | | | | N_IdPrefix      #heirs: 0       kind:   367
| | | | | \X      #heirs: 0       kind:   373
| | | | N_GeneralId       #heirs: 2       kind:   358
| | | | | N_IdPrefix      #heirs: 0       kind:   367
| | | | | D       #heirs: 0       kind:   231
| N_EndModule     #heirs: 1       kind:   345
| | ====          #heirs: 0       kind:   36

This can be compared with:

---- MODULE Test ----
op = A + (B + C) + D
====

which SANY parses as ((A + (B + C)) + D). The tree-sitter parser currently parses Cartesian products the same as any other left-associative infix operator, not how SANY parses it: as a list. The question thus arises of whether the tree-sitter parser should be modified to parse Cartesian products as lists.

It is uncertain what value is provided by parsing Cartesian products as list. Perhaps it is easier for a TLA+ interpreter to handle Cartesian products when the parse tree is presented in this way. However, the tree-sitter grammar is not intended to function as the backbone of a TLA+ interpreter. Moreover, even if the grammar were to be used by an interpreter, it's unclear what value would be brought by parsing Cartesian products as a list. One can easily imagine an interpreter traversing the parse tree and constructing the resulting tuple regardless of whether additional products are represented as children or siblings. Parentheses are exposed as a child node type, so the interpreter could handle the case of A \X (B \X C) \X D just fine.

Outside of interpretation, there is also not clear value in other areas such as highlighting or code folding. Thus the problem is currently not planned to be fixed, although this issue will remain open for documentation & discussion in case someone has an idea for how parsing Cartesian products as lists could be useful.

ahelwer commented 3 years ago

Some prior experimentation: https://stackoverflow.com/q/66783636/2852699