tlaplus-community / tree-sitter-tlaplus

A tree-sitter grammar for TLA⁺ and PlusCal
MIT License
57 stars 10 forks source link

Add missing queries, improve reference highlighting #59

Closed ahelwer closed 2 years ago

ahelwer commented 2 years ago

Adds some missing queries and generalizes some others.

Changes to queries:

  1. Added many missing scopes
  2. Added operator_definition reference definition for pre/in/post-fix operators (ex. a \prec b == ...)
  3. Added parameter reference definition for operators as parameters to operators (ex. f(x, _ \prec _, g(_)) == ...)
  4. Added constant_declaration reference definition for operators (ex. CONSTANTS f(_, _), _\prec_)
  5. Added reference highlighting for bound and unbound pre/in/post/non-fix operators
  6. Added module_definition reference definitions, changed to @include/@definition.import
  7. Added parameter highlighting for unbounded_quantification, choose, assume_prove, pick_proof_step, and take_proof_step
  8. Added definition highlighting for instance, assumption, and theorem

Changes to grammar:

  1. Added symbol field to bound_nonfix_op
  2. Changed identifier in record_value and except rules to be identifier_ref
  3. Merged single_quantifier_bound and quantifier_bound rules
  4. Added missing infix operators to infix_op_symbol rule
  5. Added various fields to proof constructs

@susliko take a look; I found some of these missing from looking at specs like this.

susliko commented 2 years ago

I'll have a look tomorrow 👀

ahelwer commented 2 years ago

Was playing around with scopes as implemented by tree-sitter's highlighting and came up with some questions: https://github.com/tree-sitter/tree-sitter/discussions/1801

Neovim allows you to kick things up to the parent or global scope (see here). I do think things like "find reference in same file or module" is a bit more advanced than "find reference to local or global variables" and requires something more like a stack graph than the simplistic scope/reference model.

ahelwer commented 2 years ago

@will62794 this is also a good opportunity to request addition of any fields you'd like added to the grammar.

I'm also considering changing the record_value definition from:

record_value: $ => prec.left('17-17', seq($._expr, '.', $.identifier)),

to:

record_value: $ => prec.left('17-17', seq($._expr, '.', alias($.identifier, $.identifier_ref))),

for consistency; similarly with the except rule. I think this would be a breaking change for you.

will62794 commented 2 years ago

@ahelwer thanks for letting me know! I did have one question that came up recently that may be related to the above.

For record EXCEPT expressions, it seems that when appearing in their fully general form (i.e. shown on page 322 of Specifying Systems), they are parsed into a structure that doesn't seem to demarcate separate field update specifiers.

For example, the module

------ MODULE test ------
x == [r EXCEPT !.a.b.c = h, !["h"].x.y = 12]
=====

parses into the following (copied from the playground output)

Screen Shot 2022-07-21 at 12 42 44 AM

The children of the except node don't appear to be separated in any way, making it difficult to determine how to interpret this type of expression. Is there perhaps other information I am missing here that can be used to determine the boundaries between the update specifiers and the update values?

ahelwer commented 2 years ago

@will62794 this is a good opportunity for a first PR if you want to make it, the except rule could certainly use some work both in terms of breaking it apart into named sub-rules and adding fields. You can make the changes here.

will62794 commented 2 years ago

@ahelwer Happy to make these changes. I figured it would make sense to post an outline of an approach here first, and if it looks good to you then I can add tests for it and put up a PR.

I think that adding the following additional structure to parsed EXCEPT expressions would generally be sufficient for further evaluation/interpreting/processing, etc.

// .foo["bar"].baz
_except_val: $ => repeat1(
    choice(
    // .foo
    field("record_field", seq('.', alias($.identifier, $.identifier_ref))),
    // ["bar"]
    field("fn_appl", seq('[', commaList1($._expr), ']'))
    )
),

// !.foo["bar"].baz
except_update_specifier: $ => seq('!', $._except_val),

// !.foo["bar"].baz = 4
except_update: $ => seq(
    $.except_update_specifier, 
    '=', 
    $._expr
),

// [f EXCEPT !.foo["bar"].baz = 4, !.bar = 3]
except: $ => seq(
    '[', $._expr, 'EXCEPT',
    commaList1($.except_update),
    ']'
),

e.g. an expression like [a EXCEPT !.a.b["c"] = "b", ![2].x = 5, !.y.z = "c"] would be parsed into:

(except
  (identifier_ref)
  (except_update
    (except_update_specifier
      record_field: (identifier_ref)
      record_field: (identifier_ref)
      fn_appl: (string))
    (string))
  (except_update
    (except_update_specifier
      fn_appl: (nat_number)
      record_field: (identifier_ref))
    (nat_number))
  (except_update
    (except_update_specifier
      record_field: (identifier_ref)
      record_field: (identifier_ref))
    (string))))

Let me know if you have any thoughts on potentially adding more information, changing field names, modifying the parse tree structure, etc.

ahelwer commented 2 years ago

That looks good, I would only make the following changes: