tree-sitter / tree-sitter-julia

Julia grammar for Tree-sitter
MIT License

Question: expand `for` and `struct` grammar with headers? #142

Open simonmandlik opened 3 weeks ago

simonmandlik commented 3 weeks ago

Describe the bug

This is not a bug, but more like a question/feature request.

I'm trying to update and fix the Julia queries for Neovim, and I'm having a very hard time with for loops and structs.

This Python example

for a in range(10):
    pass

is parsed as follows:

(module ; [0, 0] - [2, 0]
  (for_statement ; [0, 0] - [1, 8]
    left: (identifier) ; [0, 4] - [0, 5]
    right: (call ; [0, 9] - [0, 18]
      function: (identifier) ; [0, 9] - [0, 14]
      arguments: (argument_list ; [0, 14] - [0, 18]
        (integer))) ; [0, 15] - [0, 17]
    body: (block ; [1, 4] - [1, 8]
      (pass_statement)))) ; [1, 4] - [1, 8]

and this Julia example

for a in 1:10, b in 1:10
    print(a)
end

is parsed as

(source_file ; [0, 0] - [3, 0]
  (for_statement ; [0, 0] - [2, 3]
    (for_binding ; [0, 4] - [0, 13]
      (identifier) ; [0, 4] - [0, 5]
      (range_expression ; [0, 9] - [0, 13]
        (integer_literal) ; [0, 9] - [0, 10]
        (integer_literal))) ; [0, 11] - [0, 13]
    (for_binding ; [0, 15] - [0, 24]
      (identifier) ; [0, 15] - [0, 16]
      (range_expression ; [0, 20] - [0, 24]
        (integer_literal) ; [0, 20] - [0, 21]
        (integer_literal))) ; [0, 22] - [0, 24]
    (call_expression ; [1, 4] - [1, 12]
      (identifier) ; [1, 4] - [1, 9]
      (argument_list ; [1, 9] - [1, 12]
        (identifier))))) ; [1, 10] - [1, 11]

Because the two for_binding nodes are not grouped together in any way and are siblings of the call_expression, I couldn't write any query that would correctly select the loop "header" (regardless of the number of variables iterated over), nor any query that would select the body without the "header". This might be because I'm no expert in TS queries, but for Python such queries are really simple.
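To make the header/body split concrete, here is a minimal sketch in plain Python (no Tree-sitter bindings involved). It assumes only the named children of the for_statement are available as (type, start, end) tuples, copied from the tree above; span and split_for_statement are hypothetical helper names, not part of any library.

```python
# Named children of the Julia for_statement above, as (node_type, start, end).
children = [
    ("for_binding", (0, 4), (0, 13)),
    ("for_binding", (0, 15), (0, 24)),
    ("call_expression", (1, 4), (1, 12)),
]


def span(nodes):
    """Return (start, end) covering all nodes, or None for an empty list."""
    if not nodes:
        return None
    return (nodes[0][1], nodes[-1][2])


def split_for_statement(children):
    """Hypothetical helper: header = leading for_binding nodes, body = the rest."""
    n = 0
    while n < len(children) and children[n][0] == "for_binding":
        n += 1
    return span(children[:n]), span(children[n:])


header, body = split_for_statement(children)
print(header)  # ((0, 4), (0, 24))
print(body)    # ((1, 4), (1, 12))
```

This works for any number of bindings, but it has to be done in code after parsing, which is exactly what a single query cannot express when the nodes are flat siblings.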

The situation is similar with struct definitions:

struct A{B, C} <: D
    x
    y
end

is parsed as

(source_file ; [0, 0] - [4, 0]
  (struct_definition ; [0, 0] - [3, 3]
    name: (identifier) ; [0, 7] - [0, 8]
    (type_parameter_list ; [0, 8] - [0, 14]
      (identifier) ; [0, 9] - [0, 10]
      (identifier)) ; [0, 12] - [0, 13]
    (type_clause ; [0, 15] - [0, 19]
      (operator) ; [0, 15] - [0, 17]
      (identifier)) ; [0, 18] - [0, 19]
    (identifier) ; [1, 4] - [1, 5]
    (identifier))) ; [2, 4] - [2, 5]

Again, the struct header nodes type_parameter_list and type_clause are siblings of the nodes in the struct body.

Is there a reason not to group struct and loop "headers" together, similarly to how Python is parsed?

simonmandlik commented 3 weeks ago

Ifs in Python also provide a consequence child:

if True:
    pass
elif False:
    pass
else:
    pass

is parsed as

(module ; [0, 0] - [6, 0]
  (if_statement ; [0, 0] - [5, 8]
    condition: (true) ; [0, 3] - [0, 7]
    consequence: (block ; [1, 4] - [1, 8]
      (pass_statement)) ; [1, 4] - [1, 8]
    alternative: (elif_clause ; [2, 0] - [3, 8]
      condition: (false) ; [2, 5] - [2, 10]
      consequence: (block ; [3, 4] - [3, 8]
        (pass_statement))) ; [3, 4] - [3, 8]
    alternative: (else_clause ; [4, 0] - [5, 8]
      body: (block ; [5, 4] - [5, 8]
        (pass_statement))))) ; [5, 4] - [5, 8]

whereas in Julia all "consequence" lines are siblings of the condition:

if true
    1
    1
elseif false
    1
else
    1
end

is parsed as

(source_file ; [0, 0] - [8, 0]
  (if_statement ; [0, 0] - [7, 3]
    condition: (boolean_literal) ; [0, 3] - [0, 7]
    (integer_literal) ; [1, 4] - [1, 5]
    (integer_literal) ; [2, 4] - [2, 5]
    alternative: (elseif_clause ; [3, 0] - [5, 0]
      condition: (boolean_literal) ; [3, 7] - [3, 12]
      (integer_literal)) ; [4, 4] - [4, 5]
    alternative: (else_clause ; [5, 0] - [7, 0]
      (integer_literal)))) ; [6, 4] - [6, 5]

savq commented 3 weeks ago

There are two separate issues here, so I'll address them separately.

Querying inner blocks

The block rule used in the grammar is not visible (see #73). There's no technical limitation here, but making it visible is a breaking change that would require updating almost all tests.

Querying "headers"

If blocks were visible, querying headers would be really simple, since they're always "the thing before the block".
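As an illustration of "the thing before the block", here is a minimal sketch in plain Python using the Python for_statement from the original post, where the block is already a visible node. The children are hand-copied as (field, type, start, end) tuples, and header_before_block is a hypothetical helper name, not a Tree-sitter API.

```python
# Children of the Python for_statement from the issue, as (field, type, start, end).
children = [
    ("left", "identifier", (0, 4), (0, 5)),
    ("right", "call", (0, 9), (0, 18)),
    ("body", "block", (1, 4), (1, 8)),
]


def header_before_block(children):
    """Hypothetical helper: span of everything that precedes the first block node."""
    before = []
    for child in children:
        if child[1] == "block":
            break
        before.append(child)
    if not before:
        return None
    return (before[0][2], before[-1][3])


print(header_before_block(children))  # ((0, 4), (0, 18))
```

With a visible block node, the same rule would presumably apply to the Julia tree without having to know which node types count as header nodes.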

For now, I can only think of a couple of workarounds:

In the case of structs... The way they're currently parsed is awful. I took a much simpler approach for the lezer-julia grammar, and that should probably get ported here.

simonmandlik commented 3 weeks ago

@savq thanks for the reply!

I prepared a PR (https://github.com/nvim-treesitter/nvim-treesitter-textobjects/pull/639). Any comments would be greatly appreciated!

> The block rule used in the grammar is not visible (see https://github.com/tree-sitter/tree-sitter-julia/issues/73). There's no technical limitation here, but making it visible is a breaking change that would require updating almost all tests.

Yes, this would help a lot. For ifs, for example, conditions are easy to select because they are under the condition field, but selecting blocks is more difficult (it would have to rely on the matching algorithm, since elseif, for example, is a sibling of all the nodes in the block).
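A rough sketch of what selecting those blocks amounts to with the current tree, written in plain Python rather than as a query. It assumes the children of each clause are available as (field, type, start, end) tuples, hand-copied from the Julia if_statement tree earlier in this thread; unfielded_span is a hypothetical helper name.

```python
# Children of each clause of the Julia if_statement, as (field, type, start, end).
clauses = {
    "if": [
        ("condition", "boolean_literal", (0, 3), (0, 7)),
        (None, "integer_literal", (1, 4), (1, 5)),
        (None, "integer_literal", (2, 4), (2, 5)),
        ("alternative", "elseif_clause", (3, 0), (5, 0)),
        ("alternative", "else_clause", (5, 0), (7, 0)),
    ],
    "elseif": [
        ("condition", "boolean_literal", (3, 7), (3, 12)),
        (None, "integer_literal", (4, 4), (4, 5)),
    ],
    "else": [
        (None, "integer_literal", (6, 4), (6, 5)),
    ],
}


def unfielded_span(children):
    """Hypothetical helper: span of the children that carry no field name."""
    body = [c for c in children if c[0] is None]
    if not body:
        return None
    return (body[0][2], body[-1][3])


for name, kids in clauses.items():
    print(name, unfielded_span(kids))
# if ((1, 4), (2, 5))
# elseif ((4, 4), (4, 5))
# else ((6, 4), (6, 5))
```

The condition is found through its field name, while the block has to be inferred as "whatever has no field", which is the asymmetry described above.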

> (for_statement ((for_binding) ("," (for_binding))*) @bindings)

I tested this, and it selects only one for_binding at a time, not all of them.
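One possible way around the one-capture-per-binding behaviour is to merge the captured ranges afterwards. The sketch below is plain Python and assumes each capture arrives as (parent_range, start, end), with the parent for_statement's range used as a grouping key; merge_captures and this tuple layout are hypothetical, not something the query engine provides.

```python
def merge_captures(captures):
    """Merge per-binding captures into one (start, end) span per parent loop."""
    merged = {}
    for parent, start, end in captures:
        if parent in merged:
            old_start, old_end = merged[parent]
            # (row, column) tuples compare lexicographically, so min/max work.
            merged[parent] = (min(old_start, start), max(old_end, end))
        else:
            merged[parent] = (start, end)
    return merged


# The two for_binding ranges from the Julia example, both inside the
# for_statement spanning [0, 0] - [2, 3].
captures = [
    (((0, 0), (2, 3)), (0, 4), (0, 13)),
    (((0, 0), (2, 3)), (0, 15), (0, 24)),
]
print(merge_captures(captures))
# {((0, 0), (2, 3)): ((0, 4), (0, 24))}
```

Grouping by the parent keeps bindings from different loops apart, so nested or consecutive loops each get their own header span.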