tree-sitter / tree-sitter-haskell

Haskell grammar for tree-sitter.
MIT License
157 stars 37 forks source link

Maybe include indentation in some node #130

Open guibou opened 1 month ago

guibou commented 1 month ago

This is related to https://github.com/lukas-reineke/indent-blankline.nvim/issues/926

In summary, in indent-blankline, there is a logic which allows to highlight indentation based on "visibility" scope. Actually, you can list a serie of tree sitter nodes which are considered as a "scope" and the indentation logic while highlight the indentation guide for the "scope".

For example, here is how it behaves for do:

image

(See how the indentation line for do is bolder and red. Sorry, this maybe a bit confusing, my indentation guides are indeed very rainbowish).

I tried to apply the same logic for other nodes. The idea is to get a quick view of how long a block extends. I'm thinking about do blocks, where binding, any function, case, ... and maybe actually any multiline expression.

So I've tried to add at least the local_binds node in the list of node to be highlighted (that's for the where and let) and it does not work as expected.

The semantic is unclear for me, but it appears that the indentation level in the different blocs is kinda surprising and inconsistent.

For example, in the where, the local_binds node does not include the where and starts on the first token of the indented lines, without including the indentation, but includes the indentation for the following lines:

image

And hence the scope logic breaks.

I'm unsure about if this should be fixed in this parser or in indent-blankline.nvim, but I'm wondering if most node must not start a bit earlier in the stream of characters. The local_binds should include the where or let. Maybe a new node should be created, so we can have something to match only on all the binding, and something to match on the where or let clause.

Right now, it is kinda impossible to do something including the where keywoard, but not including the complete function above.

If we extend selection starting on one of the binding, b = 2 here:

image

We then get all the bindings, without the where

image

And then we get the complete function:

image

guibou commented 1 month ago

Note that actually, I don't think that we need to create a new node which contains where and the local_binds, but instead, local_binds must contain the let or where. Because if we want to get the child nodes, including or not the indentation, we can still go down into the local_binds and get the boundaries of all the internal nodes.

On the other hand, if we want to have the where + bindings or let + bindings, that's completely impossible for now.

tek commented 1 month ago

I think the trade-off that motivated the current structure is that the tree should be as flat as possible while allowing all interesting constructs to be queried. Of course this is hard to achieve with finite resources, limited feedback and wildly varying use cases 🙂

Given that changing the structure has strong potential to break downstream projects, it seems sensible to do that sparingly.

In this case, using queries rather than the plain hierarchy appears to offer a straight-forward solution:

local source = [[
a = b where
        d = e
        f = g
]]
local query_string = '(("where" @where) ((local_binds) @binds))'
local parser = vim.treesitter.get_string_parser(source, 'haskell')
local tree = parser:parse()[1]
local query = vim.treesitter.query.parse('haskell', query_string)
for _, match, _ in query:iter_matches(tree:root(), source, 0, -1, { all = true }) do
  for id, nodes in pairs(match) do
    local line, indent, _ = nodes[1]:start()
    print('Indent of `' .. query.captures[id] .. '` in line ' .. line .. ': ' .. indent)
  end
end

This prints:

Indent of `where` in line 0: 6
Indent of `binds` in line 1: 10

So if the plugin were to allow the user to specify something like (("where" @scope) ((local_binds) @block)) or (("where" @scope) ((local_binds (_) @element))), the tree would be sufficient.

I don't know if maybe the motivation to only use the hierarchy was motivated by performance considerations, but given that the same mechanism is used for syntax highlighting, it seems appropriate.

Do you think this would be a reasonable solution?


P.S. I'm curious, what would you do with this situation:

a = b where
  x = 1

No scope line or use the beginning of a?

guibou commented 1 month ago

@tek thank you for your detailled answer, and of course, I now understand that adding nodes could lead to performance and breakages, so this is not something to do without strong reflection.

I don't really know about the rational in the implementation of ibl scope logic. Let me digest what you wrote.

guibou commented 1 month ago

For:

a = b where
  x = 1

The scope will start on the where item and highlight everything indented then, so it will highlight x = 1 as expected. Actually, that's what is happening with do, because do nodes includes the do keywoard, which I what I considered NOT consistant with local_binds (e.g. let and where) nodes which does not includes the keyword. But in the context of do that's because it may include the qualified module (from QualifiedDo), hence it make sense that the do keyword is included.

tek commented 1 month ago

The primary, necessitating, reason that both let and do have a wrapping node is that they are part of sum types in the grammar – either expression for let_in and do, and statement for let.

where on the other hand is always the last node in a binding, so it's rather easy to query based on the context.

An additional (where) node probably wouldn't be catastrophically inconvenient, but since where can occur in many places, like module, gadt, class, I probably felt that it would produce more ambiguity than benefit. There are also a few implementation details that make these constructs painful, mostly concerning layout, which may have influenced or distorted my reasoning.