tree-sitter / tree-sitter-haskell

Haskell grammar for tree-sitter.
MIT License

Comments following function included in function pattern #82

Closed rynoV closed 3 months ago

rynoV commented 2 years ago

For functions with a do block, the comments following the function get included in the function, for example:

f = do a

-- | haddock
g = b

here the function pattern will include all of f and the doc comment of g. This isn't the case when there is no do block:

f = a

-- | haddock
g = b

in this case it works as I expected, only matching f = a.

I tested this using the latest commit on the master branch, with the following tree-sitter query:

(function rhs: (_) @function.inside) @function.around

(both captures end up including the doc comment)

rynoV commented 2 years ago

Same thing happens for class and instance patterns, for example:

instance Class Data where
  f = a

-- | haddock
g = a

class Class where
  f :: Data

-- | haddock
g = a

(class) @class.around
(instance (where)? . _ @class.inside) @class.around

tek commented 2 years ago

I'm not sure that it's feasible to implement this, since comments are allowed to break indentation:

f = do
  g

-- foo
  pure 1

so in order to decide whether the comment should terminate the do layout, we'd need to parse the indent of the following line, which would require us to either

and this won't work since we can't store two positions at once :frowning_face:

(in case that is unclear: comments and indent are parsed manually in the C extension)

The only way I can imagine right now would be to compromise and use -- | as an indicator that the layout has ended, but since that is Haddock rather than Haskell syntax, it could break valid code. Though it's probably unlikely to occur in an invalid position.

I'll think a bit more about this but I'm fairly pessimistic.
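To make the lookahead problem concrete, here is a minimal Haskell sketch of the decision described above. It is not the grammar's C scanner, and the names `indentOf`, `isBlankOrComment` and `layoutEnds` are hypothetical, chosen only for illustration. It assumes space-only indentation and treats any line starting with `--` as a comment, which is a simplification.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- | Number of leading spaces on a line (tabs are not handled in this sketch).
indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

-- | True for blank lines and for lines that hold only a `--` line comment.
-- Simplification: operators such as `-->` would be misclassified as comments.
isBlankOrComment :: String -> Bool
isBlankOrComment line =
  let rest = dropWhile isSpace line
  in null rest || "--" `isPrefixOf` rest

-- | Decide whether a layout block opened at column `layoutIndent` has ended,
-- given the lines that follow its last statement. Comment lines decide
-- nothing on their own; the answer depends on the first code line after them.
layoutEnds :: Int -> [String] -> Bool
layoutEnds layoutIndent following =
  case dropWhile isBlankOrComment following of
    []         -> True                           -- end of input closes the layout
    (next : _) -> indentOf next < layoutIndent   -- a dedented code line ends the block
```

With the example above, `layoutEnds 2 ["", "-- foo", "  pure 1"]` is `False`, so the comment belongs inside the do block. With the original report, where the layout column is that of `a` in `f = do a`, `layoutEnds 7 ["", "-- | haddock", "g = b"]` is `True`. The sketch can afford to skip past the comment because it has the whole list of lines in hand; the difficulty described above is that the scanner would need to look past the comment and still remember the position before it.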

tek commented 2 years ago

@414owen do you have an idea maybe?

414owen commented 2 years ago

I guess I'm unsure why it works without the do block. I would have thought the lexer would only detect the end of f when it sees function g, which would be after the comment.

tek commented 2 years ago

indeed, that's curious

tek commented 2 years ago

ok so in the case without do the function is entirely contained in the range of f = a, so tree-sitter is conservative and uses the smallest tree that works, leaving the comment on its own since there's no reason to associate it with any neighboring node more than the others.

for the do case, the layout end is part of the function rhs, so the comment cannot escape that tree.