tree-sitter-grammars / tree-sitter-hcl

HCL grammar for tree-sitter
https://tree-sitter-grammars.github.io/tree-sitter-hcl/
Apache License 2.0
92 stars 20 forks source link

A comment at the beginning of a resource ends-up outside of the `body` block instead of inside #30

Closed jeromepin closed 1 year ago

jeromepin commented 1 year ago

Describe the bug I believe there is an issue with the position of a comment node in the tree when the comment is right at the beginning of a block.

To Reproduce Steps to reproduce the behavior:

resource "foo" "bar" {
  # comment
  baz = "qux"
  aaa = "www"
}
will produce the following tree ``` config_file └── body └── block └── identifier (resource) └── string_lit └── quoted_template_start (") └── template_literal (foo) └── quoted_template_end (") └── string_lit └── quoted_template_start (") └── template_literal (bar) └── quoted_template_end (") └── block_start ({) └── comment (# comment) └── body └── attribute └── identifier (baz) └── expression └── literal_value └── string_lit └── quoted_template_start (") └── template_literal (qux) └── quoted_template_end (") └── attribute └── identifier (aaa) └── expression └── literal_value └── string_lit └── quoted_template_start (") └── template_literal (www) └── quoted_template_end (") └── block_end (}) ```

Notice the comment node being after the block_start node as it should, but before the body node, and it seems weird to me.

If we move the comment line later in the resource like this :

resource "foo" "bar" {
  baz = "qux"
  # comment
  aaa = "www"
}
the `comment` node ends-up at the same level as the siblings `attribute` inside the `body` node. ``` config_file └── body └── block └── identifier (resource) └── string_lit └── quoted_template_start (") └── template_literal (foo) └── quoted_template_end (") └── string_lit └── quoted_template_start (") └── template_literal (bar) └── quoted_template_end (") └── block_start ({) └── body └── attribute └── identifier (baz) └── expression └── literal_value └── string_lit └── quoted_template_start (") └── template_literal (qux) └── quoted_template_end (") └── comment (# comment) └── attribute └── identifier (aaa) └── expression └── literal_value └── string_lit └── quoted_template_start (") └── template_literal (www) └── quoted_template_end (") └── block_end (}) ```

Unless am I mistaken ?

The two examples can be reproduced in the playground too.

Expected behavior The comment node should be inside the body node, as are the attribute, and not right before.

Screenshots n/a

Additional context n/a

Thank you very much,

MichaHoffmann commented 1 year ago

Hey @jeromepin

thanks for the report, its valid - i can reproduces it given your examples too.

I think the problem is that comments may appear anywhere in the language and the parser ignores them in rules. It might be tricky to try to control where comments end up being. I think the issue here is that we define blocks as

      block: $ => seq(
        $.identifier,
        repeat(choice($.string_lit, $.identifier)),
        $.block_start,
        optional($.body),
        $.block_end,
      ),

so body might be missing.

Is this causing issues for you? It might be problematic when writing some pattern that matches comments inside bodies. nvim-treesitter matches them jsut as (comment) @comment.

MichaHoffmann commented 1 year ago

I'm not sure that this can be fixed as long as we use put comment into extras.

jeromepin commented 1 year ago

Thank you for your fast answer @MichaHoffmann ! :)

My current use-case is about using comments for configuring a linter. I expect such comment to be effective on the next sibling. If I write the following block, I expect the linter to ignore its my-rule rule for the whole resource.

# ignore-rule:my-rule
resource "foo" "bar" {
  baz = "qux"
  aaa = "www"
}

while I expect it to ignore the rule only for the baz = "qux" line in this case :

resource "foo" "bar" {
  # ignore-rule:my-rule
  baz = "qux"
  aaa = "www"
}

With the issue at hand here, it would ignore the rule for the whole body node, which is counter-intuitive.

Anyway, if you think there is no fix for this, I'll try to find a solution on my end with some kind of hack 🤷‍♂️

MichaHoffmann commented 1 year ago

I mean there could be a fix, i just cannot easily see it right now. I mean one could define a grammar that just takes this grammar, wraps all rules in a sequence(optional($.directive), $.original_rule) to make directives part of the language or something like that.

jeromepin commented 1 year ago

Honestly I don't have any idea of what you are talking about, I know next to nothing about grammars and parsing. If you can do it (and are willing to) then I'll be happy. Otherwise I think I will put this aside and dive into it one day(tm) when I'll have the time (so probably never tbh)

MichaHoffmann commented 1 year ago

I'm playing with it right now to get a feeling, ill come back to you in a bit.

MichaHoffmann commented 1 year ago

The awesome tree-sitter community stepped in and helped out :tada:

MichaHoffmann commented 1 year ago

I thought about it; if we remove the "body" node we dont have that problem. It is a bit useless anyhow. That is a breaking change though so probably i would maybe create a "v2" directory; not yet sure.

jeromepin commented 1 year ago

Honestly I don't know. I don't know enough about the HCL grammars and its specificities to have an opinion. I feel like even if the body is empty, it is still an empty body (and not a non-existing body) 🤷 But it's your project, you do what you feel is best 🙂

jeromepin commented 1 year ago

Hey @MichaHoffmann 👋 I'm wondering if you made a decision regarding this issue and the fix you thought about ? To know what I'm gonna do on my end 🙂

MichaHoffmann commented 1 year ago

Hey, i had only very little time to work on this. Sorry, no progress yet! Is this causing you active trouble?

jeromepin commented 1 year ago

No no it's ok take your time :) I'm already glad you are providing this project, I don't want to put any pressure on you. I was simply curious :)

MichaHoffmann commented 1 year ago

Hey, i had a random idea; we could create a dialect where #ignore-rule:... is a construct of the language. That way we could control where that kind of directive is produced; its a bit like semgrep does it; see i.e. https://github.com/returntocorp/ocaml-tree-sitter-semgrep/blob/main/lang/semgrep-grammars/src/semgrep-promql/grammar.js where the base grammar is extended with special tokens; that would also give more convenience when consuming the parser. wdyt?

jeromepin commented 1 year ago

I honestly have no idea 😅 If I understand what you said correctly, it would mean adding some kind of "business logic" (i'm sorry, I'm lacking a better word here) into your grammar ? I'm not sure if I am making myself clear here. It would mean having some code written specifically for my case in a library usable by everyone ? What will you do when the next user come with a request similar to mine ? Implement the same kind of thing for him ?

MichaHoffmann commented 1 year ago

Nah; i would create a dialect that inherits from the base grammar and adds that directive as a token; if that happens too often eventually i can point people to that as an example how its done i guess.

Since this is pretty stable i could even move it into another repository if you want.

jeromepin commented 1 year ago

Oh ok I see. It means we would need to define right here and now what would be the token ? Like #ignore-rule, or #foobar: ignore-rule or whatever, no ? Sorry, lots of questions here, I don't have all the design of my software figured out yet and usage will shape its future

MichaHoffmann commented 1 year ago

I mean this new dialect can evolve while you experiment ( feedback cycle just gets a bit longer ) but since only you are using it it can be changed to whatever suits you

jeromepin commented 1 year ago

I feel uneasy, I feel like I'm am compelling you to do something out of the scope of this project. To reduce the burden on you after you did the first implementation, maybe I should host it ? Idk

MichaHoffmann commented 1 year ago

no worries; i can show how its done in a dialect ( that i can remove once you know how its done and have your own repository ); it seems like the best solution to me.

jeromepin commented 1 year ago

Alright then, if it's suits you, it suits me ! Thank you for taking all this time 👍

MichaHoffmann commented 1 year ago

ill play with it over the weekend!

MichaHoffmann commented 1 year ago

Hey, sorry it seems that the comment rule doesnt allow for anything but comments so i cannot selectively say #ignore-rule is a an actual token. Ill also have to revert the shim solution we had before since it leads to parse errors in top level json objects that i cannot figure out. Sorry i think i cannot figure this one out!

jeromepin commented 1 year ago

Oh no ! I'm very sad to hear that ! 😢 Thank you for investing so much of your time in this (even more when the outcome isn't what we expected), I'm very grateful !

MichaHoffmann commented 1 year ago

yeah, really sorry man!

jeromepin commented 1 year ago

It's ok, don't beat yourself up. This is how the game is played, sometimes we win, sometimes we lose.