Closed jeromepin closed 1 year ago
Hey @jeromepin
thanks for the report, its valid - i can reproduces it given your examples too.
I think the problem is that comments may appear anywhere in the language and the parser ignores them in rules. It might be tricky to try to control where comments end up being. I think the issue here is that we define blocks as
block: $ => seq(
$.identifier,
repeat(choice($.string_lit, $.identifier)),
$.block_start,
optional($.body),
$.block_end,
),
so body might be missing.
Is this causing issues for you? It might be problematic when writing some pattern that matches comments inside bodies.
nvim-treesitter matches them jsut as (comment) @comment
.
I'm not sure that this can be fixed as long as we use put comment
into extras
.
Thank you for your fast answer @MichaHoffmann ! :)
My current use-case is about using comments for configuring a linter. I expect such comment to be effective on the next sibling. If I write the following block, I expect the linter to ignore its my-rule
rule for the whole resource.
# ignore-rule:my-rule
resource "foo" "bar" {
baz = "qux"
aaa = "www"
}
while I expect it to ignore the rule only for the baz = "qux"
line in this case :
resource "foo" "bar" {
# ignore-rule:my-rule
baz = "qux"
aaa = "www"
}
With the issue at hand here, it would ignore the rule for the whole body
node, which is counter-intuitive.
Anyway, if you think there is no fix for this, I'll try to find a solution on my end with some kind of hack 🤷♂️
I mean there could be a fix, i just cannot easily see it right now. I mean one could define a grammar that just takes this grammar, wraps all rules in a sequence(optional($.directive), $.original_rule)
to make directives part of the language or something like that.
Honestly I don't have any idea of what you are talking about, I know next to nothing about grammars and parsing. If you can do it (and are willing to) then I'll be happy. Otherwise I think I will put this aside and dive into it one day(tm) when I'll have the time (so probably never tbh)
I'm playing with it right now to get a feeling, ill come back to you in a bit.
The awesome tree-sitter community stepped in and helped out :tada:
I thought about it; if we remove the "body" node we dont have that problem. It is a bit useless anyhow. That is a breaking change though so probably i would maybe create a "v2" directory; not yet sure.
Honestly I don't know. I don't know enough about the HCL grammars and its specificities to have an opinion. I feel like even if the body is empty, it is still an empty body (and not a non-existing body) 🤷 But it's your project, you do what you feel is best 🙂
Hey @MichaHoffmann 👋 I'm wondering if you made a decision regarding this issue and the fix you thought about ? To know what I'm gonna do on my end 🙂
Hey, i had only very little time to work on this. Sorry, no progress yet! Is this causing you active trouble?
No no it's ok take your time :) I'm already glad you are providing this project, I don't want to put any pressure on you. I was simply curious :)
Hey, i had a random idea; we could create a dialect where #ignore-rule:...
is a construct of the language. That way we could control where that kind of directive is produced; its a bit like semgrep does it; see i.e. https://github.com/returntocorp/ocaml-tree-sitter-semgrep/blob/main/lang/semgrep-grammars/src/semgrep-promql/grammar.js where the base grammar is extended with special tokens; that would also give more convenience when consuming the parser. wdyt?
I honestly have no idea 😅 If I understand what you said correctly, it would mean adding some kind of "business logic" (i'm sorry, I'm lacking a better word here) into your grammar ? I'm not sure if I am making myself clear here. It would mean having some code written specifically for my case in a library usable by everyone ? What will you do when the next user come with a request similar to mine ? Implement the same kind of thing for him ?
Nah; i would create a dialect that inherits from the base grammar and adds that directive as a token; if that happens too often eventually i can point people to that as an example how its done i guess.
Since this is pretty stable i could even move it into another repository if you want.
Oh ok I see. It means we would need to define right here and now what would be the token ? Like #ignore-rule
, or #foobar: ignore-rule
or whatever, no ?
Sorry, lots of questions here, I don't have all the design of my software figured out yet and usage will shape its future
I mean this new dialect can evolve while you experiment ( feedback cycle just gets a bit longer ) but since only you are using it it can be changed to whatever suits you
I feel uneasy, I feel like I'm am compelling you to do something out of the scope of this project. To reduce the burden on you after you did the first implementation, maybe I should host it ? Idk
no worries; i can show how its done in a dialect ( that i can remove once you know how its done and have your own repository ); it seems like the best solution to me.
Alright then, if it's suits you, it suits me ! Thank you for taking all this time 👍
ill play with it over the weekend!
Hey, sorry it seems that the comment rule doesnt allow for anything but comments so i cannot selectively say #ignore-rule
is a an actual token. Ill also have to revert the shim
solution we had before since it leads to parse errors in top level json objects that i cannot figure out. Sorry i think i cannot figure this one out!
Oh no ! I'm very sad to hear that ! 😢 Thank you for investing so much of your time in this (even more when the outcome isn't what we expected), I'm very grateful !
yeah, really sorry man!
It's ok, don't beat yourself up. This is how the game is played, sometimes we win, sometimes we lose.
Describe the bug I believe there is an issue with the position of a
comment
node in the tree when the comment is right at the beginning of a block.To Reproduce Steps to reproduce the behavior:
will produce the following tree
``` config_file └── body └── block └── identifier (resource) └── string_lit └── quoted_template_start (") └── template_literal (foo) └── quoted_template_end (") └── string_lit └── quoted_template_start (") └── template_literal (bar) └── quoted_template_end (") └── block_start ({) └── comment (# comment) └── body └── attribute └── identifier (baz) └── expression └── literal_value └── string_lit └── quoted_template_start (") └── template_literal (qux) └── quoted_template_end (") └── attribute └── identifier (aaa) └── expression └── literal_value └── string_lit └── quoted_template_start (") └── template_literal (www) └── quoted_template_end (") └── block_end (}) ```Notice the
comment
node being after theblock_start
node as it should, but before thebody
node, and it seems weird to me.If we move the comment line later in the resource like this :
the `comment` node ends-up at the same level as the siblings `attribute` inside the `body` node.
``` config_file └── body └── block └── identifier (resource) └── string_lit └── quoted_template_start (") └── template_literal (foo) └── quoted_template_end (") └── string_lit └── quoted_template_start (") └── template_literal (bar) └── quoted_template_end (") └── block_start ({) └── body └── attribute └── identifier (baz) └── expression └── literal_value └── string_lit └── quoted_template_start (") └── template_literal (qux) └── quoted_template_end (") └── comment (# comment) └── attribute └── identifier (aaa) └── expression └── literal_value └── string_lit └── quoted_template_start (") └── template_literal (www) └── quoted_template_end (") └── block_end (}) ```Unless am I mistaken ?
The two examples can be reproduced in the playground too.
Expected behavior The
comment
node should be inside thebody
node, as are the attribute, and not right before.Screenshots n/a
Additional context n/a
Thank you very much,