tree-sitter / tree-sitter-haskell

Haskell grammar for tree-sitter.
MIT License

explicitly inline hidden rule #96

Closed brandonspark closed 1 year ago

brandonspark commented 1 year ago

Hi there,

I’m a developer for https://github.com/returntocorp/semgrep, an open-source code analysis tool, and I’d like to add Haskell as a supported language!

Our https://github.com/returntocorp/ocaml-tree-sitter-semgrep tool converts a tree-sitter grammar into an OCaml parser. One step of that conversion “simplifies” the grammar by unhiding all of its previously hidden rules. When I tried this on the Haskell grammar while adding Haskell support, the simplified grammar hit a parsing conflict:

haskell: Importing initial 'grammar.json'.
haskell: Simplifying 'grammar.json' for ocaml-tree-sitter.
haskell: Recovering informational JS grammars from JSON.
haskell: Generating definitive 'parser.c'.
Unresolved conflict for symbol sequence:

  qualifying_module_repeat1  •  conid  …

Possible interpretations:

  1:  (qualifying_module  qualifying_module_repeat1)  •  conid  …
  2:  (qualifying_module_repeat1  qualifying_module_repeat1  •  qualifying_module_repeat1)

Possible resolutions:

  1:  Specify a left or right associativity in `qualifying_module`
  2:  Add a conflict for these rules: `qualifying_module`

I bring this up because the Haskell grammar seems to rely on some “auto-inlining” behavior for hidden rules. Independently of our ocaml-tree-sitter-semgrep tool, I un-hid the _qualifying_module rule in this language's module.js and util.js grammar files (renaming it to qualifying_module in two places).

This ends up causing a conflict:

brandonspark@MacBook-Pro ~/o/l/s/s/tree-sitter-haskell ((aee3725d…))> tree-sitter generate

Unresolved conflict for symbol sequence:

  qualifying_module_repeat1  •  '_conid'  …

Possible interpretations:

  1:  (qualifying_module  qualifying_module_repeat1)  •  '_conid'  …
  2:  (qualifying_module_repeat1  qualifying_module_repeat1  •  qualifying_module_repeat1)

Possible resolutions:

  1:  Specify a left or right associativity in `qualifying_module`
  2:  Add a conflict for these rules: `qualifying_module`
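
To make the un-hiding step concrete, here is a minimal sketch of the rename applied to a parsed grammar.json object. It is not the actual ocaml-tree-sitter-semgrep implementation: `unhideRule` and the toy grammar are hypothetical names made up for illustration, and the sketch only rewrites SYMBOL references inside `rules` (it ignores `inline`, `extras`, `conflicts`, and other top-level fields).

```javascript
// Hypothetical helper, not part of ocaml-tree-sitter-semgrep or tree-sitter.
// Renames a hidden rule (leading underscore) to its visible name and
// rewrites every SYMBOL reference to it inside `rules`.
function unhideRule(grammar, hiddenName) {
  const visibleName = hiddenName.replace(/^_+/, '');

  // Recursively copy a rule node, renaming matching SYMBOL references.
  const rewrite = (node) => {
    if (Array.isArray(node)) return node.map(rewrite);
    if (node === null || typeof node !== 'object') return node;
    if (node.type === 'SYMBOL' && node.name === hiddenName) {
      return { ...node, name: visibleName };
    }
    return Object.fromEntries(
      Object.entries(node).map(([key, value]) => [key, rewrite(value)])
    );
  };

  const rules = {};
  for (const [name, body] of Object.entries(grammar.rules)) {
    rules[name === hiddenName ? visibleName : name] = rewrite(body);
  }
  // Return a new object; the input grammar is left untouched.
  return { ...grammar, rules };
}

// Toy grammar, invented for illustration; it is not the Haskell grammar.
const toyGrammar = {
  name: 'toy',
  rules: {
    qualified_name: {
      type: 'SEQ',
      members: [
        { type: 'SYMBOL', name: '_qualifying_module' },
        { type: 'SYMBOL', name: 'name' },
      ],
    },
    _qualifying_module: {
      type: 'REPEAT1',
      content: {
        type: 'SEQ',
        members: [
          { type: 'SYMBOL', name: 'module_name' },
          { type: 'STRING', value: '.' },
        ],
      },
    },
    module_name: { type: 'PATTERN', value: '[A-Z][A-Za-z0-9_]*' },
    name: { type: 'PATTERN', value: '[a-z][A-Za-z0-9_]*' },
  },
};

const unhidden = unhideRule(toyGrammar, '_qualifying_module');
console.log(Object.keys(unhidden.rules));
```

The rename itself changes nothing about which strings the grammar accepts, which is why a conflict appearing after it suggests the leading underscore affects more than parse-tree visibility.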

The tree-sitter documentation says that hiding a rule only removes it from the parse tree, but it seems to do more than that: the grammar currently depends on some side effect of hiding the rule, and without that side effect the parser cannot be generated.

It seems that hiding a rule also inlines it, because inlining the newly-unhidden qualifying_module rule:

diff --git a/grammar.js b/grammar.js
index ec027c0..fa74883 100644
--- a/grammar.js
+++ b/grammar.js
@@ -85,6 +85,7 @@ module.exports = grammar({
     $._quantifiers,
     $._tyfam_pat_prefix,
     $._tyfam_pat_infix,
+    $.qualifying_module
   ],

   precedences: _ => [

causes the parsing conflict to go away.

Would you mind adding this change to explicitly inline the _qualifying_module rule? Hiding a rule seems to trigger a hidden action that is equivalent to inlining. This would unblock us and let us use our ocaml-tree-sitter-semgrep tool properly, and it would be much appreciated.
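
One rough way to look for other rules that might need the same treatment is to list the hidden rules that are not already in the `inline` list. The sketch below assumes a parsed grammar.json whose `inline` field is an array of rule names; `hiddenRulesNotInlined` and the sample object are hypothetical. It is only a heuristic: a rule showing up here is a candidate to check, not a guaranteed conflict.

```javascript
// Hypothetical audit helper; `grammar` is the parsed contents of grammar.json.
// Returns the names of hidden rules (leading underscore) that are not
// explicitly listed in the grammar's `inline` array.
function hiddenRulesNotInlined(grammar) {
  const inlined = new Set(grammar.inline ?? []);
  return Object.keys(grammar.rules).filter(
    (name) => name.startsWith('_') && !inlined.has(name)
  );
}

// Sample data invented for illustration; rule bodies are placeholders.
const sample = {
  name: 'toy',
  inline: ['_quantifiers'],
  rules: {
    module: { type: 'SYMBOL', name: '_qualifying_module' },
    _quantifiers: { type: 'BLANK' },
    _qualifying_module: { type: 'BLANK' },
  },
};

const candidates = hiddenRulesNotInlined(sample);
console.log(candidates);
```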

The same thing happened with the Swift grammar some time ago, too: https://github.com/returntocorp/ocaml-tree-sitter-semgrep/issues/286

Thanks!

Test plan:

tree-sitter test succeeds

brandonspark commented 1 year ago

cc @mjambon

tek commented 1 year ago

don't see why not!

mjambon commented 1 year ago

@brandonspark it seems you may have run into this: https://github.com/tree-sitter/tree-sitter/issues/1683