neovim / tree-sitter-vimdoc

Tree-sitter parser for Vim help files
Apache License 2.0

More structured AST #95

Open lewis6991 opened 1 year ago

lewis6991 commented 1 year ago

In https://github.com/MDeiml/tree-sitter-markdown sections are represented structurally in the AST. This allows things like https://github.com/nvim-treesitter/nvim-treesitter-context to leverage this structure to provide contexts.

Proposal

Make the current column_heading or h1 node the beginning of a block, and nest everything under it until the next column_heading or h1.
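To illustrate the target shape only: the sketch below groups a flat list of nodes into sections, starting a new section at every h1 or column_heading. It is a post-processing illustration in Python, not a tree-sitter grammar, and the `(kind, text)` tuples, the `Section` class, and the `nest_sections` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Node = Tuple[str, str]  # hypothetical (kind, text) pair standing in for an AST node

# Node kinds that begin a new block, per the proposal.
HEADINGS = {"h1", "column_heading"}


@dataclass
class Section:
    heading: Optional[Node]            # None for content before the first heading
    children: List[Node] = field(default_factory=list)


def nest_sections(nodes: List[Node]) -> List[Section]:
    """Group a flat node list so each heading owns the nodes that follow it."""
    sections: List[Section] = []
    current: Optional[Section] = None
    for node in nodes:
        kind, _text = node
        if kind in HEADINGS:
            # A heading closes the previous section and opens a new one.
            current = Section(heading=node)
            sections.append(current)
        else:
            if current is None:
                # Content before any heading goes into a heading-less section.
                current = Section(heading=None)
                sections.append(current)
            current.children.append(node)
    return sections


flat = [
    ("line", "intro"),
    ("h1", "Usage"),
    ("line", "some text"),
    ("codeblock", "code"),
    ("column_heading", "Name"),
    ("line", "more text"),
]
for section in nest_sections(flat):
    print(section.heading, len(section.children))
```

With this shape, a tool that wants context for a given node only has to look at the heading of the enclosing section.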

justinmk commented 1 year ago

Would love to do this--and spent a lot of time trying to make it work--but I failed. The problem, AFAIR, is that codeblock termination can happen on any line.

> In https://github.com/MDeiml/tree-sitter-markdown sections are represented structurally in the AST

tree-sitter-markdown has a custom scanner.c. Thus far tree-sitter-vimdoc has avoided a custom scanner, which helped a lot with development velocity. Of course, the door is open to exploring that now that things are mostly working.

Ideally tree-sitter itself would introduce a feature that makes things easier for grammars instead of needing a custom scanner. For example, https://github.com/tree-sitter/tree-sitter/issues/160 would provide EOF to the grammar instead of making grammars do insane backflips to deal with that.

clason commented 1 year ago

Would things change if we tighten the requirements to always have a terminating < for codeblocks?

Note, though, that tree-sitter-markdown also tried and failed, and in the end had to switch to a two-pass strategy: one parser handles only the block structure, and a second parser does inline parsing of each individual block. (This works but has obvious performance implications.)

justinmk commented 1 year ago

> tighten the requirements to always have a terminating < for codeblocks?

Instead of "always", maybe only if the next block is an h1 or column_heading?

So this would be allowed:

foo >
  code
bar >
  code
<

but this would not be allowed:

foo >
  code

=========
h1

This wouldn't result in a perfect AST but might be good enough.
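A rough sketch of that rule as a line-based check, written in Python rather than as grammar rules. It assumes simplified vimdoc conventions: a codeblock opens on a line ending in `>`, stays open across blank and indented lines, and closes at a line starting with `<` or at any unindented line. The regexes for h1 (a line of `=`) and column_heading (text ending in ` ~`) are approximations, and `unterminated_codeblocks` is a hypothetical helper name.

```python
import re
from typing import List

# Assumed, simplified patterns; the real grammar is more nuanced.
H1_RULE = re.compile(r"^={3,}\s*$")         # "=====" delimiter line that opens an h1
COLUMN_HEADING = re.compile(r"^\S.*\s~$")   # unindented text ending in " ~"
OPENS_CODE = re.compile(r"(^|\s)>$")        # line is ">" or ends in " >"


def unterminated_codeblocks(lines: List[str]) -> List[int]:
    """Return 1-based numbers of heading lines that implicitly close a codeblock."""
    errors: List[int] = []
    in_code = False
    for num, line in enumerate(lines, start=1):
        if in_code:
            if line.startswith("<"):
                in_code = False          # explicit terminator: always fine
            elif line == "" or line[0] in " \t":
                continue                 # blank or indented: still inside the codeblock
            else:
                # An unindented line closes the block implicitly. That is only
                # rejected when the closing line is an h1 or column_heading.
                if H1_RULE.match(line) or COLUMN_HEADING.match(line):
                    errors.append(num)
                in_code = False
        if OPENS_CODE.search(line):
            in_code = True               # this line may itself open a new codeblock
    return errors


allowed = ["foo >", "  code", "bar >", "  code", "<"]
rejected = ["foo >", "  code", "", "=========", "h1"]
print(unterminated_codeblocks(allowed))
print(unterminated_codeblocks(rejected))
```

The first sample passes because `bar >` is ordinary text that closes one block and opens the next, while the second is flagged at the `=========` line.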