tree-sitter-grammars / tree-sitter-markdown

Markdown grammar for tree-sitter
MIT License
379 stars 45 forks source link

Support tag convention #29

Closed danymat closed 1 year ago

danymat commented 2 years ago

Hello! Thanks for this great parser, it's helping me doing some great things on parsing my notes

I currently use tags derived from obsidian (and other notes management tools who uses tags) and want to see if it's possible for them to be treated as tags in TS: The syntax is: #word word can be any letter or digit

Références:

https://help.obsidian.md/How+to/Working+with+tags

MDeiml commented 2 years ago

Seems easy enough to implement. I don't think it should be part of the standard markdown grammar. I also haven't found a good way to make the grammar more configurable. I'm probably gonna do something about this in the future.

In the meantime if you want I can create a branch where I implement this.

MDeiml commented 2 years ago

I just created a tags branch that includes the tags. Hope that helps

ghost commented 2 years ago

Thank you for the tags branch! Since the branch is quite behind, I would like to merge it in a fork of the default branch (I'm assuming that's that TSInstall cloned for neovim). If it works and there is value in merging it to the upstream default branch I'm happy to submit a PR.

I'm completely new to tree-parser so I had some questions about the process:

MDeiml commented 2 years ago

The only files you should need to change are the grammar.js and the scanner.cc file. All the other ones are generated by the tree-sitter-cli (which you need to install). Specifically you have to change the inline grammar, so the files in the tree-sitter-markdown-inline folder. Then run tree-sitter generate in the same folder and tree-sitter test for good measure.

maubuz commented 2 years ago

I believe was able to port your old tags branch to the split_parser branch by simply adding your regex for the tag rule to grammar.js with one minor modification: tag: $ => / #([a-zA-Z_\-\/][a-zA-Z_\-\/0-9]*|[0-9]+[a-zA-Z_\-\/][a-zA-Z_\-\/]*)/, (there is a space character before the #) My solution is in my branch merge_tags.

Is changing scanner.cc necessary? The Tree-sitter documentation mentions scanner.cc being used for the externals section of grammar.js. Since I did not modify externals, I did not change the external scanner.

Once I added (tag) @text.reference to nvim-treesitter/queries/markdown_inline/highlights.scm, Tree-sitter neovim playground seemed to correctly show the tags as nodes (there is a weird line offset with the tag highlight but I don't think it's relevant).

image

tree-sitter test seems to yield the same passes/fails as split_parser. Am I on the right track? Is it worth submitting a PR?

MDeiml commented 2 years ago

Oh yeah you are right. This should not have to touch scanner.cc. Looks great!

I'd be happy to accept a PR. If you want to put in the work, the best way to integrate this into the parser is by making this node and optional part of the grammar. That way there is only one branch and everybody can choose which nodes they want in their markdown. You can look at the beginning of common/grammar.js where the EXTENSION_... constants are defined to see how I usually do this. There are some examples of making rules optional using this in the other js files.

To compile with an extension you need add the env var, e.g. EXTENSION_TAGS=1 tree-sitter generate.

ghost commented 1 year ago

Just to follow up, I'll would like to try the PR with the optional functionality. It's been busy on my side but in a few weeks I should have more free time. Cheers.

tott commented 1 year ago

I just freshly installed this module today via TSInstall but sadly the tags don't work for me yet. What would I need to do in order to benefit from this addition?

MDeiml commented 1 year ago

As of now tags still aren't implemented, sorry. I'll see if I get to it soon.

MDeiml commented 1 year ago

This works now, but it's hidden behind an environment variable, as I think this syntax is not that widely used and some users would probably not want it. To compile this parser with tags enabled, clone this repo (make sure it clones the "split_parser" branch) and then run

EXTENSION_TAGS=1 npm run build

inside the repo. You will probably have to configure whatever editor you are using to use the local build you made. I can help with that if you want.

Alternatively I'll push a branch with tags enabled by default in a second, if you want you can use that instead.

MDeiml commented 1 year ago

You can use the branch tag-convention, it has tags enabled by default.

ghost commented 1 year ago

Thank you very much for adding this in. I apologize but I did not have the time to revisit this and learn how to add optional flags.

If I remember correctly, to actually "see" the tags highlighted differently, in my case I had to assign the tag token to an existing group or different color scheme in tree-sitter's configuration.

Tree-sitter playground can help see what syntax highlighting is being applied to the tag token.

On Sun, Jan 29, 2023, 8:11 a.m. Matthias Deiml @.***> wrote:

You can use the branch tag-convention https://github.com/MDeiml/tree-sitter-markdown/tree/tag-convention, it has tags enabled by default.

— Reply to this email directly, view it on GitHub https://github.com/MDeiml/tree-sitter-markdown/issues/29#issuecomment-1407660072, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHL255SYVDEKCKIJ6OPUVILWUZUB5ANCNFSM5MSFSJMQ . You are receiving this because you commented.Message ID: @.***>

MDeiml commented 1 year ago

That's true. For the new syntax to be visible you will have to change the corresponding tree-sitter queries for the editor you are using. If anyone needs help with that just ask here.