stsewd / tree-sitter-comment

Tree-sitter grammar for comment tags like TODO, FIXME(user).
https://stsewd.dev/tree-sitter-comment/
MIT License
134 stars 9 forks source link

V2: Don't expose `text` nodes #33

Open stsewd opened 9 months ago

stsewd commented 9 months ago

It's very common for users to want to match TODO in addition to TODO: (ending in colon). To match that, users needed to match over all text nodes, since every word is a text node, that was slow. So the main improvement here is to not expose text nodes, and instead expose any uppercase word as a node. Unless you write all of your comments very angry in SCREAMING case, performance should improve.

Changes

While this new approach makes the parser less flexible without the text nodes, it will make it faster, or at least faster to match, since the parsing itself is/was already fast (doing a match over text nodes is the slow operation). Another approach I was thinking is to parse the keywords directly, but that makes it even less flexible, since any new keywords will need to be added in the parser instead of a query.

Other notes

I tried moving everything to JS in https://github.com/stsewd/tree-sitter-comment/commit/f2318ba93dca031e57fec92caa60766d9219f4e5, but it was slow, due to the use of conflicts.

Original comment:

Moving everything to JS means losing a little bit of control over the exact matching of the tag name, resulting in matches for:

I think it's fine for those to match, don't think you see much of those in the wild, if they are a problem, we can try adjusting the rules, or go back to have that small logic in the external scanner.

UPDATE: Did a benchmark, it's SLOWWWW, from 36 ms to 1264 ms, this is parsing a file with 10K lines full of comments. This is probably due to the conflicts that tree-sitter needs to resolve...