tree-sitter / tree-sitter-julia

Julia grammar for Tree-sitter
MIT License

Identifier starting with ∫ incorrectly parsed (?) #101

Closed: jagot closed this issue 1 year ago.

jagot commented 1 year ago

I hope this is the right place to report this.

I noticed that Emacs incorrectly highlighted a code snippet of mine in which some identifiers start with ∫.


I can reduce this to

∫A
∑a

which treesit-explore-mode reports as

(source_file
 (ERROR (ERROR))
 (identifier) \n (identifier) \n)

so identifiers starting with ∑ are correctly identified, but not those starting with ∫.

Thank you for a very nice parser otherwise!

savq commented 1 year ago

Both ∫ and ∑ are in the Unicode character category Sm (math symbols). Most of the characters in this category are operators, so the ones that work as identifiers are "hand-picked". The valid characters are listed in julia/src/flisp/julia_extensions.c.
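To make the point concrete, a short Python check (an illustration only, not code from either project) uses the standard `unicodedata` module to show that an ordinary operator and an identifier-eligible symbol report the same category, Sm. The category alone therefore cannot decide which characters may start an identifier. The "role" labels are hand-written to reflect how Julia treats each character; they are not read from julia_extensions.c.

```python
import unicodedata

# Every character below is in Unicode general category Sm ("Symbol, math"),
# yet Julia treats some as operators and others as identifier characters.
# The role strings are hand-written annotations, not derived from Unicode data.
samples = {
    "+": "operator",
    "√": "operator (prefix square root)",
    "∑": "identifier start",
    "∫": "identifier start",
}

for ch, role in samples.items():
    # Print code point, Unicode name, general category, and the annotated role.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<20} {unicodedata.category(ch)}  {role}")
```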

I never added the whole list to the tree-sitter parser because (1) some are very confusing (like the 7 versions of nabla) and (2) very few of these characters are actually used in the wild.

jagot commented 1 year ago

I see your point, but since I'm a sucker for mathematical notation in my code, I would not mind seeing the whole list included :)

clason commented 1 year ago

Yes, but there's a definite cost associated with this: adding symbols increases the state size of the parser, and this parser is already at the very top end of the largest (and slowest) parsers nvim-treesitter supports.

savq commented 1 year ago

The current julia_extensions.c doesn't make it very obvious, but the characters we already allow all fall within two character ranges, ∀-∇ and ∎-∑, which is good because it means we can stuff more characters in without computing more comparisons. If we add a third range for integrals, ∫-∳, I think we're all good.
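A minimal Python sketch of the range idea, assuming exactly the two ranges named above (∀-∇ is U+2200 to U+2207, ∎-∑ is U+220E to U+2211) plus the proposed integral range ∫-∳ (U+222B to U+2233). The names `EXISTING_RANGES`, `INTEGRAL_RANGE`, `is_math_identifier_start`, and `IDENT_START_MATH` are hypothetical; the actual grammar rule is not reproduced here. A tree-sitter grammar would typically express this kind of check as a regex character class, which the last lines approximate.

```python
import re

# Hypothetical: the two ranges the grammar is said to accept already.
EXISTING_RANGES = [
    ("\u2200", "\u2207"),  # ∀ .. ∇
    ("\u220e", "\u2211"),  # ∎ .. ∑
]
# Hypothetical: the proposed third range for integral signs.
INTEGRAL_RANGE = ("\u222b", "\u2233")  # ∫ .. ∳

def is_math_identifier_start(ch, ranges):
    """Return True if ch falls inside any inclusive (lo, hi) code point range."""
    return any(lo <= ch <= hi for lo, hi in ranges)

with_integrals = EXISTING_RANGES + [INTEGRAL_RANGE]

# Roughly the same test as a regex character class with three ranges.
IDENT_START_MATH = re.compile("[\u2200-\u2207\u220e-\u2211\u222b-\u2233]")

for ch in "∑∫∳∞":
    print(
        f"{ch} U+{ord(ch):04X}",
        is_math_identifier_start(ch, EXISTING_RANGES),
        is_math_identifier_start(ch, with_integrals),
        IDENT_START_MATH.fullmatch(ch) is not None,
    )
```

In this sketch, widening an existing range changes only its bounds, while each new range adds one more lower/upper pair to test, which is why keeping the allowed symbols in a few contiguous ranges is cheap.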