I also took a look at the symbolic operator situation, and it's a little bit more difficult.
Legal characters for these varsyms are determined by membership in Unicode categories, which together cover about 6000 code points spread over noncontiguous intervals.
We are parsing varsyms in the scanner, which means we don't have access to the Unicode category regex classes that tree-sitter provides for grammar rules.
I couldn't find a way to test category membership in standard C, but maybe someone knows better?
For what it's worth, I tried adding a `switch` with 6k cases, and performance only degraded by about 1%.
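
One possible alternative to a giant `switch`, sketched here only as an illustration: store the intervals in a sorted table and binary-search it. Everything below is an assumption rather than code from the repository. The names `CodePointRange`, `symbol_ranges`, and `is_symbol_char` are hypothetical, and the table holds only a tiny hand-picked sample (×, ÷, the Arrows block, and the Mathematical Operators block). A real table would have to be generated from the Unicode data files. The `int32_t` argument is meant to match the type of the lexer's lookahead character.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points [lo, hi]. */
typedef struct {
  int32_t lo;
  int32_t hi;
} CodePointRange;

/* Hypothetical, heavily truncated sample table.
 * It must be sorted by `lo` and the ranges must not overlap.
 * A real table would be generated from the Unicode data files. */
static const CodePointRange symbol_ranges[] = {
  {0x00D7, 0x00D7}, /* MULTIPLICATION SIGN */
  {0x00F7, 0x00F7}, /* DIVISION SIGN */
  {0x2190, 0x21FF}, /* Arrows block */
  {0x2200, 0x22FF}, /* Mathematical Operators block */
};

/* Binary search over the sorted ranges: O(log n) comparisons per lookup. */
static bool is_symbol_char(int32_t c) {
  size_t lo = 0;
  size_t hi = sizeof(symbol_ranges) / sizeof(symbol_ranges[0]);
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (c < symbol_ranges[mid].lo) {
      hi = mid;          /* c lies below this range, search the left half */
    } else if (c > symbol_ranges[mid].hi) {
      lo = mid + 1;      /* c lies above this range, search the right half */
    } else {
      return true;       /* c falls inside symbol_ranges[mid] */
    }
  }
  return false;
}
```

Since adjacent code points collapse into a single interval, the table should end up much smaller than 6000 entries, though I haven't measured how it compares to the `switch` in practice.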

Originally posted by @tek in https://github.com/tree-sitter/tree-sitter-haskell/issues/93#issuecomment-1421692482