zeux / qgrep

Fast regular expression grep for source code with incremental index updates
MIT License

Fix casefold table #22

Closed MRWITEK closed 2 years ago

MRWITEK commented 2 years ago

Apparently, kCaseFoldASCII[] was originally generated with the standard tolower() function. tolower() depends on the current locale, which seems to have been using the CP1252 encoding when the table was generated. The resulting table made casefold() mangle non-ASCII UTF-8 strings, which caused re2 to fail with an "invalid UTF-8" error.

This commit limits casefolding to the ASCII ranges A-Z and a-z, the same as the vectorised version.
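
For illustration, here is a minimal sketch of how a locale-independent table could be built. It is not the code from this commit: the names foldAsciiByte, buildCaseFoldTable and foldString are hypothetical, and the sketch assumes a 256-entry byte table in the spirit of kCaseFoldASCII[].

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: lowercases only the ASCII letters A-Z.
// Every other byte value, including 0x80-0xFF, maps to itself.
// No locale is consulted, unlike tolower().
unsigned char foldAsciiByte(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c - 'A' + 'a') : c;
}

// Builds a 256-entry lookup table (assumed layout: one entry per byte value).
std::array<unsigned char, 256> buildCaseFoldTable()
{
    std::array<unsigned char, 256> table{};
    for (std::size_t i = 0; i < table.size(); ++i)
        table[i] = foldAsciiByte(static_cast<unsigned char>(i));
    return table;
}

// Applies the table byte by byte. Bytes of multi-byte UTF-8 sequences are
// all >= 0x80, so they pass through unchanged and the string stays valid.
std::string foldString(const std::string& input)
{
    static const std::array<unsigned char, 256> table = buildCaseFoldTable();
    std::string result = input;
    for (char& ch : result)
        ch = static_cast<char>(table[static_cast<unsigned char>(ch)]);
    return result;
}
```

With such a table, foldString("HeLLo") yields "hello", while the UTF-8 encoding of "É" (bytes 0xC3 0x89) is left untouched. A table generated by tolower() under a CP1252 locale would be expected to turn 0xC3 into 0xE3, which changes the lead byte and leaves a byte sequence that is no longer valid UTF-8.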