Apparently, kCaseFoldASCII[] was originally generated with the standard
tolower() function. tolower() depends on the current locale, which seems
to have used the CP1252 encoding at the time. The resulting table made
casefold() mangle non-ASCII UTF-8 strings, which caused re2 to fail with
an "invalid UTF-8" error.

This commit limits case folding to the ASCII ranges A-Z and a-z, the
same as the vectorised version.
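
For illustration only, here is a minimal sketch of a locale-independent
fold table restricted to ASCII letters. MakeAsciiFoldTable() and
AsciiCaseFold() are hypothetical names, not the functions touched by
this commit, and folding to lowercase is an assumption; the point is
only that every byte outside A-Z maps to itself, so multi-byte UTF-8
sequences pass through untouched.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a 256-entry table without calling tolower(),
// so the result cannot depend on the process locale.
inline std::array<unsigned char, 256> MakeAsciiFoldTable() {
  std::array<unsigned char, 256> table{};
  for (std::size_t i = 0; i < table.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(i);
    // Only 'A'..'Z' are remapped (assumed direction: to lowercase).
    // Every other byte, including 0x80..0xFF, maps to itself.
    table[i] = (c >= 'A' && c <= 'Z')
                   ? static_cast<unsigned char>(c - 'A' + 'a')
                   : c;
  }
  return table;
}

// Hypothetical byte-wise fold using the table above.
inline std::string AsciiCaseFold(const std::string& in) {
  static const std::array<unsigned char, 256> table = MakeAsciiFoldTable();
  std::string out(in);
  for (char& ch : out) {
    ch = static_cast<char>(table[static_cast<unsigned char>(ch)]);
  }
  return out;
}
```

With such a table, the two bytes of a UTF-8 encoded "É" (0xC3 0x89) are
left as they are, while plain ASCII letters are still folded.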