Open Laurent45 opened 9 months ago
You are correct: if the second test (grapheme cluster boundary) is true, the first one (UTF-8 char boundary) is always true as well.
The first check is local and only needs to inspect the byte at the index (or compare the index with the length), while the second check requires us to traverse the string. One could argue that short-circuiting on the first check saves us the work of the second, but without benchmarks on real data (which is most likely valid), that argument is questionable: on valid input the first check passes and the second runs anyway. Given that we expect valid data, we could probably elide the first check.
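To make the relationship concrete, here is a minimal sketch, assuming the code is Rust. Both function names are hypothetical and not taken from the codebase. The grapheme check is a deliberately simplified stand-in that only treats combining marks in U+0300..=U+036F as extending the previous cluster; it is not the full Unicode segmentation algorithm. It is only meant to show that the first check is O(1), the second walks the string, and the second implies the first.

```rust
/// Check 1 (hypothetical name): O(1), looks only at the byte at `index`.
/// This mirrors the behaviour of `str::is_char_boundary`.
fn is_utf8_char_boundary(s: &str, index: usize) -> bool {
    if index == 0 || index == s.len() {
        return true;
    }
    match s.as_bytes().get(index) {
        // UTF-8 continuation bytes have the form 0b10xxxxxx.
        Some(&b) => (b & 0b1100_0000) != 0b1000_0000,
        // `index` is past the end of the string.
        None => false,
    }
}

/// Check 2 (hypothetical name, simplified rule): O(n), walks the string.
/// ASSUMPTION: a cluster is one char plus any following combining marks
/// in U+0300..=U+036F. Real grapheme segmentation has many more rules.
fn is_simplified_grapheme_boundary(s: &str, index: usize) -> bool {
    if index == 0 || index == s.len() {
        return true;
    }
    if index > s.len() {
        return false;
    }
    for (start, ch) in s.char_indices() {
        if start == index {
            // A combining mark continues the previous cluster.
            return !('\u{0300}'..='\u{036F}').contains(&ch);
        }
        if start > index {
            // `index` fell inside a multi-byte char, so it cannot be
            // a cluster boundary either.
            return false;
        }
    }
    // `index` fell inside the last multi-byte char.
    false
}
```

Because the second function can only return true at offsets produced by `char_indices` (or at 0 and `s.len()`), every index it accepts is also a UTF-8 char boundary, which is why the first check is redundant for correctness.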
A small suggestion: the first and second checks seem to perform the same check implicitly. Is there a case where the first check is false and the second is true? What do you think?