Closed jiwandono closed 1 year ago
In Python, with https://pypi.org/project/grapheme/:
>>> grapheme.length('\r\n\U0000FE0E')
2
>>> grapheme.length('\n\U0000FE0E')
2
Good catch! One of the transitions in the state machine was incorrect, which caused rule GB9 to take precedence over GB4. This should be fixed now.
It's interesting that the official Unicode test cases do not include this combination. (It's not a typical string found in the wild but they should still include this one.)
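For anyone following along, here is a minimal sketch of why the order of GB4 and GB9 matters for these strings. It is not uniseg's state machine or the grapheme package's code. The helper names (gcb_class, is_break, count_clusters) are made up for illustration, and the character classification is a rough stand-in based on general categories rather than the real Grapheme_Cluster_Break property. Only rules GB3, GB4, GB5, GB9 and GB999 are modelled.

```python
import unicodedata


def gcb_class(ch):
    """Rough stand-in for Grapheme_Cluster_Break (assumption, not the real table)."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch == "\u200d":
        return "ZWJ"
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):  # U+FE0E (variation selector-15) is Mn
        return "Extend"
    if cat == "Cc":
        return "Control"
    return "Other"


def is_break(prev, cur):
    """Decide whether there is a boundary between two adjacent classes.

    Rules are checked in the order UAX #29 lists them, so GB4 wins over GB9.
    """
    if prev == "CR" and cur == "LF":        # GB3: CR x LF
        return False
    if prev in ("Control", "CR", "LF"):     # GB4: break after controls
        return True
    if cur in ("Control", "CR", "LF"):      # GB5: break before controls
        return True
    if cur in ("Extend", "ZWJ"):            # GB9: no break before Extend/ZWJ
        return False
    return True                             # GB999: otherwise break


def count_clusters(s):
    """Count clusters as one plus the number of boundaries inside the string."""
    if not s:
        return 0
    classes = [gcb_class(ch) for ch in s]
    return 1 + sum(is_break(a, b) for a, b in zip(classes, classes[1:]))


print(count_clusters("\r\n\uFE0E"))
print(count_clusters("\n\uFE0E"))
```

If the GB9 check were moved above the GB4 check, the variation selector would attach to the preceding line break and both strings would count as a single cluster, which is the kind of result the incorrect transition produced.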
Hello!
I observed the following difference with the sequences mentioned. To be honest, I'm not sure which result is correct, but could you help confirm whether the result from the uniseg library is expected?
Thank you!
--
Go with uniseg.GraphemeClusterCount
Output: https://goplay.tools/snippet/WBIJQfKZs7g
PHP 8.0.28 with grapheme_strlen
Output: https://onlinephp.io/c/2cb86