Open dhoelzl opened 2 years ago
The example you've given is not a single symbol; it is a symbol surrounded by zero-width-joiner code points. The specific combinations you build with it may or may not be valid or defined by particular implementations or Unicode versions, but as an abstract concept, "stringing together" an endless zero-width-joiner sequence does in fact indicate just one grapheme. That's the whole purpose of the zero-width joiner.
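To make the point about the joiners concrete, here is a small sketch that lists the code points in the string from the report. `codePoints` is a hypothetical helper written for this comment, not part of any library.

```javascript
// Hypothetical helper (not from any library): list a string's code points
// in the usual U+XXXX notation.
function codePoints(str) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// The "symbol" from the report: ZWJ, HEAVY BLACK HEART, VARIATION SELECTOR-16, ZWJ.
const symbol = "\u200D\u2764\uFE0F\u200D";
console.log(codePoints(symbol));
// prints [ 'U+200D', 'U+2764', 'U+FE0F', 'U+200D' ]
```

So the reported "symbol" is four code points, with a zero-width joiner (U+200D) at each end.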
I don't know exactly how zero-width joiners work; the only thing I know is that a browser renders this string as
x❤️x❤️❤️❤️x
where I visually count 7 graphemes.
Try selecting them one by one in the browser. In mine, I can't; there are only three parts I can select.
In mobile Safari, I can select 7 distinct items.
On Chrome 114 I can only select x❤️x❤️❤️❤️x as 3 segments (x❤️, x❤️❤️❤️, and x).
The symbol "\u200D\u2764\uFE0F\u200D" seems to be processed incorrectly. I can string together any number of copies of that symbol and the result always counts as one grapheme, until the chain is interrupted by another character.
splitter.countGraphemes("x\u200D\u2764\uFE0F\u200Dx\u200D\u2764\uFE0F\u200D\u200D\u2764\uFE0F\u200D\u200D\u2764\uFE0F\u200Dx") === 3
(I would expect 7)
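For comparison, here is a minimal sketch that segments the same string with the built-in `Intl.Segmenter`, assuming a runtime that supports it (recent browsers and Node.js versions). `segmentGraphemes` is a hypothetical helper name. The count it reports depends on the Unicode version the engine implements, so no expected value is given here.

```javascript
// Hypothetical helper: split a string into grapheme clusters using the
// platform's Intl.Segmenter (assumed to be available in this runtime).
function segmentGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

// Same input as in the report, built from its parts for readability.
const heart = "\u200D\u2764\uFE0F\u200D";
const input = "x" + heart + "x" + heart + heart + heart + "x";

const clusters = segmentGraphemes(input);
console.log("cluster count:", clusters.length);

// Show each cluster as hex code points so the invisible joiners are visible.
for (const cluster of clusters) {
  console.log(Array.from(cluster, (ch) => ch.codePointAt(0).toString(16)).join(" "));
}
```

Printing the clusters as hex makes it easy to see where each implementation places the boundaries around the U+200D joiners.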