orling / grapheme-splitter

A JavaScript library that breaks strings into their individual user-perceived characters.
MIT License

Heart symbol not processed correctly #32

Open · dhoelzl opened 2 years ago

dhoelzl commented 2 years ago

The symbol "\u200D\u2764\uFE0F\u200D" seems to be processed incorrectly. I can chain together any number of copies of that sequence, and the whole chain always counts as one grapheme until it is interrupted by another character.

splitter.countGraphemes("x\u200D\u2764\uFE0F\u200Dx\u200D\u2764\uFE0F\u200D\u200D\u2764\uFE0F\u200D\u200D\u2764\uFE0F\u200Dx") === 3

(I would expect 7)
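For comparison, here is a minimal sketch that counts clusters with the built-in `Intl.Segmenter` API (ES2022, available in Node 16+ and current browsers) instead of grapheme-splitter; the function name `countGraphemesIntl` is hypothetical. Under the current Unicode segmentation rules (UAX #29, rule GB11), a ZWJ only joins a following pictograph when the ZWJ is itself preceded by an Extended_Pictographic character, so a ZWJ that follows "x" does not glue "x" to the heart. Assuming the runtime's ICU implements those rules, this should report 7 clusters for the string above.

```javascript
// Count extended grapheme clusters using the runtime's built-in segmenter.
// Results follow whatever Unicode version the runtime's ICU data implements.
function countGraphemesIntl(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // Segments is iterable; each entry is one extended grapheme cluster.
  return [...segmenter.segment(str)].length;
}

// Same input as the countGraphemes example in the issue.
const sample =
  "x\u200D\u2764\uFE0F\u200Dx\u200D\u2764\uFE0F\u200D\u200D\u2764\uFE0F\u200D\u200D\u2764\uFE0F\u200Dx";

console.log(countGraphemesIntl(sample));
```

This only shows what one UAX #29 implementation returns; it says nothing about how grapheme-splitter itself is implemented.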

anonghuser commented 11 months ago

The example you've given is not a single symbol; it is a symbol surrounded by zero-width joiner code points. The specific combinations you build with it may or may not be valid or defined by particular implementations and Unicode versions, but as an abstract concept, "stringing together" an endless zero-width joiner sequence does indicate just one grapheme. That's the whole purpose of the zero-width joiner.
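Because zero-width joiners are invisible, it can help to list the code points that a given segmenter puts into each cluster. The sketch below does that with `Intl.Segmenter`; `describeClusters` is a hypothetical helper, and its output reflects the runtime's own Unicode rules, which may differ from grapheme-splitter or from a browser's selection behavior.

```javascript
// Hypothetical helper: for each grapheme cluster the runtime's Intl.Segmenter
// produces, return its code points as a "U+XXXX U+XXXX ..." string.
function describeClusters(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), ({ segment }) =>
    // Array.from on a string iterates by code point, not UTF-16 unit.
    Array.from(segment, (ch) =>
      "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
    ).join(" ")
  );
}

const sample =
  "x\u200D\u2764\uFE0F\u200Dx\u200D\u2764\uFE0F\u200D\u200D\u2764\uFE0F\u200D\u200D\u2764\uFE0F\u200Dx";

for (const cluster of describeClusters(sample)) {
  console.log(cluster);
}
```

Printing code points rather than the clusters themselves makes it obvious where each U+200D ends up.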

dhoelzl commented 11 months ago

I don't know exactly how zero-width joiners work; the only thing I know is that a browser renders this string as

x‍❤️‍x‍❤️‍‍❤️‍‍❤️‍x

where I visually count 7 graphemes.

anonghuser commented 11 months ago

Try to select them one by one in the browser. In mine, I can't; there are only three parts I can select.

ljharb commented 11 months ago

In mobile Safari, I can select 7 distinct items.

coder0107git commented 9 months ago

On Chrome 114 I can only select x‍❤️‍x‍❤️‍‍❤️‍‍❤️‍x as 3 segments (x‍❤️‍, x‍❤️‍‍❤️‍‍❤️‍, and x)