Closed DanielBauman88 closed 1 year ago
This implementation follows the specification, and the specification inserts breaks before and after control characters.
> even though neither control character is a grapheme-cluster as I understand it
No, they are grapheme clusters at the Unicode level. They're not "user-perceived characters", but they do not become part of nearby user-perceived characters either, so they get breaks around them.
There are a ton of different ways one may define a user-perceived character, and the Unicode spec picks something that gives a reasonable answer for most use cases. I recommend not relying too much on intuition as to what is and isn't a grapheme cluster unless you know the specification.
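To make the "breaks around them" point concrete, here is a rough, standard-library-only sketch of just the control-character rules from UAX #29 (GB3, GB4, GB5) plus the default "break everywhere" rule (GB999). It is not this crate's implementation. The names `is_break` and `split_simplified` are made up for illustration, `char::is_control` (General_Category Cc only) is used as an approximation of the spec's broader Control class, and the inputs are hypothetical rather than the characters from the playground link.

```rust
/// Simplified boundary check covering only UAX #29 rules GB3, GB4, GB5 and GB999.
/// Assumption: `char::is_control` (General_Category Cc) stands in for the spec's
/// CR, LF and Control classes; the real Control class covers more code points.
fn is_break(prev: char, next: char) -> bool {
    // GB3: never break between CR and LF.
    if prev == '\r' && next == '\n' {
        return false;
    }
    // GB4 / GB5: always break after and before a control character.
    if prev.is_control() || next.is_control() {
        return true;
    }
    // GB999: otherwise break everywhere. The rules that keep combining marks,
    // Hangul syllables, emoji sequences, etc. together are deliberately omitted.
    true
}

/// Splits `s` at every boundary reported by `is_break`.
fn split_simplified(s: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut prev: Option<char> = None;
    for (i, c) in s.char_indices() {
        if let Some(p) = prev {
            if is_break(p, c) {
                out.push(&s[start..i]);
                start = i;
            }
        }
        prev = Some(c);
    }
    if start < s.len() {
        out.push(&s[start..]);
    }
    out
}

fn main() {
    // Two adjacent control characters: each one is its own segment.
    let two_controls = split_simplified("\u{1}\u{2}");
    println!("{} segments: {:?}", two_controls.len(), two_controls);

    // CR LF is the one exception: the pair stays together.
    let crlf = split_simplified("a\r\nb");
    println!("{} segments: {:?}", crlf.len(), crlf);
}
```

Under these assumptions, two adjacent control characters yield two segments, while `"a\r\nb"` yields three because CR LF is kept as a single cluster.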
I see, thanks for the explanation!
Here's a playground example with two Unicode control characters.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=b41c4d54ad849f1c19b6743925ec96f8
The iterator has 2 elements, even though neither control character is a grapheme cluster as I understand it.
Is the implementation supposed to fall back to iterating over code points when the code points are not part of a grapheme cluster?