Open Kleidukos opened 10 months ago
I'm not even sure what the "size of a grapheme cluster" means.
There are various ways to normalize text (compose/decompose grapheme clusters): https://hackage.haskell.org/package/text-icu-0.8.0.3/docs/Data-Text-ICU-Normalize2.html
Maybe unorm2_composePair() can help to compose those clusters and get their size.
I'm not even sure what the "size of a grapheme cluster" means.
I mean the length of the text counted in grapheme clusters rather than in code points. For example, "🤦🏼♂️" is a single grapheme cluster, so its length is 1.
This is an interesting problem. There's a short read about it here: https://tonsky.me/blog/unicode/
@Kleidukos In Agda we count grapheme clusters as in the code linked below. Is that what you are looking for?
https://github.com/agda/agda/blob/4c5501e369b63ff3eabdbb3217db59904baf0e78/src/full/Agda/Interaction/Highlighting/LaTeX/Base.hs#L708-L716
length . ICU.breaks (ICU.breakCharacter ICU.Root)
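For anyone landing here later, this is a minimal sketch of how that one-liner might be wrapped into a standalone program. It assumes the text-icu package is installed and linked against the ICU C library. The name `graphemeLength` is hypothetical and is not something the library provides.

```haskell
module Main (main) where

import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.ICU as ICU

-- | Count grapheme clusters using ICU's character break iterator
-- with the root locale. 'graphemeLength' is a hypothetical helper
-- name, not part of text-icu.
graphemeLength :: Text -> Int
graphemeLength = length . ICU.breaks (ICU.breakCharacter ICU.Root)

main :: IO ()
main = do
  -- 'e' followed by U+0301 COMBINING ACUTE ACCENT:
  -- two code points that render as one user-perceived character.
  let s = T.pack "e\x301"
  putStrLn ("code points:       " ++ show (T.length s))
  putStrLn ("grapheme clusters: " ++ show (graphemeLength s))
```

`T.length` counts code points, while the break-iterator version counts the segments ICU reports as character boundaries, which is why the two numbers can differ for the same `Text`.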
Oh yeah definitely! I'm quite surprised it's not offered by the library directly. Thanks @andreasabel!
I'd like to get the size of a grapheme cluster (from a value of type Text). Is there a function in the library that can help me with it? If not, is it in the scope of the library to provide one?