Scribe's coordinate system is designed as an abstraction over multi-byte "characters", such that a `Range` spanning one offset corresponds to a single on-screen character, even if that character is represented by more than a single byte. Currently, that abstraction is naively centered around Unicode code points. However, a single on-screen character can be composed of multiple code points (for example, a base letter followed by a combining accent), and as a result, working with data that contains such characters breaks much of Scribe's data handling.
A Unicode grapheme cluster is what we should be using as the smallest atomic unit of text. The `unicode-segmentation` crate provides iterators that handle grapheme clusters rather than code points; let's migrate to it so that the coordinate system supports the full Unicode character set.
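To make the mismatch concrete, here is a minimal standard-library sketch of code-point-based offsets. The helper names (`byte_len`, `code_point_len`, `slice_by_code_points`) are hypothetical and are not part of Scribe's API; they only illustrate how a one-offset range can split what the user sees as a single character.

```rust
/// Length of the text in bytes of its UTF-8 encoding.
fn byte_len(text: &str) -> usize {
    text.len()
}

/// Length of the text in Unicode code points (what `chars()` iterates over).
fn code_point_len(text: &str) -> usize {
    text.chars().count()
}

/// Hypothetical helper: slice `text` using code point offsets, assuming
/// `start <= end`. Offsets past the end of the text clamp to its length.
fn slice_by_code_points(text: &str, start: usize, end: usize) -> &str {
    // Translate a code point offset into a byte offset.
    let byte_at = |n: usize| {
        text.char_indices()
            .nth(n)
            .map_or(text.len(), |(index, _)| index)
    };
    &text[byte_at(start)..byte_at(end)]
}
```

For the input `"e\u{301}x"` (an "e" followed by a combining acute accent, then "x"), the accented letter renders as one on-screen character but occupies two code points, so a range covering offset 0 to 1 returns only the bare "e" and leaves the accent behind. With `unicode-segmentation`, iterating via `UnicodeSegmentation::graphemes(text, true)` should treat the letter and its accent as a single unit, which is the behavior the coordinate system needs.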