jmacdonald / scribe

Text editor components
MIT License
173 stars 7 forks

Add grapheme cluster support #13

Open jmacdonald opened 7 years ago

jmacdonald commented 7 years ago

Scribe's coordinate system is designed as an abstraction over multi-byte "characters", such that a Range spanning one offset corresponds to a single on-screen character, even if that character is represented by more than one byte. Currently, that abstraction is naively centered around Unicode code points (as decoded from UTF-8). However, a single on-screen character can be composed of multiple code points (for example, a base letter followed by a combining accent), and as a result, working with data that contains such characters breaks much of Scribe's data handling.
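To make the mismatch concrete, here is a minimal sketch using only the Rust standard library. The helper and constant names are hypothetical and are not part of Scribe; the sketch only shows how byte counts and code point counts can both differ from what a reader sees as one character.

```rust
/// Number of bytes in the UTF-8 encoding of `text`.
/// (Hypothetical helper for illustration; not Scribe's API.)
fn byte_len(text: &str) -> usize {
    text.len()
}

/// Number of Unicode code points in `text`. This is the unit that a
/// code-point-based coordinate system would count.
/// (Hypothetical helper for illustration; not Scribe's API.)
fn code_point_len(text: &str) -> usize {
    text.chars().count()
}

/// "é" as a single precomposed code point (U+00E9).
const PRECOMPOSED_E_ACUTE: &str = "\u{e9}";

/// "é" as a base letter "e" followed by a combining acute accent (U+0301).
/// It renders as one on-screen character but holds two code points.
const DECOMPOSED_E_ACUTE: &str = "e\u{301}";
```

Both constants render as the same single character, yet `code_point_len` reports 1 for the precomposed form and 2 for the decomposed form. An offset computed by counting code points would therefore land in the middle of the decomposed "é".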

A Unicode grapheme cluster is what we should be using as the smallest atomic unit of text. The unicode-segmentation crate provides iterators that handle grapheme clusters, rather than code points; let's migrate to that so that the coordinate system supports the full Unicode character set.