odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License

Add grapheme analysis facilities to `core:unicode` #3775

Closed: Feoramund closed this pull request 2 weeks ago.

Feoramund commented 2 weeks ago

This PR implements procedures for segmenting text at grapheme cluster boundaries, as described in Unicode® Standard Annex #29, "Unicode Text Segmentation".
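For readers unfamiliar with UAX #29, the following Python sketch shows the general shape of a grapheme cluster boundary check. It is not the PR's Odin implementation, and it covers only a subset of the rules (GB3 to GB5, GB9, GB11, GB12/GB13 and GB999). Hangul syllable, SpacingMark, Prepend and Indic conjunct rules are omitted. The property helpers `is_extend`, `is_control` and `is_pictographic` are hypothetical approximations built on `unicodedata.category` and hard-coded ranges. They do not use the real Grapheme_Cluster_Break and Extended_Pictographic property tables that a compliant implementation needs.

```python
import unicodedata

CR, LF, ZWJ = 0x0D, 0x0A, 0x200D


def is_regional_indicator(cp: int) -> bool:
    # Regional Indicator symbols pair up to form flag emoji.
    return 0x1F1E6 <= cp <= 0x1F1FF


def is_extend(cp: int) -> bool:
    # APPROXIMATION of Grapheme_Cluster_Break=Extend: nonspacing/enclosing
    # marks plus the emoji skin-tone modifiers.
    return unicodedata.category(chr(cp)) in ("Mn", "Me") or 0x1F3FB <= cp <= 0x1F3FF


def is_control(cp: int) -> bool:
    # APPROXIMATION of Control/CR/LF: control chars and line/paragraph separators.
    return unicodedata.category(chr(cp)) in ("Cc", "Zl", "Zp")


def is_pictographic(cp: int) -> bool:
    # APPROXIMATION of Extended_Pictographic using a few common emoji blocks.
    if 0x1F3FB <= cp <= 0x1F3FF:  # skin-tone modifiers are Extend, not pictographic
        return False
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def should_break(prev: int, cur: int, ri_run: int, zwj_after_pict: bool) -> bool:
    """Decide whether a boundary falls between code points prev and cur."""
    if prev == CR and cur == LF:                                     # GB3
        return False
    if is_control(prev) or is_control(cur):                          # GB4, GB5
        return True
    if is_extend(cur) or cur == ZWJ:                                 # GB9
        return False
    if zwj_after_pict and is_pictographic(cur):                      # GB11
        return False
    if is_regional_indicator(prev) and is_regional_indicator(cur):   # GB12, GB13
        return ri_run % 2 == 0  # break only after an even number of RIs
    return True                                                      # GB999


def segment(text: str) -> list[str]:
    """Split text into (approximate) extended grapheme clusters."""
    clusters: list[str] = []
    if not text:
        return clusters
    start = 0
    ri_run = 0              # consecutive Regional Indicators ending at the previous char
    pict_seq = False        # text so far ends with ExtPict Extend*
    zwj_after_pict = False  # previous char is a ZWJ that followed ExtPict Extend*
    for i, ch in enumerate(text):
        cp = ord(ch)
        if i > 0 and should_break(ord(text[i - 1]), cp, ri_run, zwj_after_pict):
            clusters.append(text[start:i])
            start = i
        # Update the state used by GB11 and GB12/GB13 for the next position.
        zwj_after_pict = cp == ZWJ and pict_seq
        if is_pictographic(cp):
            pict_seq = True
        elif not is_extend(cp):
            pict_seq = False
        ri_run = ri_run + 1 if is_regional_indicator(cp) else 0
    clusters.append(text[start:])
    return clusters
```

With this sketch, `segment("e\u0301x")` yields `["e\u0301", "x"]`, and a ZWJ family emoji such as `"\U0001F468\u200D\U0001F469\u200D\U0001F467"` stays a single cluster. Because the property helpers are approximations, the sketch would fail many of the official test cases that the PR checks against.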

The Unicode Consortium provides 3,648 individual test cases for confirming that an algorithm complies with the specification, as well as 1,187+ test cases for emoji combinations. The single-codepoint emoji test cases have been excluded, as they are not relevant to Grapheme Cluster Boundary (GCB) calculation.

In total, this PR includes 4,835 test cases.
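As an illustration of how such conformance data can be consumed, the Python sketch below parses one line in the format of the Unicode Character Database file `GraphemeBreakTest.txt`. It assumes, without confirmation from this PR, that the 3,648 cases come from that file. In that format, `÷` (U+00F7) marks a boundary, `×` (U+00D7) marks no boundary, code points are hexadecimal, and everything after `#` is a comment. The function name `parse_test_line` is hypothetical, and the emoji combination cases use a different file format that this sketch does not handle.

```python
from typing import List, Optional, Tuple


def parse_test_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (input text, expected clusters), or None for blank/comment lines."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    text = ""
    clusters: List[str] = []
    current = ""
    for token in data.split():
        if token == "\u00f7":    # ÷ : boundary, so close the current cluster
            if current:
                clusters.append(current)
                current = ""
        elif token == "\u00d7":  # × : no boundary, keep accumulating
            continue
        else:                    # hexadecimal code point
            ch = chr(int(token, 16))
            text += ch
            current += ch
    if current:
        clusters.append(current)
    return text, clusters
```

A segmentation routine then passes a case when splitting `text` reproduces `clusters` exactly. For example, the line `÷ 000D × 000A ÷` parses to `("\r\n", ["\r\n"])`.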

Without information about the display device, this algorithm is as close as one can get to calculating the visual width of a Unicode string, assuming the program and its font implement every combination that a user would expect to enter or see.