odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License

Measure `East_Asian_Width` during grapheme decoding #3789

Closed: Feoramund closed this 1 week ago

Feoramund commented 1 week ago

This is an implementation of Unicode Standard Annex #11 (East Asian Width), the last piece of the puzzle for measuring a Unicode grapheme's width for the purposes of monospace alignment. I did extensive testing with different terminals and scripts. With this added to the grapheme counting proc, I found coverage to be accurate for the following scripts:

Unfortunately, terminal emulator support for Indic scripts looks to be poor. The width at which different graphemes were rendered varied between terminals, and I could only get two terminals to align sample texts of Devanagari, Gujarati, and Tamil by using a hack not described in the standard. I decided not to commit the hack, in the hope that a future document or other information will address this.

Emoji support is also unreliable for the same reason: rendering is up to the terminal. There are myriad multi-codepoint combinations that may or may not be supported by the emulator in question, and unsupported ones tend to degenerate into their component glyphs instead of showing as one unified glyph.
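To make the emoji problem concrete, here is a small Python sketch (not the Odin code in this PR) that compares the width of a zero-width-joiner sequence when a terminal draws it as one glyph with the width when it falls back to the component glyphs. The helper names and the assumption that a unified emoji glyph occupies two cells are illustrative only; real terminals may differ.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, glues emoji into one sequence

def component_width(ch: str) -> int:
    """Cells for one code point: 2 if East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def fallback_width(sequence: str) -> int:
    """Width when the terminal draws each joined component separately."""
    return sum(component_width(ch) for ch in sequence if ch != ZWJ)

# MAN + ZWJ + WOMAN + ZWJ + GIRL (a "family" emoji sequence)
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"

# Assumption: a terminal that supports the sequence draws one wide glyph.
unified_width = 2
degenerate_width = fallback_width(family)
```

The two results differ, so a width computed for the unified glyph misaligns text on any terminal that renders the components instead.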

That said, I believe this is a more than adequate implementation of measuring character width, as it's often called, with coverage for the majority of foreseeable cases.
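For readers unfamiliar with the technique, the following is a minimal Python sketch of width measurement based on the `East_Asian_Width` property. It is not the code in this PR: it works per code point using Python's standard `unicodedata` module rather than per grapheme, and the function names are hypothetical. Treating combining marks and format characters as zero width is a simplifying assumption.

```python
import unicodedata

def codepoint_width(ch: str) -> int:
    """Approximate monospace cell width of a single code point."""
    # Assumption: combining marks (Mn, Me) and format characters (Cf)
    # occupy no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # UAX #11: Wide (W) and Fullwidth (F) characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is treated as one cell in this sketch.
    return 1

def text_width(text: str) -> int:
    """Sum of per-code-point widths; no grapheme segmentation is done."""
    return sum(codepoint_width(ch) for ch in text)
```

For example, `text_width("abc")` gives 3 and `text_width("日本語")` gives 6, while `"e\u0301"` (e plus a combining acute accent) measures as a single cell. Control characters and Ambiguous-width characters are not handled specially here.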