This is an implementation of Unicode Standard Annex #11 (East Asian Width), the last piece of the puzzle for measuring a Unicode grapheme's width for monospace alignment. I tested it extensively with different terminals and scripts. With this added to the grapheme counting proc, I found the measured widths to be accurate for the following scripts:
Latin with diacritics
Chinese hanzi
Arabic
Cyrillic
Japanese kana
Korean hangul
Thai
Greek
Hebrew
Armenian
Georgian
Unfortunately, terminal emulator support for Indic scripts looks poor. The width at which different graphemes were rendered varied, and I could only get two terminals to align sample texts in Devanagari, Gujarati, and Tamil, and only with a hack not described in the standard. I decided not to commit the hack, in the hope that a future document or further information will address this.
Emoji support is unreliable for the same reason: rendering is up to the terminal. There are myriad multi-codepoint combinations that may or may not be supported by the emulator in question, and unsupported ones tend to degenerate into their component glyphs instead of showing as one unified glyph.
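To illustrate the emoji problem, here is a small Python sketch (not part of this change) that splits one zero-width-joiner sequence into its component code points. The sequence chosen and the variable names are only illustrative. A terminal that supports the sequence would typically draw one two-column glyph. One that does not will usually fall back to the three separate glyphs, so the column count depends on the terminal rather than on the text.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# MAN + ZWJ + WOMAN + ZWJ + GIRL, a common "family" emoji sequence.
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"

# Each part between joiners is a single code point in this sequence.
parts = family.split(ZWJ)
for p in parts:
    print(f"U+{ord(p):04X}", unicodedata.name(p),
          unicodedata.east_asian_width(p))

# Assumed fallback rendering: every component is drawn on its own,
# two columns each, instead of one unified two-column glyph.
fallback_columns = 2 * len(parts)
```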
That said, I believe this is a more than adequate implementation of measuring "character width", as it's often called, with coverage for the majority of foreseeable cases.
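For readers unfamiliar with the approach, the idea can be sketched in Python using the standard `unicodedata` module. This is not the code in this change, only a minimal illustration under a few assumptions: the input is already a single grapheme cluster (segmentation happens elsewhere), ambiguous-width characters are treated as narrow, and `cell_width` is a hypothetical helper name.

```python
import unicodedata

def cell_width(cluster: str) -> int:
    """Estimate the terminal columns used by one grapheme cluster.

    Hypothetical helper: assumes `cluster` is already a single
    grapheme cluster and treats Ambiguous (A) characters as narrow.
    """
    width = 0
    for ch in cluster:
        # Nonspacing marks, enclosing marks, and format characters
        # (such as the zero-width joiner) occupy no column of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # UAX #11: Wide (W) and Fullwidth (F) take two columns;
        # everything else is counted as one here.
        w = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        # The cluster is as wide as its widest spacing code point.
        width = max(width, w)
    return width

cell_width("a")         # 1
cell_width("e\u0301")   # 1 (e + combining acute accent)
cell_width("\u6f22")    # 2 (CJK ideograph)
cell_width("\uff21")    # 2 (fullwidth Latin capital A)
```

Taking the maximum rather than the sum keeps a base character plus its combining marks at the width of the base, which matches the scripts listed above. It does not address the Indic or emoji cases described earlier.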