unicode-rs / unicode-width

Displayed width of Unicode characters and strings according to UAX#11 rules.
https://unicode-rs.github.io/unicode-width

Assign the same CJK width to canonically equivalent strings #52

Closed Jules-Bertholet closed 6 months ago

Jules-Bertholet commented 6 months ago

UAX 11:

Modern Rendering Practice. […] The set of characters with mappings to legacy character sets that have been assigned ambiguous width constitute a superset of the set of such characters that may be rendered as wide characters in a given context. In particular, an application might find it useful to treat characters from alphabetic scripts as narrow by default. Conversely, many of the symbols in the Unicode Standard have no mappings to legacy character sets, yet they may be rendered as “wide” characters if they appear in an East Asian context. An implementation might therefore elect to treat them as ambiguous even though they are classified as neutral here.

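"Treat characters from alphabetic scripts as narrow by default" is the biggest change this PR makes. To give every pair of canonically equivalent strings the same CJK width, we also need to adjust the width of a few mathematical symbols with a diagonal strikethrough (for example U+2260 NOT EQUAL TO, which decomposes to "=" followed by U+0338 COMBINING LONG SOLIDUS OVERLAY) and of U+0387 GREEK ANO TELEIA, which is canonically equivalent to U+00B7 MIDDLE DOT.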
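
To make the goal concrete, here is a minimal std-only sketch of the invariant being discussed: two canonically equivalent strings should report the same CJK width. Everything in it is a toy for illustration. The function names `naive_width_cjk`, `canonicalize`, and `adjusted_width_cjk` are hypothetical and are not part of the unicode-width API, the lookup covers only the handful of code points named here, and mapping to a single representative before measuring is just one possible way to get consistent results, not necessarily what this PR does.

```rust
/// Canonically equivalent pairs: (precomposed form, canonical decomposition).
const EQUIVALENT_PAIRS: [(&str, &str); 2] = [
    ("\u{0387}", "\u{00B7}"),         // GREEK ANO TELEIA -> MIDDLE DOT
    ("\u{2260}", "\u{003D}\u{0338}"), // NOT EQUAL TO -> '=' + COMBINING LONG SOLIDUS OVERLAY
];

/// Toy per-character width that treats East Asian Ambiguous characters as wide.
/// Only the code points used in this example are handled specially.
fn naive_width_cjk(s: &str) -> usize {
    s.chars()
        .map(|c| match c {
            '\u{0338}' => 0,              // combining mark: no width of its own
            '\u{00B7}' | '\u{2260}' => 2, // East_Asian_Width = A, wide in CJK mode
            _ => 1,                       // includes U+0387 (N) and '=' (Na)
        })
        .sum()
}

/// Toy canonicalization (assumption): map each equivalent spelling to one
/// representative so that both spellings are measured identically.
fn canonicalize(s: &str) -> String {
    s.replace('\u{0387}', "\u{00B7}")
        .replace("\u{003D}\u{0338}", "\u{2260}")
}

/// Width that is stable under canonical equivalence for the pairs above.
fn adjusted_width_cjk(s: &str) -> usize {
    naive_width_cjk(&canonicalize(s))
}

fn main() {
    for (composed, decomposed) in EQUIVALENT_PAIRS {
        println!(
            "naive: {} vs {} | adjusted: {} vs {}",
            naive_width_cjk(composed),
            naive_width_cjk(decomposed),
            adjusted_width_cjk(composed),
            adjusted_width_cjk(decomposed),
        );
    }
}
```

With the naive lookup, each pair disagrees (1 vs 2 for the ano teleia pair, 2 vs 1 for the not-equal pair); with the adjusted function, both spellings of each pair measure the same. A real check would run this comparison over every character with a canonical decomposition rather than two hand-picked pairs.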