unicode-rs / unicode-width

Displayed width of Unicode characters and strings according to UAX#11 rules.
https://unicode-rs.github.io/unicode-width

Is calculating 👩‍🔬 as 4-wide really UAX-11 compliant? #20

Closed by Artoria2e5 3 years ago

Artoria2e5 commented 3 years ago

The Recommendations section of UAX-11 has included the following statement since Revision 33 (Unicode 10.0):

[UTS51] emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.
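A literal reading of the quoted recommendation can be sketched in Rust. This is a minimal illustration, not the crate's implementation. `toy_char_width` and `sequence_width` are hypothetical names, the width table covers only a few code points, and the check assumes the first code point is an emoji character instead of consulting the UTS #51 Emoji property.

```rust
/// Toy stand-in for an East_Asian_Width-based lookup (hypothetical, NOT the
/// unicode-width tables): only a few code points are listed explicitly.
fn toy_char_width(c: char) -> usize {
    match c {
        '\u{200D}' | '\u{FE0F}' => 0,   // ZWJ and VS16 treated as zero-width
        '\u{1F469}' | '\u{1F52C}' => 2, // 👩 and 🔬 are East_Asian_Width=W
        _ => 1,                         // anything else falls back to narrow
    }
}

/// Literal reading of the recommendation: a base character followed by
/// U+FE0F is counted as wide (2) regardless of the base's own width.
/// Assumption: the base is an emoji character; a real implementation would
/// check the Emoji property from UTS #51.
fn sequence_width(s: &str) -> usize {
    let chars: Vec<char> = s.chars().collect();
    if chars.len() == 2 && chars[1] == '\u{FE0F}' {
        2
    } else {
        chars.iter().map(|&c| toy_char_width(c)).sum()
    }
}

fn main() {
    // U+263A WHITE SMILING FACE + VS16, an emoji presentation sequence.
    println!("{}", sequence_width("\u{263A}\u{FE0F}"));
    // U+263A alone is not in the toy table, so it falls back to 1.
    println!("{}", sequence_width("\u{263A}"));
}
```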

If the crate is to follow UAX-11 as the README states, it appears it would need to count a composed emoji sequence as a single East Asian Wide (width 2) grapheme. Fixing that would mean taking a dependency on unicode-segmentation; the honest alternative is to document this deviation in the README.
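The 4-versus-2 discrepancy can be reproduced with a small Rust sketch. It assumes the string width is the sum of per-code-point widths, which is how 👩‍🔬 (U+1F469 U+200D U+1F52C) comes out as 2 + 0 + 2 = 4. In place of unicode-segmentation's full UAX #29 grapheme segmentation, it uses a much simpler hypothetical rule that merges code points joined by U+200D ZERO WIDTH JOINER into one width-2 unit. All function names and the toy width table are hypothetical.

```rust
/// Toy width table (hypothetical, NOT the unicode-width data).
fn toy_char_width(c: char) -> usize {
    match c {
        '\u{200D}' | '\u{FE0F}' => 0,   // joiner and variation selector: zero-width
        '\u{1F469}' | '\u{1F52C}' => 2, // 👩 and 🔬: East Asian Wide
        _ => 1,
    }
}

/// Per-code-point summation: every char contributes its own width.
fn naive_width(s: &str) -> usize {
    s.chars().map(toy_char_width).sum()
}

/// Group code points into units, gluing a char onto the previous unit when
/// it is a ZWJ or directly follows a ZWJ (a simplification of UAX #29).
fn zwj_units(s: &str) -> Vec<Vec<char>> {
    let mut units: Vec<Vec<char>> = Vec::new();
    let mut prev_was_zwj = false;
    for c in s.chars() {
        let joins = c == '\u{200D}' || prev_was_zwj;
        if joins && !units.is_empty() {
            let last = units.len() - 1;
            units[last].push(c);
        } else {
            units.push(vec![c]);
        }
        prev_was_zwj = c == '\u{200D}';
    }
    units
}

/// Cluster-aware width: a multi-code-point ZWJ unit counts as one wide glyph.
fn zwj_cluster_width(s: &str) -> usize {
    zwj_units(s)
        .iter()
        .map(|unit| -> usize {
            if unit.len() > 1 && unit.contains(&'\u{200D}') {
                2 // assumption: the joined sequence renders as a single emoji
            } else {
                unit.iter().map(|&c| toy_char_width(c)).sum()
            }
        })
        .sum()
}

fn main() {
    let scientist = "\u{1F469}\u{200D}\u{1F52C}"; // 👩‍🔬
    println!("naive: {}", naive_width(scientist));
    println!("clustered: {}", zwj_cluster_width(scientist));
}
```

A real fix would more likely iterate `graphemes(true)` from unicode-segmentation and assign a width per cluster; the ZWJ rule here only covers the case raised in this issue.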

Manishearth commented 3 years ago

Yes, it is compliant, because "emoji presentation sequence" refers to a specific kind of sequence: an emoji character followed by a variation selector.

This case is handled by the data treating variation selectors as zero-width.
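The distinction can be made concrete by listing the code points of 👩‍🔬. It is U+1F469 U+200D U+1F52C, an emoji ZWJ sequence that contains no U+FE0F, so the emoji-presentation-sequence recommendation does not apply to it. The classifier below is a simplified, hypothetical sketch (`SequenceKind` and `classify` are illustrative names); it does not check the Emoji property of the base character.

```rust
/// Kinds of sequence relevant to this issue (hypothetical names).
#[derive(Debug, PartialEq)]
enum SequenceKind {
    /// One code point followed by U+FE0F (an emoji presentation sequence,
    /// assuming the first code point is an emoji character).
    EmojiPresentation,
    /// Code points joined by U+200D ZERO WIDTH JOINER.
    ZwjSequence,
    Other,
}

fn classify(s: &str) -> SequenceKind {
    let chars: Vec<char> = s.chars().collect();
    if chars.len() == 2 && chars[1] == '\u{FE0F}' {
        SequenceKind::EmojiPresentation
    } else if chars.len() > 1 && chars.contains(&'\u{200D}') {
        SequenceKind::ZwjSequence
    } else {
        SequenceKind::Other
    }
}

fn main() {
    let scientist = "\u{1F469}\u{200D}\u{1F52C}"; // 👩‍🔬
    // Print each code point: U+1F469, U+200D, U+1F52C.
    for c in scientist.chars() {
        println!("U+{:04X}", c as u32);
    }
    println!("{:?}", classify(scientist));
    println!("{:?}", classify("\u{263A}\u{FE0F}"));
}
```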

Artoria2e5 commented 3 years ago

D'oh, that makes sense now.