jquast / wcwidth

Python library that measures the width of unicode strings rendered to a terminal

wcswidth incorrect for heart emoji, ❤️ ("\u2764\ufe0f") #96

Closed dscrofts closed 10 months ago

dscrofts commented 10 months ago

Hello,

The wcswidth function seems to calculate the wrong width for the heart emoji "❤️" ("\u2764\ufe0f"). An example:

>>> from wcwidth import wcswidth
>>> wcswidth("❤️")
1
>>> wcswidth("💞")
2
>>> wcswidth("💘")
2

The heart emoji occupies 2 cells, so wcswidth should return 2 for it, as it does for the other examples above.

jquast commented 10 months ago

In this case, the first character "\u2764" (❤) has a width of 1, but it is joined with a second character, the variation selector "\ufe0f" (VS-16), which changes the cell width of the first character to 2.

We don't have any code in wcwidth to detect this special kind of combining. This might be like the Devanagari issue #47, where a "combiner may sometimes increase the width of the previous cell, depending on its value".
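The widening behaviour described above can be sketched in a few lines. This is only an illustration, not the wcwidth implementation: `char_width`, `sketch_width`, and the hand-picked `NARROW_TO_WIDE` set are hypothetical names, and the base width is approximated with the standard library's `unicodedata` rather than wcwidth's own tables.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation

# Hypothetical, hand-picked subset of narrow characters that become wide
# when followed by VS-16. A real table would be generated from Unicode data.
NARROW_TO_WIDE = {"\u2764", "\u2600", "\u260e"}


def char_width(ch):
    """Approximate the cell width of a single character (assumption:
    zero for marks and format characters, 2 for East Asian Wide or
    Fullwidth, otherwise 1)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def sketch_width(text):
    """Sum the cell widths, widening a narrow base followed by VS-16."""
    total = 0
    prev = None
    for ch in text:
        if ch == VS16 and prev in NARROW_TO_WIDE:
            total += 1  # the previous cell grows from 1 to 2
        else:
            total += char_width(ch)
        prev = ch
    return total


print(sketch_width("\u2764"))        # 1
print(sketch_width("\u2764\ufe0f"))  # 2
```

The key design point is that the selector itself still contributes no cell of its own; it only adjusts the width already counted for the character before it.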

jquast commented 10 months ago

This sequence is listed in https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-variation-sequences.txt, which doesn't seem to hint at this width modification, but maybe it could be used to test for such sequences and to build a "narrow to wide variations" table of sorts.
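One way such a "narrow to wide variations" table might be derived is sketched below. It is an assumption-laden illustration rather than the approach taken in wcwidth: the embedded `SAMPLE` only imitates the line format of emoji-variation-sequences.txt as I understand it, the function names are hypothetical, and "narrow" is approximated with `unicodedata.east_asian_width` from the standard library.

```python
import unicodedata

# Small excerpt imitating the format of emoji-variation-sequences.txt
# (assumed layout: "<base> <selector> ; <style>; # comment").
SAMPLE = """\
# emoji-variation-sequences.txt (excerpt for illustration)
231A FE0E  ; text style;  # (1.1) WATCH
231A FE0F  ; emoji style; # (1.1) WATCH
2764 FE0E  ; text style;  # (1.1) HEAVY BLACK HEART
2764 FE0F  ; emoji style; # (1.1) HEAVY BLACK HEART
"""


def parse_emoji_style_bases(text):
    """Return the base code points that have an emoji-style (FE0F) sequence."""
    bases = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        codepoints = data.split(";")[0].split()
        if len(codepoints) == 2 and codepoints[1].upper() == "FE0F":
            bases.add(int(codepoints[0], 16))
    return bases


def narrow_to_wide_table(bases):
    """Keep only bases that are not already East Asian Wide or Fullwidth."""
    return sorted(
        cp for cp in bases
        if unicodedata.east_asian_width(chr(cp)) not in ("W", "F")
    )


table = narrow_to_wide_table(parse_emoji_style_bases(SAMPLE))
print([f"U+{cp:04X}" for cp in table])
```

Bases that are already wide, such as U+231A WATCH in recent Unicode versions, are filtered out because VS-16 would not change their width.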

jquast commented 10 months ago

I have created a solution in #97 and am now testing it with popular terminals. Thanks again for the bug report.

jquast commented 10 months ago

This fix will soon be released in version 0.2.10. I also tested many terminals for VS-16 support: only about 28% of those tested support any kind of VS-16 sequence. Results: https://ucs-detect.readthedocs.io/results.html