Closed (vadi2 closed this 3 years ago)
This... is correct. EastAsianWidth lists it as "neutral" width (shield is U+1F6E1):
1F6E0..1F6EA;N # So [11] HAMMER AND WRENCH..NORTHEAST-POINTING AIRPLANE
Your terminal is rendering wrong.
I understand width 1 and 2, but "neutral" is not a number.
From http://www.unicode.org/reports/tr11/:
Strictly speaking, it makes no sense to talk of narrow and wide for neutral characters, but because for all practical purposes they behave like Na, they are treated as narrow characters (the same as Na) under the recommendations below.
This means that "neutral" characters have width 1.
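To make that mapping concrete, here is a minimal sketch of how the TR11 classes could translate to terminal columns. The enum and the `columnsFor` helper are hypothetical names invented for this illustration, not part of widecharwidth, and the handling of ambiguous characters assumes the TR11 default of treating them as narrow outside an East Asian legacy context.

```cpp
// East Asian Width classes as named in UAX #11 (TR11).
enum class EastAsianWidth { Neutral, Narrow, Halfwidth, Ambiguous, Wide, Fullwidth };

// Hypothetical helper (not from widecharwidth): number of terminal columns
// for a character of the given class.
inline int columnsFor(EastAsianWidth eaw, bool eastAsianContext = false) {
    switch (eaw) {
    case EastAsianWidth::Wide:
    case EastAsianWidth::Fullwidth:
        return 2;
    case EastAsianWidth::Ambiguous:
        // Assumption: narrow by default, wide only in an East Asian context.
        return eastAsianContext ? 2 : 1;
    case EastAsianWidth::Neutral:   // treated the same as Narrow, per TR11
    case EastAsianWidth::Narrow:
    case EastAsianWidth::Halfwidth:
        return 1;
    }
    return 1; // not reached; keeps compilers quiet about missing returns
}
```

With this, `columnsFor(EastAsianWidth::Neutral)` gives 1, which is the answer for the shield.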
And before you ask, no, you probably don't want to treat neutral characters as wide. Here are some other examples of neutral characters:
© (U+A9) » (U+BB) À (U+C0)
I suggest you file a bug with whatever is handling the rendering and, until that's fixed, special-case U+1F6E1 in your code.
We're the ones building the rendering engine. Still looking for a clear answer on this since I see no relationship between the shield emoji and the copyright symbol with east asian characters... but we're getting there.
See 7e9dfdaf05059b3fff237a8619b6b4fb187570e7 . My terminals do indeed render 🛡 as width 1, along with ๐ถ.
Perhaps this is the rationale:
Narrow (and neutral) Unicode characters always map to halfwidth characters
@vadi2 I have now given an explanation for this in our README. In short:
An upgrade for you would be quite simple. It should be enough to add
if (unicode == 0x1F6E1) return 2;
to your getGraphemeWidth function here (and any other direct uses of widechar_wcwidth):
This allows you to override the width that widecharwidth reports, which is correct according to Unicode but does not match your renderer.
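If more overrides turn up later, it may be cleaner to keep them in one wrapper rather than scattering `if` checks around. The following is only a sketch: `standardWidth` is a tiny stand-in for `widechar_wcwidth` so that the example compiles on its own (it knows just two code points from this thread), and `rendererWidth` is a hypothetical name, not something from your code base.

```cpp
// Stand-in for widechar_wcwidth from widechar_width.h (assumption: the real
// function is used in actual code; this stub only covers this thread's cases).
inline int standardWidth(char32_t c) {
    if (c == 0x1F600) return 2;                 // grinning face: wide
    if (c >= 0x1F6E0 && c <= 0x1F6EA) return 1; // neutral range quoted above
    return 1;                                   // everything else: simplified
}

// Hypothetical wrapper: renderer-specific quirks are checked first, then the
// standard width is used for everything else.
inline int rendererWidth(char32_t c) {
    switch (c) {
    case 0x1F6E1: // shield: this renderer draws it two cells wide
        return 2;
    default:
        return standardWidth(c);
    }
}
```

Every call site that used the standard width directly would then call `rendererWidth` instead, so the quirk lives in exactly one place.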
Upgrading would allow you to gain support for Unicode 14 instead of 12.
Sorry we didn't get back to you earlier, things were hectic.
Thanks for the writeup!
I guess there are more emojis than the shield which should be treated like that shield emoji on a terminal? E.g. this blog post says that terminals should treat all emoji representations as width 2 (in a terminal): https://darrenburns.net/posts/emoji-in-the-terminal/ with the example of \U0001F6E5 🛥 (motorboat emoji).
Background: wezterm uses ridiculousfish/widecharwidth, and in https://github.com/wez/wezterm/issues/1607 there is a discussion about whether the motorboat should always display as two chars or one char.
I guess there are more emojis than the shield which should be treated like that shield emoji on a terminal?
@jankatins Okay, first let me be super clear: the shield should have width 1 in a terminal, because that's the width Unicode says it has.
The context here is very specific. There are developers controlling both the renderer (essentially the "terminal") and the client (the app running in the terminal). They have a different need, so they add a quirk that displays the shield at a non-standard width.
In a terminal context that's a horrible idea, because in a terminal context you don't control both applications, and both need to come up with the same width on their own (or you get awkward cursor glitches!). The only fighting chance you have of that is to go by the standard.
the example of \U0001F6E5 🛥 motorboat emoji
As best as I can tell, motorboat is also neutral, meaning it should also have width 1. If you do anything else, you are likely to break cursor movement if it appears.
Specifically, from the linked article:
This is incorrect in the case of Emoji Presentation Sequences - Unicode recommends they should be always treated as "East Asian Wide"
See emoji-data.txt. U+1F6E5 is not listed as having "Emoji_Presentation". Instead, the range from U+1F6E0 to U+1F6E5 is listed only as "Emoji" and in emoji-sequences.txt they are always listed along with the U+FE0F variation selector as "Basic_Emoji". This leads us to believe that the default presentation for them is text presentation, meaning that they should have width 1. The "emoji presentation sequence" here is U+1F6E5 U+FE0F - both together!
Compare e.g. U+1F600 (😀), which is listed as having "Emoji_Presentation" and is by itself listed as a "Basic_Emoji". (Yes, the Unicode data file format is a mess and changes too often, and there's no great explanation for any of it. That link to unicode.org seems to be a link for a consumer-facing emoji presentation, and doesn't appear to have any impact on the actual presentation. It's inaccurate: the sequence should be U+1F6E5 U+FE0F.)
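As a rough sketch of that reading (and only that reading, which may turn out to be wrong): a code point with "Emoji_Presentation" counts as width 2 on its own, while a code point that is only "Emoji" counts as width 2 only when followed by U+FE0F. The two lookup functions below are hypothetical stand-ins for a real emoji-data.txt table and know only the code points mentioned in this thread; treating U+FE0E as forcing width 1, and every other character as width 1, are simplifying assumptions.

```cpp
#include <cstddef>
#include <string>

constexpr char32_t kTextSelector  = 0xFE0E; // VS15, text presentation
constexpr char32_t kEmojiSelector = 0xFE0F; // VS16, emoji presentation

// Hypothetical stand-ins for emoji-data.txt lookups (thread examples only).
inline bool hasEmojiProperty(char32_t c) {
    return (c >= 0x1F6E0 && c <= 0x1F6E5) || c == 0x1F600;
}
inline bool hasEmojiPresentation(char32_t c) { return c == 0x1F600; }

// Width of a string under the reading described above.
inline int displayWidth(const std::u32string& s) {
    int total = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t c = s[i];
        if (c == kTextSelector || c == kEmojiSelector) continue; // stray selector
        bool wide = hasEmojiPresentation(c);
        if (hasEmojiProperty(c) && i + 1 < s.size()) {
            if (s[i + 1] == kEmojiSelector) { wide = true;  ++i; }
            else if (s[i + 1] == kTextSelector) { wide = false; ++i; }
        }
        total += wide ? 2 : 1;
    }
    return total;
}
```

Under these assumptions a bare U+1F6E5 comes out as width 1, U+1F6E5 U+FE0F as width 2, and a bare U+1F600 as width 2.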
It is of course possible that this reading is wrong, but in that case the solution is emphatically not to "treat them like the shield emoji" and quirk them out. The solution is to fix the interpretation of the standard and find a general answer.
Thank you for the explanation!
It doesn't help when the framework one is using (Qt) does not respect/handle the U+FE0E (Text presentation) and U+FE0F (Emoji presentation) Unicode variation selectors... QTBUG-97401
The 🛡 emoji before, generated on 2020-03-21: Width of 🛡 is reported as 2.
After, generated on 2021-04-17: Width of 🛡 is reported as 1, and makes the text overlap.
The only change is an update to the widechar_width file.
Sorry we didn't mention it earlier - things were hectic at the time.