weechat / weechat

The extensible chat client.
https://weechat.org
GNU General Public License v3.0

Wrong display of unicode chars with zero-width joiner (U+200D) #1861

Open flashcode opened 1 year ago

flashcode commented 1 year ago

Bug summary

Unicode chars with a zero-width joiner (U+200D) are displayed with the wrong column count in the chat area.

This is one remaining bug from issue #1857, which has been closed because it was an ncurses bug.

Steps to reproduce

  1. Compile WeeChat against ncurses ≥ 6.3_p20220612
  2. Run WeeChat in a terminal able to display such Unicode chars, like WezTerm or kitty (not to be confused with KiTTY, which is a fork of PuTTY).
  3. Execute this command: /eval /print [${\U1F62E\u200D\U1F4A8}]

Current behavior

Output is:

[image: screenshot of the output]

Note the two spaces before the closing bracket: they should not be displayed.

Expected behavior

No spaces before the closing bracket, i.e. WeeChat knows that the char is displayed on 2 columns and not 4.

Suggested solutions

Count chars together to have a displayed size of 2 instead of 4.

Additional information

Be sure you're using ncurses ≥ 6.3_p20220612 because older versions have a broken display (see #1857).


trygveaa commented 1 year ago

> Note the two spaces before the closing bracket: they should not be displayed.

This is not the fault of WeeChat, but a bug in kitty (https://github.com/kovidgoyal/kitty/issues/1978). If you try this in WezTerm, you'll see that you don't get any spaces before the closing bracket.

However, a mismatch in the number of columns expected for a character between WeeChat and the terminal emulator causes lots of other issues, like characters remaining after changing buffer when they should be cleared, bar separators appearing in the wrong column and characters typed in the input bar appearing at the wrong place.

You can reproduce the issue with ghost characters after switching buffer by running this in a terminal emulator which uses 2 columns for that emoji (e.g. WezTerm, not kitty):

weechat -t -r '/buffer clear; /print -escape \U1F62E\u200D\U1F4A8; /buffer add -switch 2; /print -buffer core.2 1111'

And then switching to the core buffer. You will see it contains 😮‍💨11 and if you press Ctrl+l, 11 will disappear.

As far as I know there are two types of emojis where different programs disagree on the width of them. The first is emojis consisting of multiple emojis joined by U+200D to form a single emoji, like the emoji above. The second is emojis made by appending U+FE0F to a non-emoji character (not applicable for all characters, only specific ones which have an emoji variant), e.g. ❤️.

Different terminal emulators handle these differently, and to avoid issues in WeeChat, it needs to use the same number of columns as the terminal emulator. Unfortunately, there is as far as I know no way for WeeChat to get this information from the terminal emulator. Therefore, I think WeeChat would have to have some options to control this, and possibly try to detect the terminal emulator (that may be tricky when running inside a multiplexer though).
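A rough sketch of what "options plus detection" could look like, written in Python for brevity rather than WeeChat's C. Everything here is an assumption made for illustration: the function names and policy keys are hypothetical, and the environment variables (`KITTY_WINDOW_ID`, `TERM_PROGRAM`, `TMUX`, `STY`) are ones these programs are commonly reported to set, not something confirmed in this thread. Detection is skipped inside a multiplexer, where the outer terminal cannot be identified reliably, and an explicit user override always wins.

```python
from typing import Mapping, Optional

# Hypothetical policies; values follow the default behaviour reported in this thread.
FALLBACK = {"zwj_joined": False, "vs16_wide": False}
DEFAULTS = {
    "kitty": {"zwj_joined": False, "vs16_wide": True},
    "wezterm": {"zwj_joined": True, "vs16_wide": False},
}


def guess_terminal(env: Mapping[str, str]) -> Optional[str]:
    """Best-effort guess of the terminal emulator from environment variables."""
    # Assumption: tmux sets TMUX and GNU screen sets STY. Inside a multiplexer
    # the variables of the outer terminal may be stale, so do not guess.
    if "TMUX" in env or "STY" in env:
        return None
    # Assumption: kitty exports KITTY_WINDOW_ID to its child processes.
    if "KITTY_WINDOW_ID" in env:
        return "kitty"
    # Assumption: WezTerm exports TERM_PROGRAM=WezTerm.
    if env.get("TERM_PROGRAM") == "WezTerm":
        return "wezterm"
    return None


def emoji_width_policy(env: Mapping[str, str], override: Optional[dict] = None) -> dict:
    """Return the width policy: user override first, then detection, then fallback."""
    if override is not None:
        return override
    return DEFAULTS.get(guess_terminal(env), FALLBACK)
```

It could be called as `emoji_width_policy(os.environ)` at startup. Taking the environment as a parameter keeps the guessing logic easy to test.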

As for the specific widths, emojis with U+200D are counted in one of two ways. Either the U+200D is ignored and the emojis are counted individually: for the emoji above that gives a width of 4, but other emojis may consist of more emojis and be wider. Or the emoji is handled correctly with the U+200D, in which case I think it always gets a width of 2.
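The two ways of counting can be illustrated with a small Python sketch. This is not WeeChat's or any terminal's actual code: `char_width` is a rough stand-in for `wcwidth()` built on Python's `unicodedata` module, and `width_joined` naively assumes that every code point followed by U+200D starts a valid emoji sequence, which real implementations check against the Unicode emoji data.

```python
import unicodedata

ZWJ = "\u200d"


def char_width(ch: str) -> int:
    """Rough per-code-point width: 0 for combining/format chars, 2 for wide, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def width_individual(text: str) -> int:
    """Ignore U+200D (width 0) and count every other code point on its own."""
    return sum(char_width(ch) for ch in text)


def width_joined(text: str) -> int:
    """Count each run of code points linked by U+200D as one 2-column emoji."""
    total = 0
    i = 0
    n = len(text)
    while i < n:
        width = char_width(text[i])
        i += 1
        # Absorb every "U+200D + next code point" pair into the current cluster.
        while i + 1 < n and text[i] == ZWJ:
            i += 2
            width = 2
        total += width
    return total


emoji = "\U0001F62E\u200D\U0001F4A8"
print(width_individual(emoji), width_joined(emoji))  # prints "4 2"
```

With the string from the bug report, `[` + emoji + `]`, the first function gives 6 columns and the second gives 4, which is the two-column difference visible as spaces or ghost characters.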

Terminal emulators counting individually (width 4 or more): alacritty, foot, gnome-terminal, kitty (but recognized as a bug, so may change at some point), konsole, lxterminal, mate-terminal, qterminal, sakura, st, terminator, terminology, tilix, urxvt, xfce4-terminal, xterm

Terminal emulators counting it as a single emoji (width 2): contour, wezterm

As noted in #1857, the current latest release of ncurses strips away U+200D, so unless you have patch 20220612 or newer, the emojis will always be displayed individually and the widths counted individually.

For emojis with U+FE0F, the U+FE0F is either ignored or taken into account. If it's ignored, the emoji gets a width of 1. If it's accounted for, it gets a width of 2.
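The same two behaviours for U+FE0F, as a minimal Python sketch. It is an illustration only: `char_width` is a rough stand-in for `wcwidth()`, and `honor_vs16` is a hypothetical flag. For simplicity it widens any narrow character followed by U+FE0F, whereas only specific characters with an emoji variant should be affected.

```python
import unicodedata

VS16 = "\ufe0f"


def char_width(ch: str) -> int:
    """Rough per-code-point width: 0 for combining/format chars, 2 for wide, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def text_width(text: str, honor_vs16: bool) -> int:
    """Width of text, either ignoring U+FE0F or letting it widen the previous char."""
    total = 0
    last = 0  # width given to the most recent visible code point
    for ch in text:
        if ch == VS16:
            if honor_vs16 and last == 1:
                total += 1  # emoji presentation: 1 column becomes 2
                last = 2
            continue  # U+FE0F never takes a column of its own
        w = char_width(ch)
        total += w
        if w > 0:
            last = w
    return total


heart = "\u2764\ufe0f"
print(text_width(heart, honor_vs16=False), text_width(heart, honor_vs16=True))  # prints "1 2"
```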

Terminal emulators ignoring it (width 1): alacritty, contour (when in alternate screen buffer), foot, gnome-terminal, konsole, lxterminal, mate-terminal, sakura, st, terminator, terminology, tilix, urxvt, wezterm (by default), xfce4-terminal, xterm

Terminal emulators accounting for it (width 2): contour (when in normal screen buffer), kitty, qterminal, wezterm (with unicode version set to 14)

In summary, emojis with U+FE0F currently cause rendering issues in kitty and qterminal, and after the next release of ncurses (or if you use the latest ncurses patches, like Alpine does) emojis with U+200D will cause rendering issues in contour and wezterm.
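This summary can be cross-checked by writing the lists from this comment down as data, here as a small Python sketch. It assumes the application counts per code point (4 columns for the U+200D emoji above, 1 for a U+FE0F emoji) and uses each terminal's default behaviour, with contour taken in the alternate screen buffer and wezterm without the unicode version setting. The names are made up for the example.

```python
# Terminals tested in this thread.
TERMINALS = [
    "alacritty", "contour", "foot", "gnome-terminal", "kitty", "konsole",
    "lxterminal", "mate-terminal", "qterminal", "sakura", "st", "terminator",
    "terminology", "tilix", "urxvt", "wezterm", "xfce4-terminal", "xterm",
]
# Terminals that render a U+200D sequence as a single 2-column emoji.
ZWJ_AS_SINGLE_EMOJI = {"contour", "wezterm"}
# Terminals that give a U+FE0F emoji 2 columns by default.
VS16_AS_WIDE = {"kitty", "qterminal"}


def terminal_widths(name: str) -> tuple:
    """Return (width of the U+200D emoji above, width of a U+FE0F emoji)."""
    zwj = 2 if name in ZWJ_AS_SINGLE_EMOJI else 4
    vs16 = 2 if name in VS16_AS_WIDE else 1
    return zwj, vs16


def mismatching_terminals(app_zwj: int = 4, app_vs16: int = 1) -> dict:
    """List terminals whose widths differ from what the application assumes."""
    result = {"zwj": [], "vs16": []}
    for name in TERMINALS:
        zwj, vs16 = terminal_widths(name)
        if zwj != app_zwj:
            result["zwj"].append(name)
        if vs16 != app_vs16:
            result["vs16"].append(name)
    return result


print(mismatching_terminals())
# prints "{'zwj': ['contour', 'wezterm'], 'vs16': ['kitty', 'qterminal']}"
```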

stacyharper commented 1 year ago

I reported the issue to Foot, and the developer also gave me an explanation of how Unicode char widths are computed. It looks like something really is wrong with WeeChat, because kitty isn't the only terminal where chars are broken.

https://codeberg.org/dnkl/foot/issues/1462#issuecomment-1037310