microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!

Bug: For multi-byte characters like Chinese, some output encodings can cause incorrect text rendering. #18242

Open abgox opened 4 days ago

abgox commented 4 days ago

Windows Terminal version

1.21.3231.0

Windows build number

10.0.22635.0

Other Software

PowerShell

Steps to reproduce


  1. Emoji does not render properly in Windows Terminal Preview, but it works fine in other terminals like Windows Terminal, Tabby, and Hyper.

[!NOTE]

I'm in China, so the encoding of Windows Terminal (Preview) is automatically changed to GB2312.

But other terminals like Tabby and Hyper use UTF-8 encoding.

Image

Image


  2. When the output encoding is switched to UTF-8, Windows Terminal (Preview) renders Chinese and other multi-byte characters incorrectly, but rendering works fine in other terminals like Tabby and Hyper.

[!NOTE]

Other terminals like Tabby and Hyper work fine because they always use UTF-8 encoding.

Image

Image

Expected Behavior

  1. Emoji can render properly in Windows Terminal Preview.
  2. Multi-byte characters like Chinese render correctly, with no spaces added by mistake.

Actual Behavior

  1. Emoji can't render properly in Windows Terminal Preview.
  2. For multi-byte characters like Chinese, spaces are added incorrectly.

lhecker commented 2 days ago

Unfortunately, it was never specified whether the BufferCell type supports "surrogate pairs" or not (which is what your 3 emojis use). It actually never supported them properly and it simply worked for SetBufferContents coincidentally, because there was no input validation. You could write anything into the text buffer, even completely bogus codepoints and it would just work. Now we validate all inputs and so this doesn't work anymore. BufferCell now only supports UCS2, which is all it ever properly supported.
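To make the surrogate-pair point concrete, here is a minimal Python sketch (not taken from the Terminal code base) that splits a string into UTF-16 code units. The helper names `utf16_code_units` and `fits_ucs2` are hypothetical, and the emoji U+1F600 is only an illustrative choice, since the three emojis from the report are not shown in the text.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    # Every UTF-16 code unit is exactly two bytes.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def fits_ucs2(text):
    """True if every character lies in the Basic Multilingual Plane,
    meaning no character needs a surrogate pair."""
    return all(ord(ch) <= 0xFFFF for ch in text)
```

Under these definitions, a Chinese character such as "中" (U+4E2D) is a single code unit and fits in one UCS-2 cell value, while U+1F600 becomes the pair 0xD83D, 0xDE00 and does not.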

The only APIs that support writing Unicode to the console are WriteConsoleW, as well as WriteFile and WriteConsoleA with SetConsoleOutputCP(CP_UTF8).
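As a rough illustration of the second route (UTF-8 bytes written after `SetConsoleOutputCP(CP_UTF8)`), the following Python sketch sets the console output code page on Windows and then writes UTF-8 encoded bytes. The function name `write_utf8` and its `stream` parameter are hypothetical. Note that Python's own console layer on Windows may already route output through `WriteConsoleW`, so this is a sketch of the idea rather than a statement about which Win32 call ends up being used.

```python
import sys

CP_UTF8 = 65001  # Win32 code page identifier for UTF-8


def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes and return the bytes that were written.

    On Windows, the console output code page is switched to UTF-8 first.
    `stream` is a hypothetical parameter so the output target can be replaced.
    """
    if sys.platform == "win32":
        import ctypes
        ctypes.windll.kernel32.SetConsoleOutputCP(CP_UTF8)
    out = stream if stream is not None else sys.stdout.buffer
    data = text.encode("utf-8")
    out.write(data)
    out.flush()
    return data
```

Calling `write_utf8("中文\n")` sends six bytes for the two Chinese characters plus the newline, with no per-cell buffer writes involved.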

You can read more about our breaking changes here and the reason for doing them: https://github.com/microsoft/terminal/wiki/Console:-Potential-Breaking-Changes

The one that affects you is the first bullet point (CHAR_INFO). Specifically, it's this PR that (intentionally) broke your code: https://github.com/microsoft/terminal/pull/13321

I apologize for the issues that this has caused for you. Please let me know if you have any questions!

abgox commented 1 day ago

  2. When the output encoding is switched to UTF-8, Windows Terminal (Preview) renders Chinese and other multi-byte characters incorrectly, but rendering works fine in other terminals like Tabby and Hyper.

lhecker commented 22 hours ago

We don't just maintain Windows Terminal but also all other parts of the console subsystem of Windows. One such component is "ConPTY" which is a translation layer from traditional console APIs like SetConsoleCursorPosition (= $Host.UI.RawUI.CursorPosition) to more modern VT sequences (= "`e[${y};${x}H"). This translation layer is used by Tabby and Hyper and also used by Windows Terminal.
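The kind of translation described here can be sketched in a few lines of Python. This is only an illustration of the mapping from a cursor coordinate to a VT "cursor position" sequence, not ConPTY's actual implementation. The function name `cursor_position_sequence` is hypothetical, and the `+ 1` reflects the assumption that console coordinates are zero-based while the VT sequence counts rows and columns from 1.

```python
ESC = "\x1b"  # the escape character that PowerShell writes as `e


def cursor_position_sequence(x, y):
    """Build the VT sequence ESC [ row ; column H for a zero-based (x, y).

    Assumption: x and y are zero-based console coordinates, so each is
    shifted by one to match the one-based VT rows and columns.
    """
    if x < 0 or y < 0:
        raise ValueError("coordinates must be non-negative")
    return f"{ESC}[{y + 1};{x + 1}H"
```

For example, `cursor_position_sequence(0, 0)` yields `ESC[1;1H`, which moves the cursor to the top-left cell.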

The difference now is that Windows Terminal always bundles the latest version of ConPTY, while Tabby and Hyper use whatever version Windows comes with (which may be a few years behind). If you update to Windows 11 24H2 (build 26100) your Windows should have a version of ConPTY that performs input validation and then Tabby/Hyper will show the same issue.

Edit: If it's any consolation, the usage of SetBufferContents already didn't work for most Emojis, even before this breaking change, due to zero width joiners. 🧑🏻‍❤️‍🧑🏼 for instance is 12 characters long but only occupies 2 cells.
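The count of 12 can be checked with a short Python sketch, assuming "characters" here means UTF-16 code units and that the emoji is the sequence spelled out below (two person emojis with skin-tone modifiers, joined to a heart by zero width joiners). The names `COUPLE_WITH_HEART` and `utf16_length` are hypothetical.

```python
# Assumed code point sequence of the emoji in the comment above:
# person + skin tone, ZWJ, heart + variation selector, ZWJ, person + skin tone.
COUPLE_WITH_HEART = (
    "\U0001F9D1\U0001F3FB"  # person, light skin tone
    "\u200D"                # zero width joiner
    "\u2764\uFE0F"          # heavy black heart, emoji presentation selector
    "\u200D"                # zero width joiner
    "\U0001F9D1\U0001F3FC"  # person, medium-light skin tone
)


def utf16_length(text):
    """Number of UTF-16 code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2
```

With this sequence, `len(COUPLE_WITH_HEART)` gives 8 code points and `utf16_length(COUPLE_WITH_HEART)` gives 12 code units, even though a terminal draws the whole cluster in 2 cells.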

abgox commented 11 hours ago

Oh, it will be difficult to render multibyte text such as Chinese characters via $Host.UI.RawUI.SetBufferContents() in the future.