microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!
MIT License

UTF-8 input/output still a problem. QEMU console case. #17371

Closed ant5 closed 4 months ago

ant5 commented 4 months ago

Windows Terminal version

1.21.1272.0

Windows build number

10.0.19045.4291

Other Software

QEMU 9.0

Steps to reproduce

Non-English Windows 10. The option "Beta: Use Unicode UTF-8 for worldwide language support" in Regional Settings is unchecked, because non-English applications show unreadable symbols when it is checked.

A virtual machine whose OS uses serial communication (COM port) for its terminal. Run:

start qemu-system-x86_64.exe ... -nographic ... -hdd vmimage.qcow2

Expected Behavior

Correct characters are shown, and input works when switched to a non-English language.

Actual Behavior

Everything else works, but Windows Terminal neither shows non-English UTF-8 characters nor accepts non-English input: image

It definitely stays on the non-English, non-Unicode code page associated with the Windows default language.

As for input: after switching the keyboard to a non-English layout, Windows Terminal does not echo symbols in response to keypresses. Pressing random keys one after another in the non-English layout does echo some characters, but they are unrelated to the keys pressed. Evidently only certain sequences produce characters; a single keypress does not produce the expected character.
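The symptoms above can be reproduced away from the console with a small Python sketch. It is an illustration only: the sample text and code page 866 are assumptions, not details from this report, and the input-side explanation is one plausible reading that the thread does not confirm. The first function shows what UTF-8 output looks like when its bytes are interpreted in a legacy single-byte code page. The second checks whether the single legacy byte a keypress might produce is valid UTF-8 on its own.

```python
# Illustration only: the sample text and code page 866 are assumptions,
# not details taken from the report.
SAMPLE = "Привет"
LEGACY_CODE_PAGE = "cp866"


def show_as_legacy(text: str, code_page: str = LEGACY_CODE_PAGE) -> str:
    """Decode the UTF-8 bytes of `text` as if they were legacy code page bytes."""
    return text.encode("utf-8").decode(code_page)


def legacy_key_is_valid_utf8(char: str, code_page: str = LEGACY_CODE_PAGE) -> bool:
    """Check whether the legacy encoding of `char` is valid UTF-8 on its own."""
    try:
        char.encode(code_page).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    garbled = show_as_legacy(SAMPLE)
    # Every two-byte UTF-8 character turns into two unrelated legacy characters.
    print(len(SAMPLE), len(garbled))
    # A lone legacy byte for a non-ASCII letter is not a complete UTF-8 sequence.
    print(legacy_key_is_valid_utf8("П"))
```

Under these assumptions, a guest that expects UTF-8 would reject or drop a lone legacy byte, and would only echo something when several such bytes happen to form a valid sequence. That would match "only certain sequences produce characters".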

lhecker commented 4 months ago

Does it work if you run chcp 65001 before running qemu? Otherwise, can you run locale (without arguments) and show us the output?

ant5 commented 4 months ago

QEMU opens the default console by itself and routes VM input/output to it. There is no way to intervene in this process.

Only one thing comes to mind. What does chcp 65001 really do? If it emits a special control sequence, then I can try to force the VM to emit that sequence at startup.

ant5 commented 4 months ago

But... as you noticed, there is a start in front of QEMU, to get the VM in a new window rather than in the current terminal.

But I can achieve this by starting another batch file, which in turn runs chcp and then QEMU. That is, replacing: start qemu-system-x86_64.exe ... -nographic ... -hdd vmimage.qcow2 with start runvm.bat

In runvm.bat:

chcp 65001
qemu-system-x86_64.exe ... -nographic ... -hdd vmimage.qcow2

So the problem still exists, but there is a working hack for QEMU in my case.

eryksun commented 4 months ago

Since you appear to be using the CMD shell's start command, you don't actually need to run "chcp.com"[^1]. Just create a console window with a given session title, and configure console sessions with that title to use code page 65001. For example, the following creates a registry key for "QEMU" console sessions (like a profile) and sets 65001 as the "CodePage" to use:

> reg.exe add "HKCU\Console\QEMU" /v CodePage /t REG_DWORD /d 65001
The operation completed successfully.

Then use that session title in the start command. For example:

> start "QEMU" qemu-system-x86_64.exe ... -nographic ... -hdd vmimage.qcow2

The title must be in quotes.

[^1]: "chcp.com" is a console application that sets the console's input and output code pages to a given code page via SetConsoleCP(codePageID) and SetConsoleOutputCP(codePageID). This command, as well as "mode.com", were implemented in 1993 for the Windows NT console to emulate the builtin CHCP and MODE commands in the MS-DOS COMMAND.COM shell from back in the 1980s. However, the emulated commands are a far cry from the original MS-DOS commands, which controlled not only device code pages but also synchronized the locale (e.g. names of days/months, format characters) and UI language (e.g. system messages and UI text) in the MS-DOS kernel.

The console code page on Windows NT systems has nothing to do with the locale or UI language in use by a console application. Windows applications can use the default user locale and UI language(s), the default system locale and UI language(s), or any ad hoc combination of locale and subset of the available UI languages. There's no setting in a console session for a preferred locale and UI language(s) to use for the session, in contrast to the "LANG", "LC_*", and "LANGUAGE" environment variables on POSIX. This is a serious deficiency.

Fortunately with UTF-8 support finally implemented in the console and the C runtime's locale implementation, at least locale data, system messages, and UI text can always be encoded when legacy applications write to console files, pipes, and disk files. The situation can be far worse if a legacy code page such as 1252 (Latin-1) gets paired with an incompatible locale or UI language.
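To make the footnote's description of "chcp.com" concrete, here is a minimal Python sketch of the same two Win32 calls, SetConsoleCP and SetConsoleOutputCP. These are API calls on the attached console, not a control sequence written to the output stream. The function name and the non-Windows fallback are hypothetical additions for illustration; this is not the source of "chcp.com".

```python
import ctypes
import sys

UTF8_CODE_PAGE = 65001


def set_console_code_pages(code_page: int = UTF8_CODE_PAGE):
    """Set the console input and output code pages of the attached console.

    Returns True if both calls succeed, False if either fails, and None when
    not running on Windows. (Hypothetical helper, for illustration only.)
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Input code page: how bytes read from the console are encoded.
    ok_in = kernel32.SetConsoleCP(code_page)
    # Output code page: how bytes written to the console are decoded.
    ok_out = kernel32.SetConsoleOutputCP(code_page)
    return bool(ok_in) and bool(ok_out)


if __name__ == "__main__":
    print(set_console_code_pages())
```

Because the setting belongs to the console session rather than to a process, a child process started afterwards in the same console, as in the runvm.bat workaround, sees the new code pages.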