Closed eryksun closed 1 year ago
A seemingly related issue is that when a non-BMP character is manually pasted into the console, ReadConsoleW echoes and returns only the first code in the UTF-16 encoded surrogate pair. For example, with "😞" only 0xD83D is returned. Here's a simple example in Python:
>>> from win32console import *
>>> h = GetStdHandle(STD_INPUT_HANDLE)
>>> h.ReadConsole(4)
�
'\ud83d\r\n'
(As before, I've replaced the echoed 0xD83D surrogate code with U+FFFD in order to avoid problems with pasting an invalid code point.)
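For reference, here is a minimal sketch of the UTF-16 arithmetic behind the result above. It only illustrates the encoding and does not call the console API; the helper names to_surrogate_pair and is_high_surrogate are made up for this example.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def is_high_surrogate(code_unit):
    """True for a leading (high) surrogate, 0xD800..0xDBFF."""
    return 0xD800 <= code_unit <= 0xDBFF


# U+1F61E is the character pasted in the example above.
print([hex(u) for u in to_surrogate_pair(0x1F61E)])  # ['0xd83d', '0xde1e']

# The string returned by ReadConsole above starts with the high surrogate,
# but the next code is '\r' rather than the low surrogate 0xDE1E.
result = '\ud83d\r\n'
print(is_high_surrogate(ord(result[0])), hex(ord(result[1])))  # True 0xd
```

A correct read would have returned both halves, 0xD83D followed by 0xDE1E, before the "\r\n".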
This prevents pasting supplementary-plane characters on the command line in the CMD shell, which relies on the console's cooked read for its command-line interface. For example:
Legacy console:
C:\>echo 😞
😞
Windows Terminal:
C:\>echo ��
😞
V2 console:
C:\>echo �
�
PowerShell uses low-level ReadConsoleInputW instead of ReadConsoleW, so it's not subject to this bug, though external programs that inherit PowerShell's console may be.
@miniksa /cc for legacy v1 compat break
I'm having this issue https://github.com/gui-cs/Terminal.Gui/pull/2250#issuecomment-1340039913. Please see the difference in how non-BMP code points are displayed in Windows Console Host and in Windows Terminal. The non-BMP characters 𝔽𝕆𝕆𝔹𝔸 appear with an additional space to keep them aligned, but in Windows Terminal that additional space is ignored, causing misalignment.
I'm having this issue gui-cs/Terminal.Gui#2250 (comment).
Is Terminal.Gui using ReadConsoleW to display these characters? ReadConsoleW is primarily concerned with user input, not output to the screen. 😄
No, of course, it's using WriteConsoleOutputW to output to the console :-)
But it was related to non-BMP characters, so I used this issue, sorry.
Can we get a call on whether the WT team thinks this issue is mis-named and should be about the fact that WT does not seem to deal with non-BMP codepoints correctly when WriteConsoleOutputW and ReadConsoleW are used?
Or should we create a new issue?
As of now, Terminal.Gui needs to disable rendering of all non-BMP codepoints when using these APIs.
Note that when Terminal.Gui uses the .NET Console API, these codepoints work fine.
Alright, so I've got a couple updates.
ReadConsoleW no longer returns an error with your test application. WriteCharsLegacy took a dependency on one codepoint occupying one wchar_t (rather than one code unit) during some Unicode refactoring we did a while ago.
ReadConsoleW does not work properly in conhostv2 when you insert something that contains surrogate pairs or requires multiple code units. As in 1 above, that issue is now fixed.
wcwidth and wcswidth are woefully inadequate. There is no way to ask a terminal emulator how big a character is or was (and I don't think there should be, because what if the terminal emulator is tmux and it has many different heads? Or what if the font changes, and therefore the perceived size of the glyph would change, but the application can't know about it because it is font-agnostic? What about over a slow link like SSH, where answerback could time out and then the application would be no better off (though perhaps worse off, because it would need to handle a late synchronous reply)?)
WriteConsoleOutputW does not work properly: WriteConsoleOutput works entirely cellwise, and there is no guarantee that you could ever emit a non-BMP character that requires a surrogate pair using it. I'm sorry to say that the cellwise APIs cannot represent the full gamut of text. For example: U+1F574 MAN IN BUSINESS SUIT LEVITATING only occupies one column but requires two code units. The antiquated mapping of columns to code units just doesn't account for this case. 😄 However, writing known-to-be-wide characters that require two code units by way of two CHAR_INFO structs should work. If that is not working, it would be worth filing a new issue for it.
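To make the code point versus code unit mismatch concrete, here is a small sketch. It only inspects the UTF-16 encoding in Python and does not call WriteConsoleOutputW; the helper name utf16_code_units is made up for this example. Each CHAR_INFO cell holds a single 16-bit character, so a value like U+1F574 cannot fit in one cell.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


levitating = "\U0001F574"  # U+1F574 MAN IN BUSINESS SUIT LEVITATING

print(len(levitating))                                  # 1 (one code point)
print([hex(u) for u in utf16_code_units(levitating)])   # ['0xd83d', '0xdd74']
```

One code point, two code units: an API that assumes one cell per 16-bit unit and one column per cell has no slot for the second unit of a single-column character.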
It's "iii".
I get it now and will back off this Issue.
What Terminal.Gui is going to do (eventually) is retarget our WindowsDriver away from using WriteConsoleOutput to using the new Console Virtual Terminal Sequences API, which I assume will both let us support non-BMP and still be nicely performant.
BTW, I appreciate your thorough responses!
Why is the following behavior expected (copied from the description)
Test normal with ECHO ON
😞
stream (4): L"\ud83d\ude1e\u000d\u000a"
screen: L"\ud83d\ude1e "
Test paste with ECHO ON
😞
stream (4): L"\ud83d\ude1e\u000d\u000a"
screen: L"\ud83d\ude1e "
Test normal with ECHO OFF
stream (4): L"\ud83d\ude1e\u000d\u000a"
Test paste with ECHO OFF
stream (4): L"\ud83d\ude1e\u000d\u000a"
instead of
Test normal with ECHO ON
😞
stream (4): L"\ud83d\ude1e\u000d\u000a"
screen: L"\ud83d\ude1e\u0000 "
Test paste with ECHO ON
??
stream (4): L"??\u000d\u000a"
screen: L"??\u0000 "
Test normal with ECHO OFF
stream (4): L"\ud83d\ude1e\u000d\u000a"
Test paste with ECHO OFF
stream (4): L"??\u000d\u000a"
Test paste with ECHO OFF
input records:
vk: 12, kd: 1, ks: 0002, uc: 0000
vk: 66, kd: 1, ks: 0002, uc: 0000
vk: 66, kd: 0, ks: 0002, uc: 0000
vk: 63, kd: 1, ks: 0002, uc: 0000
vk: 63, kd: 0, ks: 0002, uc: 0000
vk: 12, kd: 0, ks: 0000, uc: d83d <- broken pair + '?' inserted
vk: 12, kd: 1, ks: 0002, uc: 0000
vk: 66, kd: 1, ks: 0002, uc: 0000
vk: 66, kd: 0, ks: 0002, uc: 0000
vk: 63, kd: 1, ks: 0002, uc: 0000
vk: 63, kd: 0, ks: 0002, uc: 0000
vk: 12, kd: 0, ks: 0000, uc: de1e <- broken pair + '?' inserted
vk: 00, kd: 1, ks: 0000, uc: 000d
vk: 00, kd: 0, ks: 0000, uc: 000d
stream (4): L"??\u000d\u000a"
despite the fact that the input buffer contains two Alt+Num'ed question marks and the halves of the surrogate pair are not consecutive?
Should the nonzero uc payload override all previous Alt+Num input when Alt is released?
Can a keyboard event with vk=VK_NUMPADx exist without ks=NUMLOCK_ON | ...? (ks=LEFT_ALT_PRESSED in the attached readsp.c)
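The shape of the paste records listed above can be sketched as follows. This only rebuilds the tuples in Python and does not call WriteConsoleInputW. The helper names are made up for this example, and the choice of the decimal digits "63" (the '?' character) is inferred from the listed records rather than from any documented behavior.

```python
VK_MENU = 0x12            # Alt key
VK_NUMPAD0 = 0x60         # VK_NUMPAD0..VK_NUMPAD9 are 0x60..0x69
LEFT_ALT_PRESSED = 0x0002


def alt_numpad_events(code_unit, typed_value=ord("?")):
    """Build (vk, key_down, control_state, unicode_char) records for one code unit.

    Assumption: typed_value is the number entered on the numpad while Alt is
    held; the records above show 6 then 3, i.e. decimal 63 ('?').
    """
    events = [(VK_MENU, 1, LEFT_ALT_PRESSED, 0)]           # Alt down
    for digit in str(typed_value):
        vk = VK_NUMPAD0 + int(digit)
        events.append((vk, 1, LEFT_ALT_PRESSED, 0))        # numpad digit down
        events.append((vk, 0, LEFT_ALT_PRESSED, 0))        # numpad digit up
    events.append((VK_MENU, 0, 0, code_unit))              # Alt up carries the code unit
    return events


def format_event(event):
    """Format a record the same way as the listing above."""
    vk, key_down, state, char = event
    return f"vk: {vk:02x}, kd: {key_down}, ks: {state:04x}, uc: {char:04x}"


# Two code units of U+1F61E give 12 records, matching the listing.
for unit in (0xD83D, 0xDE1E):
    for event in alt_numpad_events(unit):
        print(format_event(event))
```

In this model the two surrogate halves sit in separate Alt-up records with numpad digits between them, which is the situation the questions above ask about.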
Environment
Microsoft Windows [Version 10.0.18363.657]
conhost.exe builtin console, V2
wt.exe terminal, V0.9.433.0
Steps to reproduce
readsp.zip
Extract, compile and run the attached readsp.c program under the V2 console. This program exercises directly writing a non-BMP character to the input buffer via WriteConsoleInputW and reading it back via ReadConsoleW, first with echo enabled and then with it disabled. Run the program with -v (e.g. readsp -v) to show the input key-event records that each step tries to read. It tries a normal key down/up event pair as well as the Alt+Numpad sequence that the console uses for pasted text. The latter uses 6 key events per wide-character and thus 12 key events for a surrogate pair. I included the paste sequence to try to clarify a related issue in which manually pasting a non-BMP character produces a different incorrect result, but it didn't help. I'll discuss that related issue in a comment, in case it's all due to the same underlying issue.
Expected behavior
ReadConsoleW should be able to correctly read supplementary-plane (i.e. non-BMP) characters such as "😞" (U+1F61E), regardless of whether they are typed or pasted into the terminal window, or written directly to the input buffer, or whether echo is enabled. Since the wide-character API uses 16-bit characters, the non-BMP character should be read as a UTF-16 surrogate pair, e.g. U+1F61E should be encoded as {0xD83D, 0xDE1E}.
ReadConsoleW works as expected with the legacy (V1) console. For example:
It almost works correctly with Windows Terminal version 0.9.433.0:
Apparently a cooked read under Windows Terminal has a bug in which a non-BMP character gets echoed as two replacement characters, U+FFFD. But at least the ReadConsoleW result is correct.
Actual behavior
In the output below, not only does the cooked read fail with ERROR_INVALID_PARAMETER (87) when echo is enabled, but the echoed text contains only the first surrogate code of the surrogate pair, 0xD83D.
Since it's not a valid Unicode character, I've replaced this lone surrogate code in the pasted text with the Unicode replacement character, U+FFFD, but the "screen" text, which gets read directly from the screen buffer, shows that the code displayed on the console is 0xD83D.
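The substitution described above, swapping a lone surrogate for U+FFFD so the text can be pasted safely, could look roughly like this in Python. This is a sketch only; the helper name replace_lone_surrogates is made up for this example and is not part of readsp.c.

```python
def replace_lone_surrogates(text):
    """Combine valid surrogate pairs and replace unpaired surrogates with U+FFFD."""
    out = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        nxt = ord(text[i + 1]) if i + 1 < len(text) else None
        if 0xD800 <= cp <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # Valid high + low pair: combine into one code point.
            out.append(chr(0x10000 + ((cp - 0xD800) << 10) + (nxt - 0xDC00)))
            i += 2
        elif 0xD800 <= cp <= 0xDFFF:
            # Unpaired surrogate: not a valid character on its own.
            out.append("\ufffd")
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(ascii(replace_lone_surrogates("\ud83d\r\n")))  # '\ufffd\r\n'
```

A complete pair such as "\ud83d\ude1e" comes back as the single character U+1F61E, so only the broken case is altered.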