Open rprichard opened 7 years ago
@miniksa for visibility. I believe he changed some stuff in that API to make it consistent with ConhostV1.
There's a fix that didn't make RS2 RTM. I'm looking into what we can do about it.
It's probably easy for winpty to detect the broken `ReadConsoleOutput` behavior (e.g. at startup, write some potentially-wide characters and see whether `ReadConsoleOutput` mishandles any of them). winpty could then enable a workaround.
e.g. winpty could call `ReadConsoleOutput` as it does now, but restore the original API behavior by expanding a `CHAR_INFO` record into two `CHAR_INFO` records when the `UnicodeChar` value is a 2-column character. It would determine that by either:
- Duplicating the logic from PowerShell (https://github.com/Microsoft/vscode/issues/19665#issuecomment-287477274), or
- Asking the console: create a special, hidden console screen buffer, then use `WriteConsole` and `ReadConsoleOutput`. Write something like `${UnicodeChar}\nAB`, then read a 2x2 rectangle and see whether the third cell is A or B.
This workaround would add a lot of complexity, so I'd prefer some other solution if possible.
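To make the expansion idea concrete, here is a minimal, portable sketch. It is not winpty code: `Cell` is a stand-in for the Win32 `CHAR_INFO` record, `expandLine` is a hypothetical helper, and `isTwoColumn` is a deliberately tiny placeholder for whichever width test (PowerShell's logic or asking the console) would really be used.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-in for the Win32 CHAR_INFO record (UnicodeChar + Attributes).
struct Cell {
    char16_t ch;
    std::uint16_t attr;
};

// Values of COMMON_LVB_LEADING_BYTE and COMMON_LVB_TRAILING_BYTE in wincon.h.
constexpr std::uint16_t kLeading  = 0x0100;
constexpr std::uint16_t kTrailing = 0x0200;

// Hypothetical width test covering only a few CJK ranges, for illustration.
bool isTwoColumn(char16_t ch) {
    return (ch >= 0x3040 && ch <= 0x30FF) ||  // Hiragana, Katakana
           (ch >= 0x4E00 && ch <= 0x9FFF);    // CJK Unified Ideographs
}

// Turn every two-column character into a leading cell plus a trailing cell,
// both carrying the same character. Narrow characters are copied unchanged.
std::vector<Cell> expandLine(const std::vector<Cell>& in) {
    std::vector<Cell> out;
    out.reserve(in.size() * 2);
    for (const Cell& c : in) {
        // The input is assumed to be in the one-record-per-character form.
        assert((c.attr & (kLeading | kTrailing)) == 0);
        if (isTwoColumn(c.ch)) {
            out.push_back({c.ch, static_cast<std::uint16_t>(c.attr | kLeading)});
            out.push_back({c.ch, static_cast<std::uint16_t>(c.attr | kTrailing)});
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

A real workaround would also have to clip the expanded line back to the screen buffer width, which this sketch leaves out.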
API note: `ReadConsoleOutput` makes it clear that the `CHAR_INFO` buffer is a two-dimensional array, but I don't think MSDN ever explains how a two-column character is supposed to be represented. It mentions the `COMMON_LVB_{LEADING,TRAILING}_BYTE` attributes, but doesn't describe them in any detail, and of course, the names aren't quite right -- we're looking for leading/trailing cells, not bytes.
Aside: winpty may also need to stop using such tiny fonts.
As a workaround, reading lines one by one seems to fix strange line break output.
```diff
diff --git a/src/agent/Win32ConsoleBuffer.cc b/src/agent/Win32ConsoleBuffer.cc
index ed93f40..dc4efce 100755
--- a/src/agent/Win32ConsoleBuffer.cc
+++ b/src/agent/Win32ConsoleBuffer.cc
@@ -157,8 +157,18 @@ void Win32ConsoleBuffer::setCursorPosition(const Coord &coord) {
 void Win32ConsoleBuffer::read(const SmallRect &rect, CHAR_INFO *data) {
     // TODO: error handling
     SmallRect tmp(rect);
-    if (!ReadConsoleOutputW(m_conout, data, rect.size(), Coord(), &tmp) &&
-            isTracingEnabled()) {
+    CHAR_INFO *buffer = data;
+    Coord bufferSize = Coord(rect.width(), 1);
+    BOOL success = TRUE;
+    for (SHORT y = rect.Top; y <= rect.Bottom; y++, buffer += rect.width()) {
+        tmp.Top = y;
+        tmp.Bottom = y;
+        if (!ReadConsoleOutputW(m_conout, buffer, bufferSize, Coord(), &tmp)) {
+            success = FALSE;
+            break;
+        }
+    }
+    if (!success && isTracingEnabled()) {
         StringBuilder sb(256);
         auto outStruct = [&](const SMALL_RECT &sr) {
             sb << "{L=" << sr.Left << ",T=" << sr.Top
```
I think a better way to accomplish this is to set (`useLargeReads`, `maxReadLines`) to (`false`, `1`).
https://github.com/rprichard/winpty/blob/4978cf94b6ea48e38eea3146bd0d23210f87aa89/src/agent/LargeConsoleRead.cc#L50.
I'd expect that change to help mitigate the situation -- actually, if it works, it's probably a good idea.
I'm not sure what is expected to appear at the end of the `ReadConsoleOutput` buffer -- IIRC, I've only seen `CHAR_INFO` records that are all zeros -- effectively black-on-black NUL. I wonder about the effect on the console->terminal conversion code here, https://github.com/rprichard/winpty/blob/4978cf94b6ea48e38eea3146bd0d23210f87aa89/src/agent/Terminal.cc#L369. I'm guessing it'd output an entire line, followed by a color change to Black-on-Black-plus-Conceal, followed by NULs that the terminal ignores? The workaround should probably include code removing zeroed `CHAR_INFO` values from the `Terminal::sendLine` width.
Edit: fix "followed by two NULs that the terminal ignores" -- it could be one NUL or any number of NULs.
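As a rough illustration of removing zeroed records from the line width, the sketch below counts how many cells remain once trailing all-zero records are dropped. `Cell` is a portable stand-in for `CHAR_INFO` and `trimmedWidth` is a hypothetical helper, not an existing winpty function. It also assumes the zeroed records only ever appear at the end of a line, which is what was observed above but has not been confirmed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-in for the Win32 CHAR_INFO record (UnicodeChar + Attributes).
struct Cell {
    char16_t ch;
    std::uint16_t attr;
};

// Return the number of leading cells to keep after dropping trailing records
// whose character and attribute fields are both zero.
std::size_t trimmedWidth(const Cell* line, std::size_t width) {
    assert(line != nullptr || width == 0);
    std::size_t n = width;
    while (n > 0 && line[n - 1].ch == 0 && line[n - 1].attr == 0) {
        --n;
    }
    return n;
}
```

The caller would pass the reduced width on to the line-sending code instead of the full row width.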
@miniksa Can you confirm that when this issue (i.e. the double-column `ReadConsoleOutput` bug) occurs, the fields of the trailing `CHAR_INFO` values will be zero?
This workaround would reduce the likelihood of a successful `Scraper::scrollingScrapeOutput` tentative read, but I think that's acceptable. I think it doesn't affect correctness, because the Scraper checks at the end whether the sync marker moved while it was reading screen buffer data and properties.
@shirosaki FWIW, that change would break Scraper::findSyncMarker, which assumes it can read an entire screen buffer column (3000 lines) efficiently and atomically.
@rprichard The problem is that ReadConsoleOutput* is going to give you different information depending on whether the original text was written with WriteConsoleOutputA, WriteConsoleOutputW, WriteFile, WriteConsoleA, or WriteConsoleW AND whether or not a Raster font or TrueType font is used at the time.
I can't definitively tell you what the appropriate pattern for double-byte characters will be nor what the trailing fields will be filled with. The definitive solution would be to make a new API that is correct all the time or, my personal preference, to build a PTY mechanism directly into Windows and deprecate all the arguably terrible Windows Console APIs. But those solutions will take time.
In the meantime, as @zadjii-msft alluded to above, I recently wrote a massive test and ensured that the v2 console and the v1 console should be exactly the same and follow the below absolutely terrible too-many-dimensional matrix created from decades of bugs that are now preserved for compatibility. You are welcome to use this as a basis to try to figure out the right thing to provide through WinPty. Hopefully this will evolve into an MSDN article and/or blog post one day, time permitting (cough @bitcrazed cough).
There also might be some differences between what is listed below (the v1 behavior and the now-fixed v2 behavior) and what you see in some builds of Windows. That's because it got broken at some point and then fixed again at another point. This test is now in place and should be applicable going forward for all v2 consoles, but I'm not certain which builds the fix is in and which it is not in. I just know that this is what we should program to and fix anything that isn't working this way.
Also fair warning: I want to deprecate/remove Raster fonts from v2 in a future edition of the console, so please don't build too deeply on top of those. They don't scale for High DPI, they don't work for multilingual text, and they're just plain bad.
I would recommend that you choose one of these patterns that gives you the information that you need with a TrueType font selected (or account for both font possibilities by using `GetCurrentConsoleFontEx` to see if the current font is a Raster font, typically `Terminal`). Patterns 4, 5, 6, 8, and 9 and the associated API call scenarios are probably closest to what you want, but you might need to detect a few others depending on what the hosted app used to write its text.
Other writes = CRT write (printf, etc.) A or W -OR- WriteConsoleOutputCharacter -OR- WriteConsole
Other writes = CRT write (printf, etc.) A or W -OR- WriteConsoleOutputCharacter -OR- WriteConsole -OR- WriteConsoleOutputA
Each table below shows what you would get if you had written to the console with the following string and settings:
- `0x7` (the buffer 'background' color or default color)
- `0x29` (the color written with the text for APIs that support writing text colors)
- `QいかなZYXWVUTに`
- `MultiByteToWideChar`/`WideCharToMultiByte` translation when needed.

The `Attr` field is the color as retrieved by either `ReadConsoleOutput*` in the `CHAR_INFO` structure or the attrs returned via `ReadConsoleOutputAttribute`.
- In the `ReadConsoleOutput*` tables below, the `attr` and `wchar (char)` columns will be in sync depending on what is in the `CHAR_INFO` structure.
- In the `ReadConsoleOutputCharacter*` tables below, the `attr` column is taken from `ReadConsoleOutputAttribute` and the `wchar (char)` column is taken from `ReadConsoleOutputCharacter*`. This can result in the attrs appearing misaligned with the chars in the table form when the `ReadConsoleOutputAttribute` method ends up returning more data than the `ReadConsoleOutputCharacter*` method.

The `Wchar (char)` field is the string that will be returned. If you are using an API that returns a `CHAR_INFO` structure, both pieces will be returned in the union. If you use an API that simply returns a string, it will return the relevant half to the A/W type of API you called.

The `Symbol` field explains the data that was received.
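For reading the `attr` column in the tables, a value such as `0x129` is the written color `0x29` combined with the leading flag `0x100`, and `0x229` is the same color with the trailing flag `0x200`. The sketch below splits an attribute word that way. `decodeAttr`, `DecodedAttr`, and `CellKind` are hypothetical names for illustration, and masking the color with `0x00FF` is an assumption that only the foreground/background bits are of interest.

```cpp
#include <cassert>
#include <cstdint>

// Values of COMMON_LVB_LEADING_BYTE and COMMON_LVB_TRAILING_BYTE in wincon.h.
constexpr std::uint16_t kLeading   = 0x0100;
constexpr std::uint16_t kTrailing  = 0x0200;
// Low byte of the attribute word: foreground and background color bits.
constexpr std::uint16_t kColorMask = 0x00FF;

enum class CellKind { Single, Leading, Trailing };

struct DecodedAttr {
    std::uint16_t color;
    CellKind kind;
};

// Split an attribute word into its color and its lead/trail role.
DecodedAttr decodeAttr(std::uint16_t attr) {
    const bool lead = (attr & kLeading) != 0;
    const bool trail = (attr & kTrailing) != 0;
    assert(!(lead && trail));  // a cell is assumed to be one or the other
    CellKind kind = CellKind::Single;
    if (lead) {
        kind = CellKind::Leading;
    } else if (trail) {
        kind = CellKind::Trailing;
    }
    return {static_cast<std::uint16_t>(attr & kColorMask), kind};
}
```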

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x029 | 0x3044 (0x44) | Hiragana I |
| 0x029 | 0x304B (0x4B) | Hiragana KA |
| 0x029 | 0x306A (0x6A) | Hiragana NA |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0055 (0x55) | U |
| 0x029 | 0x0054 (0x54) | T |
| 0x029 | 0x306B (0x6B) | Hiragana NI |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x029 | 0x3044 (0x44) | Hiragana I |
| 0x029 | 0x304B (0x4B) | Hiragana KA |
| 0x029 | 0x306A (0x6A) | Hiragana NA |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x000 | 0x0000 (0x00) | <null> |
| 0x000 | 0x0000 (0x00) | <null> |
| 0x000 | 0x0000 (0x00) | <null> |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x029 | 0x3044 (0x44) | Hiragana I |
| 0x029 | 0x304B (0x4B) | Hiragana KA |
| 0x029 | 0x306A (0x6A) | Hiragana NA |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0055 (0x55) | U |
| 0x029 | 0x0054 (0x54) | T |
| 0x029 | 0x306B (0x6B) | Hiragana NI |
| 0x000 | 0x0000 (0x00) | <null> |
| 0x000 | 0x0000 (0x00) | <null> |
| 0x000 | 0x0000 (0x00) | <null> |
| 0x000 | 0x0000 (0x00) | <null> |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x129 | 0x3044 (0x44) | Hiragana I |
| 0x229 | 0xFFFF (0xFF) | Invalid Unicode Character |
| 0x129 | 0x304B (0x4B) | Hiragana KA |
| 0x229 | 0xFFFF (0xFF) | Invalid Unicode Character |
| 0x129 | 0x306A (0x6A) | Hiragana NA |
| 0x229 | 0xFFFF (0xFF) | Invalid Unicode Character |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0055 (0x55) | U |
| 0x029 | 0x0054 (0x54) | T |
| 0x129 | 0x306B (0x6B) | Hiragana NI |
| 0x229 | 0xFFFF (0xFF) | Invalid Unicode Character |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x129 | 0x3044 (0x44) | Hiragana I |
| 0x229 | 0x3044 (0x44) | Hiragana I |
| 0x129 | 0x304B (0x4B) | Hiragana KA |
| 0x229 | 0x304B (0x4B) | Hiragana KA |
| 0x129 | 0x306A (0x6A) | Hiragana NA |
| 0x229 | 0x306A (0x6A) | Hiragana NA |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0055 (0x55) | U |
| 0x029 | 0x0054 (0x54) | T |
| 0x129 | 0x306B (0x6B) | Hiragana NI |
| 0x229 | 0x306B (0x6B) | Hiragana NI |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x129 | 0x0082 (0x82) | Hiragana I Shift-JIS Codepage 932 Lead Byte |
| 0x229 | 0x00A2 (0xA2) | Hiragana I Shift-JIS Codepage 932 Trail Byte |
| 0x129 | 0x0082 (0x82) | Hiragana KA Shift-JIS Codepage 932 Lead Byte |
| 0x229 | 0x00A9 (0xA9) | Hiragana KA Shift-JIS Codepage 932 Trail Byte |
| 0x129 | 0x0082 (0x82) | Hiragana NA Shift-JIS Codepage 932 Lead Byte |
| 0x229 | 0x00C8 (0xC8) | Hiragana NA Shift-JIS Codepage 932 Trail Byte |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0055 (0x55) | U |
| 0x029 | 0x0054 (0x54) | T |
| 0x129 | 0x0082 (0x82) | Hiragana NI Shift-JIS Codepage 932 Lead Byte |
| 0x229 | 0x00C9 (0xC9) | Hiragana NI Shift-JIS Codepage 932 Trail Byte |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x129 | 0x3082 (0x82) | Hiragana I Unicode 0x3044 with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0xFFA2 (0xA2) | Invalid Unicode Character 0xFFFF with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xA2 |
| 0x129 | 0x3082 (0x82) | Hiragana KA Unicode 0x304B with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0xFFA9 (0xA9) | Invalid Unicode Character 0xFFFF with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xA9 |
| 0x129 | 0x3082 (0x82) | Hiragana NA 0x306A with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0xFFC8 (0xC8) | Invalid Unicode Character 0xFFFF with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xC8 |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x129 | 0x3082 (0x82) | Hiragana I Unicode 0x3044 with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0xFFA2 (0xA2) | Invalid Unicode Character 0xFFFF with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xA2 |
| 0x129 | 0x3082 (0x82) | Hiragana KA Unicode 0x304B with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0xFFA9 (0xA9) | Invalid Unicode Character 0xFFFF with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xA9 |
| 0x129 | 0x3082 (0x82) | Hiragana NA 0x306A with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0xFFC8 (0xC8) | Invalid Unicode Character 0xFFFF with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xC8 |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0055 (0x55) | U |
| 0x029 | 0x0054 (0x54) | T |
| 0x129 | 0x3082 (0x30) | Hiragana NI 0x306B with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0xFFC9 (0xC9) | Invalid Unicode Character 0xFFFF with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xC9 |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x129 | 0x3082 (0x82) | Hiragana I Unicode 0x3044 with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0x30A2 (0xA2) | Hiragana I Unicode 0x3044 with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xA2 |
| 0x129 | 0x3082 (0x82) | Hiragana KA Unicode 0x304B with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0x30A9 (0xA9) | Hiragana KA Unicode 0x304B with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xA9 |
| 0x129 | 0x3082 (0x82) | Hiragana NA 0x306A with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0x39C8 (0xC8) | Hiragana NA 0x306A with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xC8 |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0055 (0x55) | U |
| 0x029 | 0x0054 (0x54) | T |
| 0x129 | 0x3082 (0x30) | Hiragana NI 0x306B with the lower byte covered by Shift-JIS Codepage 932 Lead Byte 0x82. |
| 0x229 | 0x30C9 (0xC9) | Hiragana NI 0x306B with the lower byte covered by Shift-JIS Codepage 932 Trail Byte 0xC9 |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x129 | 0x3044 (0x44) | Hiragana I |
| 0x229 | 0x304B (0x4B) | Hiragana KA |
| 0x129 | 0x306A (0x6A) | Hiragana NA |
| 0x229 | 0x005A (0x5A) | Z |
| 0x129 | 0x0059 (0x59) | Y |
| 0x229 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0055 (0x55) | U |
| 0x029 | 0x0054 (0x54) | T |
| 0x029 | 0x306B (0x6B) | Hiragana NI |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x129 | 0x0000 (0x00) | <null> |
| 0x229 | 0x0000 (0x00) | <null> |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x029 | 0x3044 (0x44) | Hiragana I |
| 0x029 | 0x304B (0x4B) | Hiragana KA |
| 0x029 | 0x306A (0x6A) | Hiragana NA |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0055 (0x55) | U |
| 0x029 | 0x0054 (0x54) | T |
| 0x029 | 0x306B (0x6B) | Hiragana NI |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x129 | 0x3044 (0x44) | Hiragana I |
| 0x229 | 0x304B (0x4B) | Hiragana KA |
| 0x129 | 0x306A (0x6A) | Hiragana NA |
| 0x229 | 0x005A (0x5A) | Z |
| 0x129 | 0x0059 (0x59) | Y |
| 0x229 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x029 | 0x0020 (0x20) | <space> |
| 0x029 | 0x0020 (0x20) | <space> |
| 0x029 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0000 (0x00) | <null> |
| 0x007 | 0x0000 (0x00) | <null> |
| 0x007 | 0x0000 (0x00) | <null> |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0051 (0x51) | Q |
| 0x129 | 0x0082 (0x82) | Hiragana I Shift-JIS Codepage 932 Lead Byte |
| 0x229 | 0x00A2 (0xA2) | Hiragana I Shift-JIS Codepage 932 Trail Byte |
| 0x129 | 0x0082 (0x82) | Hiragana KA Shift-JIS Codepage 932 Lead Byte |
| 0x229 | 0x00A9 (0xA9) | Hiragana KA Shift-JIS Codepage 932 Trail Byte |
| 0x129 | 0x0082 (0x82) | Hiragana NA Shift-JIS Codepage 932 Lead Byte |
| 0x229 | 0x00C8 (0xC8) | Hiragana NA Shift-JIS Codepage 932 Trail Byte |
| 0x029 | 0x005A (0x5A) | Z |
| 0x029 | 0x0059 (0x59) | Y |
| 0x029 | 0x0058 (0x58) | X |
| 0x029 | 0x0057 (0x57) | W |
| 0x029 | 0x0056 (0x56) | V |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |
| 0x007 | 0x0020 (0x20) | <space> |

| attr | wchar (char) | symbol |
|---|---|---|
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x029 | 0x0000 (0x00) | <null> |
| 0x007 | 0x0000 (0x00) | <null> |
| 0x007 | 0x0000 (0x00) | <null> |
| 0x007 | 0x0000 (0x00) | <null> |
| 0x007 | 0x0000 (0x00) | <null> |

Just a quick note as confirmation: The original bug in the API discrepancy has been fixed in the last handful of Windows Insiders builds. This was MSFT: 10187355.
We then made a copy of it to port it back from the Windows Insider builds to the Creators Update as MSFT: 11721571. I've heard that the Creators Update KB fix has apparently just gone live in KB4020102 (OS build number 15063.332).
@miniksa Thanks for your detailed reply, and for fixing the issue in Windows.
It wasn't completely clear to me, but I think in the above patterns, for `WriteConsoleOutput*`, the write region is always 1 row and has as many columns as the string length? I'm guessing the written attribute also never includes 0x100 or 0x200. So, for `WriteConsoleOutputA`, the write region would be of size 16x1, but for `WriteConsoleOutputW`, the size would be 12x1. Is that right?
winpty currently always reads using `ReadConsoleOutputW`, and it configures a TrueType font according to the terminal size and console code page (Lucida Console by default, MS Gothic (CP932), NSimSun (CP936), Gulim Che (CP949), or Ming Light (CP950)). So far, I've just assumed that applications won't configure the console font themselves. Given these constraints, it looks like pattern 5 is the one winpty should mostly care about.
I think winpty could accommodate a raster font by converting the `ReadConsoleOutputW` `CHAR_INFO` buffer from pattern 3 to 5. I'm imagining it could have a built-in table indicating which code points occupied two cells. IIRC, when I last looked at raster fonts, the console converted any WCHAR it didn't recognize into a question mark, so I'm not inclined to use raster fonts.
Patterns 1 and 2 seem like misuses of the console API. Pattern 1 is placing wide characters into single cells, which will tend to cause alignment problems (e.g. an 80 column line with 160 columns of text in it). If my assumption above about write regions is correct, then pattern 2 is providing extraneous `CHAR_INFO` data. The `WriteConsoleOutputW` call ought to specify a 16x1 region (not 12x1) and put four nulls at the end, i.e. the write buffer should be identical to the buffer in pattern 3.
The 0xFFFF invalid character in patterns 4 and 8 is weird; I'm guessing it's there for backwards compatibility? winpty doesn't currently handle it, but I guess it should. It currently expects the trailing and leading wide chars to be equal.
FWIW, winpty also tries to recognize UTF-16 surrogate pairs, though I'm not sure that was a good idea in retrospect. I wonder what happens with the 65001 codepage combined with `WriteConsoleA` of a character outside the BMP.
If you want to document more hairiness, I'm sure there are edge cases involving I/O to a single cell of a two-cell character. I think `ReadConsoleOutputW` will read a space?
I just updated my 15063 VM tonight, and the KB4020102 update automatically installed, which bumped the `winver` up from 15063.296 to 15063.332. The output from `winbug-15048.exe` looks good now. I think I'm less inclined now to implement a workaround.
(Note to self: the patterns describe the fixed state, not the buggy state.)
@rprichard Apologies. I should have specified more about how the test writes.
For all writes, we're starting with a cleared out buffer (all set to space characters `0x20` and the default background color `0x7`). We also always write to the 0,0 position in this test.
Then from there, `WriteConsoleOutput*` allocates a `CHAR_INFO` array that is the length of the string. For W-versions, this is the `wcslen()` length of the original string. For A-versions, this is the length of the result from calling `WideCharToMultiByte` on the original W-string. Each `CHAR_INFO` is filled with a character from the string (wide or narrow portion of the unions respectively) and the applied attribute `0x29` to change the color. The lead/trailing flags `0x100` and `0x200` are not set on write. This means that for the W-string we are writing 12 `CHAR_INFO`s and for the A-string we are writing 16 `CHAR_INFO`s. 12x1 and 16x1 as you stated for the write region are correct.
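The 12 versus 16 arithmetic can be checked with a small sketch. `approxCp932Length` is a hypothetical helper that only approximates the byte count a real codepage 932 conversion would produce: it assumes every non-ASCII character takes two bytes, which holds for the four Hiragana in the test string but not for every character in that codepage (half-width Katakana, for example, take one byte).

```cpp
#include <cstddef>
#include <string>

// Approximate the codepage 932 byte length of a UCS-2 string.
// Assumption for illustration: ASCII is one byte, everything else is two.
std::size_t approxCp932Length(const std::u16string& text) {
    std::size_t bytes = 0;
    for (char16_t ch : text) {
        bytes += (ch < 0x80) ? 1 : 2;
    }
    return bytes;
}

// The test string from the thread, written with escapes so the source file
// encoding does not matter: Q, I, KA, NA, ZYXWVUT, NI.
const std::u16string kTestString = u"Q\u3044\u304B\u306AZYXWVUT\u306B";
```

With this string the wide form has 12 code units (a 12x1 write region) and the approximated narrow form has 16 bytes (a 16x1 write region), since 8 ASCII characters plus 4 two-byte Hiragana gives 8 + 8.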
For the CRT write tests, we clear the buffer the same way and set the cursor to 0,0 again. For writing W versions, we call `_setmode(_fileno(stdout), _O_WTEXT)` to make sure the CRT doesn't try to be helpful and convert our text, then use `putwchar` to put each character in a loop. For the A version, we call `_setmode(_fileno(stdout), _O_TEXT)` and use `putchar`. By default, the CRT is in `_O_TEXT` mode and will convert anything you write with `putwchar` or `wprintf` on your behalf back into A text before sending to the console, so setting the mode is important to maintain the integrity of the bytes being emitted.
For the `WriteConsole` and `WriteConsoleOutputCharacter*` writes, we set the target to 0,0 and pass the entire string, either the 12-length W-string or the 16-length A-string (post `WC2MB` conversion). We don't try to set the colors in these modes during the test patterns.
Then regarding raster fonts, you are correct. If the character doesn't exist in the currently selected raster font, it will generally convert it into the default character `?`. Raster fonts typically only have a very small subset of characters represented, so there will be lots of `?`. For TrueType fonts, the character is typically maintained in the buffer without respect to whether the font can actually draw it.
One thing to note that I think you may have misunderstood: all of these patterns are what you will see when attempting to read back 16 characters no matter what write mechanism was used. The write mechanisms vary as specified above. But reading back 16 items from the Read APIs will result in these patterns. To that end, Patterns 1 and 2 are representing a Read back, not what was specified on Write. The write buffer did indeed look like Pattern 3 (but 12 long instead of 16) for writing the W version of the text with the `WriteConsoleOutput` API, but the write wasn't what I was intending to describe/convey with the patterns.
The `0xFFFF` invalid character in patterns 4 and 8 is actually a bug that leaked out through the API and is now maintained for compatibility. Its origin is that the console historically used something like 3 different independent mechanisms internally to recognize the column width of a character, and they developed organically over time. Each developer came in and added their own without context and so it went. The internal console buffer was always stored in the codepage that was going to be used for display. In one of these forms, one of the developers decided to put `0xFFFF` in the trail to know that it would take 2 columns/bytes (treated interchangeably even though that's not strictly true). A different developer used the `0x100` and `0x200` flags (also as an interchangeable metric of column width and bytes...). And so on.
This sort of organic development without context over time is also how we ended up with some patterns like 8 and 9 with an A byte stomped on top of a W character in the CHAR_INFO structure on read...
In the last few years, I rearranged this so the buffer internally is always stored as Unicode text and it is translated as necessary on the way in/out through the APIs and when being given to GDI. It also always uses the `0x100` and `0x200` flags to know the column width and uses `MB2WC` or `WC2MB` whenever it needs the byte count. This makes for a lot less internal code complexity to figure out what form the buffer is currently stored in. However, the compatibility police came after me and said I had to maintain the API surface, so there's a function to re-munge the trailing byte to an `0xFFFF` on the way out when certain states exist. So you'll have to expect/deal with that. :(
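One way a reader of the API could deal with the `0xFFFF` trail is to rewrite it so that the trailing cell repeats the leading character, which is the pattern 5 shape that winpty is said above to expect. This is only a sketch under that assumption: `Cell` stands in for `CHAR_INFO`, and `normalizeTrailingCells` is a hypothetical helper, not code from winpty or the console.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable stand-in for the Win32 CHAR_INFO record (UnicodeChar + Attributes).
struct Cell {
    char16_t ch;
    std::uint16_t attr;
};

// Values of COMMON_LVB_LEADING_BYTE and COMMON_LVB_TRAILING_BYTE in wincon.h.
constexpr std::uint16_t kLeading  = 0x0100;
constexpr std::uint16_t kTrailing = 0x0200;

// Replace a 0xFFFF trailing cell with a copy of the preceding leading
// character. Returns the number of cells that were rewritten.
std::size_t normalizeTrailingCells(std::vector<Cell>& line) {
    std::size_t fixed = 0;
    for (std::size_t i = 1; i < line.size(); ++i) {
        const bool prevLeads = (line[i - 1].attr & kLeading) != 0;
        const bool trails = (line[i].attr & kTrailing) != 0;
        if (prevLeads && trails && line[i].ch == 0xFFFF) {
            line[i].ch = line[i - 1].ch;
            ++fixed;
        }
    }
    return fixed;
}
```

Cells that are already in the pattern 5 form pass through untouched, so the helper can be applied regardless of which of the two patterns a given build returns.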
The Windows console doesn't support UTF-16 surrogate pairs right now. I want to do that in the future, but it's more accurate to say we support UCS-2 than to say we support UTF-16.
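In practical terms, the UCS-2 limit means text is only safe if every character fits in one 16-bit unit, so nothing in it falls in the surrogate range `0xD800` to `0xDFFF`. A minimal check, with the hypothetical helper names `isSurrogate` and `fitsUcs2`:

```cpp
#include <string>

// UTF-16 encodes characters outside the BMP as two units in this range.
bool isSurrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDFFF;
}

// True when the text contains no surrogate units, i.e. it is plain UCS-2.
bool fitsUcs2(const std::u16string& text) {
    for (char16_t unit : text) {
        if (isSurrogate(unit)) {
            return false;
        }
    }
    return true;
}
```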
UTF-8 (codepage 65001 on the A APIs) is also not officially supported. It works sometimes and on some of the APIs, but it's not complete and there are gaps/holes. I would expect it to work in strange and interesting ways.
Officially, the console supports 2 byte UCS-2 through the W versions of the API. On the A API, we support code pages that are 1 byte for the "Western" world and we support 4 specific 2 byte codepages for "CJK" regions: 932, 936, 949, and 950. Anything else was never officially implemented or supported and your mileage may vary significantly.
Do you have a specific example of what you mean by your single cell I/O of a two-cell character? I can log a bug/task internally to investigate, test, and further document that. This is basically all happening on-demand as we discover scenarios/problems. So if you have a specific scenario/problem, please let me know!
OK cool. If you don't need to implement a workaround, even better. Hopefully this still provides some good insight into what's happening and why and how for future reference.
Thanks for your patience and cooperation! --Michael
@miniksa Sorry, I forgot about this.
> Do you have a specific example of what you mean by your single cell I/O of a two-cell character? I can log a bug/task internally to investigate, test, and further document that. This is basically all happening on-demand as we discover scenarios/problems. So if you have a specific scenario/problem, please let me know!
I was thinking of things like:
1. `ReadConsoleOutputW` to read only the first or second column (but not both).
2. `WriteConsole[AW]`, `WriteConsoleOutput[AW]`, `WriteFile`, etc.).

IIRC, in scenario 1, the console will pretend to read a space (U+0020) character. In scenario 2, I think it will replace the other half of the two-column character(s) with a space. I wouldn't be surprised if the answer is more complicated. :-P
For scenario 3, I'm guessing the last column is replaced with a space, and the character is wrapped around to the next line? What if the screen buffer is only 1 column wide (which also implies a gigantic font)?
For synchronization purposes, winpty issues a single `ReadConsoleOutputW` call for the entire first column of the buffer, and if it issues a read for N lines, it expects to find exactly N `CHAR_INFO` records in its buffer. It only cares about lines that have a "sync marker" in them. Otherwise, winpty always reads (or clears) whole lines at once.
A change to the console in new versions of Windows 10 (e.g. 15048, but not 15014) breaks winpty by effectively shifting cells at the start of a line to the end of the previous line when certain characters appear in certain fonts.
This winpty issue caused this downstream issue in VSCode, https://github.com/Microsoft/vscode/issues/19665.
Follow these links for more details: