ReadConsoleOutputW doesn't work with surrogate pairs (the buffer layers thread)

Windows Terminal version (or Windows build number)

10.0.19043.1110

Other Software

No response

Steps to reproduce

Compile and run the following code:

#include <iostream>

#include <string>
#include <windows.h>

int main()
{
    const auto Out = GetStdHandle(STD_OUTPUT_HANDLE);
    const std::wstring Text = L"[👨👩👧👦]";

    system("cls");

    DWORD n;
    WriteConsoleW(Out, Text.data(), Text.size(), &n, {});
    CHAR_INFO Buffer[10];

    const COORD BufferCoord{ 10, 1 };

    SMALL_RECT ReadRegion{ 0, 0, 9, 0 };
    ReadConsoleOutputW(Out, Buffer, BufferCoord, {}, &ReadRegion);

    SMALL_RECT WriteRegion{ 0, 2, 9, 2 };

    WriteConsoleOutputW(Out, Buffer, BufferCoord, {}, &WriteRegion);

    int In;
    std::cin >> In;
}

Expected Behavior

Actual Behavior

Observations

The problem is here: https://github.com/microsoft/terminal/blob/3f5f37d910a2c8fde3bb5123c9516f5f9e38af2b/src/host/directio.cpp#L848

AsCharInfo calls Utf16ToUcs2 for the same string over and over: https://github.com/microsoft/terminal/blob/3f5f37d910a2c8fde3bb5123c9516f5f9e38af2b/src/host/consoleInformation.cpp#L408

which returns UNICODE_REPLACEMENT: https://github.com/microsoft/terminal/blob/3f5f37d910a2c8fde3bb5123c9516f5f9e38af2b/src/types/convert.cpp#L172

Additionally, AsCharInfo incorrectly sets COMMON_LVB_LEADING_BYTE in this case: https://github.com/microsoft/terminal/blob/3f5f37d910a2c8fde3bb5123c9516f5f9e38af2b/src/host/consoleInformation.cpp#L415

I assume that the host might use Attribute::Leading and Attribute::Trailing internally for its own purposes, but COMMON_LVB_LEADING_BYTE & COMMON_LVB_TRAILING_BYTE have a different purpose, not applicable in this case and shouldn't leak into the public interfaces for surrogates.

A possible fix

diff --git a/src/host/directio.cpp b/src/host/directio.cpp
index 760a7ecf..4ed1ab0d 100644
--- a/src/host/directio.cpp
+++ b/src/host/directio.cpp
@@ -844,21 +844,34 @@ void EventsToUnicode(_Inout_ std::deque<std::unique_ptr<IInputEvent>>& inEvents,
             // If the point we're trying to write is inside the limited buffer write zone...
             if (targetLimit.IsInBounds(targetPos))
             {
+                auto charInfo = gci.AsCharInfo(*sourceIter);
+
+                if (sourceIter->Chars().size() > 1)
+                    charInfo.Attributes &= ~COMMON_LVB_LEADING_BYTE;
+
                 // Copy the data into position...
-                *targetIter = gci.AsCharInfo(*sourceIter);
-                // ... and advance the read iterator.
-                sourceIter++;
-            }
+                for (const auto i: sourceIter->Chars())
+                {
+                    targetIter->Attributes = charInfo.Attributes;
+                    targetIter->Char.UnicodeChar = i;

-            // Always advance the write iterator, we might have skipped it due to clipping.
-            targetIter++;
+                    // Always advance the write iterator, we might have skipped it due to clipping.
+                    ++targetIter;

-            // Increment the target
-            targetPos.X++;
-            if (targetPos.X >= targetSize.X)
-            {
-                targetPos.X = 0;
-                targetPos.Y++;
+                    // Increment the target
+                    ++targetPos.X;
+                    if (targetPos.X >= targetSize.X)
+                    {
+                        targetPos.X = 0;
+                        ++targetPos.Y;
+                    }
+
+                    // ... and advance the read iterator.
+                    sourceIter++;
+
+                    if (!targetLimit.IsInBounds(targetPos))
+                        break;
+                }
             }
         }

P.S. I remember that there's #8000 that should improve the situation with surrogates in general. This one is about the API issue: when a surrogate pair occupies two columns, there's really no reason to return two replacement characters if we could return the original pair. The situation with a pair that occupies one column is less obvious, but that's a different issue.

@miniksa for the response

Okay, so here's what I said during triage that the team told me to put here:

For the most part, we don't want to continue improving the ReadConsoleOutput* family of functions. We believe that the "readback" scenario, being not translatable into VT sequences, will inhibit us from our long-term goals for command-line applications and hosts/terminals on Windows. Those goals primarily center around moving the Windows platform toward as much of VT sequences as possible (translating older apps when necessary for compatibility purposes on their behalf) and systematically trying to cut out the designs of the legacy conhost platform that prevent us from achieving our performance and reliability goals (like the services conhost was providing to Windows CLIs such as automatic history, aliases, and mutable buffers).
We purposefully are giving replacement characters out of this API because this API as a whole leaked the implementation detail of the CHAR_INFO as a single grid position with a fixed-size character data and fixed-size color data. Of course as we've advanced the platform in the last bunch of years, we've realized that both of those constraints are unsuitable for modern terminal applications (due to surrogates or complex unicode combining sequences not fitting in a single cell on the character side and full RGB color data not fitting in a fixed-size WORD attribute).
Our goal therefore is to not regress this particular API for any of the functionality that had existed and worked in the past, as to continue to support legacy applications that used it. But to also not really extend it to provide any further functionality for modern and currently-developed applications to become even more dependent on its quirks.
In order to make this particular API work, we would have to develop some sort of wink-nudge scheme with a lengthy Remarks section on docs.microsoft.com that explained how a 80-cell wide buffer that used to be ReadConsoleOutputW-able by allocating 80*sizeof(CHAR_INFO) and passing it in... may now have to detect something and give us MORE than 80 cells to account for surrogate pairs. Or read some overloaded indicator to then go read more data another way. Or write a ReadConsoleOutputExW method which wouldn't be backported downlevel.
I would be more inclined to provide some affordance to the ReadConsoleOutputCharacter API. I think we could pretty safely say something like "When in Codepage 65001 for UTF-8... (which was never supported properly anyway in the grand history of things...), use this API in a double-call format. Provide a null buffer pointer on lpCharacter and tell us the start position in dwReadCoord and the number of cells in nLength that you wish to read. In lpNumberOfCharsRead, we will tell you how large the buffer will need to be to hold all the UTF-8 characters. Then call again and we'll fill it up. That doesn't really help you map it to cells yourself, but it's definitely less complicated for us to change and doesn't require publishing new API surfaces or structures (which is a big challenge especially on something we're trying to get away from.)
I cannot really provide you some sort of timeline in which I would solve it this way. It could perhaps be a community contribution to the API call surface given conhost is in this repository.

Thank you for the explanation, I agree that it's a PITA from a dev perspective.

The ReadConsoleOutputCharacter API doesn't sound like a viable alternative for our workflow: we need not only the text, but the whole picture: text, its positioning, colours, style attributes etc.

We need it to provide a terminal session look & feel inside a file manager. To see what I mean:

Get Far
Type and execute any command
Hide both panels (Ctrl+O) and inspect the command output
Press the same key again to restore panels, type something else
Reduce the left panel height by making it active and pressing Ctrl+Shift+Up several times
Execute something else; now you can see the tail of the output without any extra effort and still can check the rest of it if needed:

Obviously, to be able to bring back that output by demand we need to read it from the console once the command finishes executing. Once we have that snapshot we can draw the UI controls (e.g. panels) on top of it and continue. Later, if the user wishes to see it, we flush the saved output back to console. Once the user executes something else we take a new snapshot.

This design pattern (instant access to the output) has been around since Norton Commander / the 80s and is ridiculously convenient.

Now compare it with a system that historically doesn't have any "readback" notion:

Get MC
Type and execute any command
Hide both panels (Ctrl+O) and inspect the command output
...That's all. The integration ends here:

Without readback you can have either the UI or the terminal, but not both simultaneously. Of course, it's not the end of the world, it's still usable, but way less convenient.

What can be done:

As I mentioned earlier, we don't really care what ReadConsoleOutput returns: we don't do anything with those CHAR_INFOs, only save all of them and write back later on demand. Given that, the ideal API for this particular scenario would've probably been:

HANDLE TakeShapshot(SMALL_RECT Region) - take a full snapshot of the screen buffer and return an opaque handle to it. It would be ideal if it could work with an arbitrary Region within the buffer, including the scrollback, but the viewport only is fine if that's not possible for some reason.
void PutSnapshot(HANDLE Snapshot, COORD Position) - write the previously taken snapshot back to the buffer, cropping if needed (e.g. if the user changed the buffer size between the calls). Same as above - it would be ideal if it could put it into any Position within the buffer, including the scrollback, but the viewport only is fine if that's not possible for some reason.
void DeleteSnapshot(HANDLE Snapshot)- delete the snapshot that is no longer needed.

Alternatively, TakeShapshot / PutSnapshot / DeleteSnapshot concepts could be ESC sequences if that's easier to implement (no need to go through kernel32 etc.).

With this approach it would be possible not only to keep the existing functionality, but also to support all the existing and future extensions (true color, text styles, surrogates etc.) without exposing any new structures or any implementation details.

Aha. That is interesting now to better understand the scenario.

One of the troubles right now I'm having with the "passthrough" mode that I made during our fix-hack-learn week a few weeks ago is that the "popups" that occur in a "Cooked Read" (like when CMD.exe asks conhost to do the edit line on its behalf... as opposed to Powershell that does it itself with the raw reads. Press the F keys in CMD.exe after you've run a few commands to see a popup in progress). We have the exact same problem there. We want to backup a region of the screen, draw over the top of it, and then dismiss it and have what was behind it get restored. I wonder if there's anything in https://invisible-island.net/xterm/ctlseqs/ctlseqs.html that would help us here if we implemented it.

When I briefly discussed this particular problem of having an "overlay" sort of layer that can be non-destructively dismissed... We were thinking it is sort of like requesting a move to an "alternate screen buffer" except one where we copy the main buffer to the alternate on entry, destructively draw over the alternate, then pop back to the main one when the "overlay" is complete. I'm not sure there is a VT sequence for "copy main into alternate while entering" but we should perhaps dig around and see if we can find anything close to this paradigm and if not, engage with some of the other terminals out there to see either how they handle this or if we can all agree on a reasonable extension.

The cropping if the user resizes is also interesting as well. I'm not sure that my solution immediately covers that.

Yes, it's the same idea as those F2/F4/F7/F9 popups.

Also, TakeShapshot / PutSnapshot / DeleteSnapshot relatively nicely fit into the existing API:

// Take a snapshot of the current viewport
HANDLE WINAPI CreateConsoleScreenBuffer(
  DWORD               DesiredAccess,     // Ignore
  DWORD               ShareMode,         // Ignore
  const SECURITY_ATTRIBUTES *Attributes, // Ignore
  DWORD               Flags,             // CONSOLE_SNAPSHOT_BUFFER
  LPVOID              ScreenBufferData   // Ignore
);

// Write the taken snapshot into the current viewport
BOOL WINAPI SetConsoleActiveScreenBuffer(
  HANDLE hConsoleOutput // A handle to a CONSOLE_SNAPSHOT_BUFFER
);

// Discard the allocated snapshot
BOOL CloseHandle(
  HANDLE hObject // A handle to a CONSOLE_SNAPSHOT_BUFFER
);

The external impact is one new SDK constant for the new buffer type.

Yes, it's the same idea as those F2/F4/F7/F9 popups.

Also, TakeShapshot / PutSnapshot / DeleteSnapshot relatively nicely fit into the existing API:

// Take a snapshot of the current viewport
HANDLE WINAPI CreateConsoleScreenBuffer(
  DWORD               DesiredAccess,     // Ignore
  DWORD               ShareMode,         // Ignore
  const SECURITY_ATTRIBUTES *Attributes, // Ignore
  DWORD               Flags,             // CONSOLE_SNAPSHOT_BUFFER
  LPVOID              ScreenBufferData   // Ignore
);

// Write the taken snapshot into the current viewport
BOOL WINAPI SetConsoleActiveScreenBuffer(
  HANDLE hConsoleOutput // A handle to a CONSOLE_SNAPSHOT_BUFFER
);

// Discard the allocated snapshot
BOOL CloseHandle(
  HANDLE hObject // A handle to a CONSOLE_SNAPSHOT_BUFFER
);

The external impact is one new SDK constant for the new buffer type.

Dude. I love this. This sounds completely doable. Adding one constant to an existing flag field and not changing the method signatures at all is the level of risk I'll happily take with the existing API surface. We would just also need to figure out how to represent this as VT or make it analogous to VT such that we could co-exist with other Terminals and the Linux world and I think we have a spec.

We want to backup a region of the screen, draw over the top of it, and then dismiss it and have what was behind it get restored. I wonder if there's anything in https://invisible-island.net/xterm/ctlseqs/ctlseqs.html that would help us here if we implemented it.

FYI, there is actually a way to read the screen to some extent using VT sequences - it's just very inconvenient, and for those terminals that do support it, it's often disabled because it's considered a security risk.

Essentially what you do is use the DECRQCRA command to request a checksum of the screen, but limit it to one cell at a time. I haven't really played with this much, so I'm not sure to what extent it works with Unicode, but in theory you should be able to determine the underlying character from the returned checksum.

As for reading attributes, XTerm added the XTCHECKSUM sequence to configure the form of checksum that DECRQCRA produces, and I think that enables it to include the cell attributes in the checksum, which in theory would let you to determine the attributes of a given cell (again I haven't actually tested this).

Obviously you can see this wouldn't be a very efficient way to read the entire screen, so I'm not sure it's a practical solution. But I thought it was worth mentioning, because if DECRQCRA is considered a security risk, then assumedly any other sequence we invent to read the screen would be just as risky.

Yeah @j4james... I agree both on the efficiency and security points. That's why I'm excited to understand the scenario as a sort of "keep a backup for me, let me draw on top as an overlay, then restore it." I think the better route to follow is to allow for an alternate screen buffer that copies the main one on entry so it can be destructively painted then restored by returning to the main one.

tl;dr for folks coming to the thread late:

Originally, the thread was started from the point of "ReadConsoleOutputW doesn't work with surrogate pairs"
The way the console API is designed today, it's really not possible ever to read full unicode out of the buffer. That doesn't work today, even before we do the buffer rewrite (as planned in #8000).
Making changes (additions) to the console API is VERY hard. Making changes to the VT implementation is pretty trivial.
Far Manager uses ReadConsoleOutput to support "overlays", where it draws its UI on top of the commandline output.

We discussed this a bit more yesterday. The "create a screen buffer which is a copy of the current one" seemed like a good consensus. Whether that's with CreateConsoleScreenBuffer(..., CONSOLE_SNAPSHOT_BUFFER, ...)/SetConsoleActiveScreenBuffer or some custom VT sequence for "enter the alternate buffer (and initialize the contents of the alt buffer to the contents of the main buffer)" seemed like a minor semantic difference. Maybe something like \x1b[?9049h/l, since we've already used ?9001h/l in the past. I honestly don't know which would be easier to prototype, I know I'd probably start with the VT version, if that's something that would work for this.

If that's something that might work for Far, we can try and prototype something here.

@zadjii-msft Thanks for getting back to this.

To summarise, this is what we do now:

                                     ┌─────┐
                                     │Start│
                                     └──┬──┘
   ┌───────────────────────────────────►┤
   │                                    ▼
   │                          ┌─────────┴─────────┐
   │                          │  Take a snapshot  │
   │                          │  of the viewport  │
   │                          │(ReadConsoleOutput)│
   │                          └─────────┬─────────┘
   │                                    ▼
   │                              ┌─────┴─────┐
   │                              │Draw the UI│
   │                              └─────┬─────┘
   │                                    ├◄─────────────────────────────────┐
   │                                    ▼                                  │
   │                          ┌─────────┴─────────┐                        │
   │               ┌──────────┤Wait for user input├────────┐               │
   │               │          └───────────────────┘        │               │
   │               ▼                                       ▼               │
   │    ┌──────────┴──────────┐            ┌───────────────┴─────────────┐ │
   │    │User enters a command│            │User wants to see (a part of)│ │
   │    └──────────┬──────────┘            │   the output under the UI   │ │
   │               │                       └───────────────┬─────────────┘ │
   │               ▼                                       ▼               │
   │ ┌─────────────┴────────────┐            ┌─────────────┴──────────┐    │
   │ │ Write the saved snapshot │            │Rearrange the UI layers │    │
   │ │      to the console      │            │   internally so that   │    │
   │ │    Execute the process   │            │     the snapshot is    │    │
   │ │The process writes output │            │   (partially) visible  │    │
   │ │ to the console and exits │            │ Write the merged image │    │
   │ └─────────────┬────────────┘            │    to the console      │    │
   │               │                         └──────────┬─────────────┘    │
   │               │                                    │                  │
   └───────────────┘                                    └──────────────────┘

This worked nice for decades, but now those CHAR_INFOs can't represent all the information on the screen (because of new colours and text styles) and also there are issues with surrogates like the described above, so reading and writing them back is no longer lossless.

With opaque "snapshots":

                                     ┌─────┐
                                     │Start│
                                     └──┬──┘
   ┌───────────────────────────────────►┤
   │                                    ▼
   │                          ┌─────────┴─────────┐
   │                          │  Take a snapshot  │
   │                          │  of the viewport  │
   │                          │(opaque, VT or API)│
   │                          └─────────┬─────────┘
   │                                    ▼
   │                              ┌─────┴─────┐
   │                              │Draw the UI│
   │                              └─────┬─────┘
   │                                    ├◄─────────────────────────────────┐
   │                                    ▼                                  │
   │                          ┌─────────┴─────────┐                        │
   │               ┌──────────┤Wait for user input├────────┐               │
   │               │          └───────────────────┘        │               │
   │               ▼                                       ▼               │
   │    ┌──────────┴──────────┐            ┌───────────────┴─────────────┐ │
   │    │User enters a command│            │User wants to see (a part of)│ │
   │    └──────────┬──────────┘            │   the output under the UI   │ │
   │               │                       └───────────────┬─────────────┘ │
   │               ▼                                       ▼               │
   │ ┌─────────────┴────────────┐          ┌───────────────┴────────────┐  │
   │ │Restore the saved snapshot│          │ Restore the saved snapshot │  │
   │ │    (opaque, VT or API)   │          │     (opaque, VT or API)    │  │
   │ │    Execute the process   │          │Write UI layers on top of it│  │
   │ │The process writes output │          │   but only where needed    │  │
   │ │ to the console and exits │          └───────────────┬────────────┘  │
   │ └─────────────┬────────────┘                          └───────────────┘
   │               │
   └───────────────┘

Essentially it's almost the same workflow, except for the bottom right corner, describing the case when we want to display both the output and the UI simultaneously (see the screenshot above). It is a little more complicated than the other branch because currently we don't write anything to the console directly, but into a CHAR_INFO-ish memory buffer instead and then flush it when needed, not caring what is inside. With the new approach we will need to do some extra work to preserve the (parts of) the snapshot we've just restored and write only precisely where needed, but that's not rocket science and I'm happy to try it.

I was just doing some research on the rectangular editing operations (DECCRA, DECFRA, DECERA, etc.), and it occurred to me that these functions may provide a solution to the layering problem when combined with the paging APIs discussed in #13892.

The key point to note is that the Copy Rectangular Area operation (DECCRA) takes a source and destination page number. So if you want to save an area of the screen, and restore it later, you should just be able to copy it to a background page, and then copy it back again when you want it restored.

I need to do some testing on terminals that actually implement these operations to see if they work the way I think they do, but they sound promising.

I've had a chance to test this now on a couple of different terminal emulators, and most work really well. MLTerm had issues restoring the "blank" parts of the screen, if you're tried to use an overlay immediately after startup, but once the first page had scrolled out of view it seemed to work fine.

Here's a short demo from one of the terminals I tested.

https://user-images.githubusercontent.com/4181424/189869500-075cd840-9c4e-43b0-ae46-2ae3c118913f.mp4

Wow that's a GREAT solution. That sounds like the perfect way to save off a section of the buffer and restore it. I suppose some subprocess could hijack the page that you stashed buffer contents into, but meh? That seems like something that's worth at least exploring more. Seems like a way easier way to try to do overlays in a shell.

microsoft / terminal