microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!
MIT License
95.87k stars 8.34k forks source link

ReadConsoleOutputW doesn't work with surrogate pairs (the buffer layers thread) #10810

Open alabuzhev opened 3 years ago

alabuzhev commented 3 years ago

Windows Terminal version (or Windows build number)

10.0.19043.1110

Other Software

No response

Steps to reproduce

Compile and run the following code:

#include <iostream>

#include <string>
#include <windows.h>

int main()
{
    const auto Out = GetStdHandle(STD_OUTPUT_HANDLE);
    const std::wstring Text = L"[πŸ‘¨πŸ‘©πŸ‘§πŸ‘¦]";

    system("cls");

    DWORD n;
    WriteConsoleW(Out, Text.data(), Text.size(), &n, {});
    CHAR_INFO Buffer[10];

    const COORD BufferCoord{ 10, 1 };

    SMALL_RECT ReadRegion{ 0, 0, 9, 0 };
    ReadConsoleOutputW(Out, Buffer, BufferCoord, {}, &ReadRegion);

    SMALL_RECT WriteRegion{ 0, 2, 9, 2 };

    WriteConsoleOutputW(Out, Buffer, BufferCoord, {}, &WriteRegion);

    int In;
    std::cin >> In;
}

Expected Behavior

image

Actual Behavior

image

Observations

The problem is here: https://github.com/microsoft/terminal/blob/3f5f37d910a2c8fde3bb5123c9516f5f9e38af2b/src/host/directio.cpp#L848

AsCharInfo calls Utf16ToUcs2 for the same string over and over: https://github.com/microsoft/terminal/blob/3f5f37d910a2c8fde3bb5123c9516f5f9e38af2b/src/host/consoleInformation.cpp#L408

which returns UNICODE_REPLACEMENT: https://github.com/microsoft/terminal/blob/3f5f37d910a2c8fde3bb5123c9516f5f9e38af2b/src/types/convert.cpp#L172

Additionally, AsCharInfo incorrectly sets COMMON_LVB_LEADING_BYTE in this case: https://github.com/microsoft/terminal/blob/3f5f37d910a2c8fde3bb5123c9516f5f9e38af2b/src/host/consoleInformation.cpp#L415

I assume that the host might use Attribute::Leading and Attribute::Trailing internally for its own purposes, but COMMON_LVB_LEADING_BYTE & COMMON_LVB_TRAILING_BYTE have a different purpose, not applicable in this case and shouldn't leak into the public interfaces for surrogates.

A possible fix

diff --git a/src/host/directio.cpp b/src/host/directio.cpp
index 760a7ecf..4ed1ab0d 100644
--- a/src/host/directio.cpp
+++ b/src/host/directio.cpp
@@ -844,21 +844,34 @@ void EventsToUnicode(_Inout_ std::deque<std::unique_ptr<IInputEvent>>& inEvents,
             // If the point we're trying to write is inside the limited buffer write zone...
             if (targetLimit.IsInBounds(targetPos))
             {
+                auto charInfo = gci.AsCharInfo(*sourceIter);
+
+                if (sourceIter->Chars().size() > 1)
+                    charInfo.Attributes &= ~COMMON_LVB_LEADING_BYTE;
+
                 // Copy the data into position...
-                *targetIter = gci.AsCharInfo(*sourceIter);
-                // ... and advance the read iterator.
-                sourceIter++;
-            }
+                for (const auto i: sourceIter->Chars())
+                {
+                    targetIter->Attributes = charInfo.Attributes;
+                    targetIter->Char.UnicodeChar = i;

-            // Always advance the write iterator, we might have skipped it due to clipping.
-            targetIter++;
+                    // Always advance the write iterator, we might have skipped it due to clipping.
+                    ++targetIter;

-            // Increment the target
-            targetPos.X++;
-            if (targetPos.X >= targetSize.X)
-            {
-                targetPos.X = 0;
-                targetPos.Y++;
+                    // Increment the target
+                    ++targetPos.X;
+                    if (targetPos.X >= targetSize.X)
+                    {
+                        targetPos.X = 0;
+                        ++targetPos.Y;
+                    }
+
+                    // ... and advance the read iterator.
+                    sourceIter++;
+
+                    if (!targetLimit.IsInBounds(targetPos))
+                        break;
+                }
             }
         }

P.S. I remember that there's #8000 that should improve the situation with surrogates in general. This one is about the API issue: when a surrogate pair occupies two columns, there's really no reason to return two replacement characters if we could return the original pair. The situation with a pair that occupies one column is less obvious, but that's a different issue.

zadjii-msft commented 3 years ago

@miniksa for the response

miniksa commented 3 years ago

Okay, so here's what I said during triage that the team told me to put here:

alabuzhev commented 3 years ago

Thank you for the explanation, I agree that it's a PITA from a dev perspective.

The ReadConsoleOutputCharacter API doesn't sound like a viable alternative for our workflow: we need not only the text, but the whole picture: text, its positioning, colours, style attributes etc.

We need it to provide a terminal session look & feel inside a file manager. To see what I mean:

image

Obviously, to be able to bring back that output by demand we need to read it from the console once the command finishes executing. Once we have that snapshot we can draw the UI controls (e.g. panels) on top of it and continue. Later, if the user wishes to see it, we flush the saved output back to console. Once the user executes something else we take a new snapshot.

This design pattern (instant access to the output) has been around since Norton Commander / the 80s and is ridiculously convenient.

Now compare it with a system that historically doesn't have any "readback" notion:

image

Without readback you can have either the UI or the terminal, but not both simultaneously. Of course, it's not the end of the world, it's still usable, but way less convenient.

What can be done:

As I mentioned earlier, we don't really care what ReadConsoleOutput returns: we don't do anything with those CHAR_INFOs, only save all of them and write back later on demand. Given that, the ideal API for this particular scenario would've probably been:

Alternatively, TakeShapshot / PutSnapshot / DeleteSnapshot concepts could be ESC sequences if that's easier to implement (no need to go through kernel32 etc.).

With this approach it would be possible not only to keep the existing functionality, but also to support all the existing and future extensions (true color, text styles, surrogates etc.) without exposing any new structures or any implementation details.

miniksa commented 3 years ago

Aha. That is interesting now to better understand the scenario.

One of the troubles right now I'm having with the "passthrough" mode that I made during our fix-hack-learn week a few weeks ago is that the "popups" that occur in a "Cooked Read" (like when CMD.exe asks conhost to do the edit line on its behalf... as opposed to Powershell that does it itself with the raw reads. Press the F keys in CMD.exe after you've run a few commands to see a popup in progress). We have the exact same problem there. We want to backup a region of the screen, draw over the top of it, and then dismiss it and have what was behind it get restored. I wonder if there's anything in https://invisible-island.net/xterm/ctlseqs/ctlseqs.html that would help us here if we implemented it.

When I briefly discussed this particular problem of having an "overlay" sort of layer that can be non-destructively dismissed... We were thinking it is sort of like requesting a move to an "alternate screen buffer" except one where we copy the main buffer to the alternate on entry, destructively draw over the alternate, then pop back to the main one when the "overlay" is complete. I'm not sure there is a VT sequence for "copy main into alternate while entering" but we should perhaps dig around and see if we can find anything close to this paradigm and if not, engage with some of the other terminals out there to see either how they handle this or if we can all agree on a reasonable extension.

The cropping if the user resizes is also interesting as well. I'm not sure that my solution immediately covers that.

alabuzhev commented 3 years ago

Yes, it's the same idea as those F2/F4/F7/F9 popups.

Also, TakeShapshot / PutSnapshot / DeleteSnapshot relatively nicely fit into the existing API:

// Take a snapshot of the current viewport
HANDLE WINAPI CreateConsoleScreenBuffer(
  DWORD               DesiredAccess,     // Ignore
  DWORD               ShareMode,         // Ignore
  const SECURITY_ATTRIBUTES *Attributes, // Ignore
  DWORD               Flags,             // CONSOLE_SNAPSHOT_BUFFER
  LPVOID              ScreenBufferData   // Ignore
);
// Write the taken snapshot into the current viewport
BOOL WINAPI SetConsoleActiveScreenBuffer(
  HANDLE hConsoleOutput // A handle to a CONSOLE_SNAPSHOT_BUFFER
);
// Discard the allocated snapshot
BOOL CloseHandle(
  HANDLE hObject // A handle to a CONSOLE_SNAPSHOT_BUFFER
);
miniksa commented 3 years ago

Yes, it's the same idea as those F2/F4/F7/F9 popups.

Also, TakeShapshot / PutSnapshot / DeleteSnapshot relatively nicely fit into the existing API:

// Take a snapshot of the current viewport
HANDLE WINAPI CreateConsoleScreenBuffer(
  DWORD               DesiredAccess,     // Ignore
  DWORD               ShareMode,         // Ignore
  const SECURITY_ATTRIBUTES *Attributes, // Ignore
  DWORD               Flags,             // CONSOLE_SNAPSHOT_BUFFER
  LPVOID              ScreenBufferData   // Ignore
);
// Write the taken snapshot into the current viewport
BOOL WINAPI SetConsoleActiveScreenBuffer(
  HANDLE hConsoleOutput // A handle to a CONSOLE_SNAPSHOT_BUFFER
);
// Discard the allocated snapshot
BOOL CloseHandle(
  HANDLE hObject // A handle to a CONSOLE_SNAPSHOT_BUFFER
);
  • The external impact is one new SDK constant for the new buffer type.

Dude. I love this. This sounds completely doable. Adding one constant to an existing flag field and not changing the method signatures at all is the level of risk I'll happily take with the existing API surface. We would just also need to figure out how to represent this as VT or make it analogous to VT such that we could co-exist with other Terminals and the Linux world and I think we have a spec.

j4james commented 3 years ago

We want to backup a region of the screen, draw over the top of it, and then dismiss it and have what was behind it get restored. I wonder if there's anything in https://invisible-island.net/xterm/ctlseqs/ctlseqs.html that would help us here if we implemented it.

FYI, there is actually a way to read the screen to some extent using VT sequences - it's just very inconvenient, and for those terminals that do support it, it's often disabled because it's considered a security risk.

Essentially what you do is use the DECRQCRA command to request a checksum of the screen, but limit it to one cell at a time. I haven't really played with this much, so I'm not sure to what extent it works with Unicode, but in theory you should be able to determine the underlying character from the returned checksum.

As for reading attributes, XTerm added the XTCHECKSUM sequence to configure the form of checksum that DECRQCRA produces, and I think that enables it to include the cell attributes in the checksum, which in theory would let you to determine the attributes of a given cell (again I haven't actually tested this).

Obviously you can see this wouldn't be a very efficient way to read the entire screen, so I'm not sure it's a practical solution. But I thought it was worth mentioning, because if DECRQCRA is considered a security risk, then assumedly any other sequence we invent to read the screen would be just as risky.

miniksa commented 3 years ago

Yeah @j4james... I agree both on the efficiency and security points. That's why I'm excited to understand the scenario as a sort of "keep a backup for me, let me draw on top as an overlay, then restore it." I think the better route to follow is to allow for an alternate screen buffer that copies the main one on entry so it can be destructively painted then restored by returning to the main one.

zadjii-msft commented 2 years ago

tl;dr for folks coming to the thread late:

We discussed this a bit more yesterday. The "create a screen buffer which is a copy of the current one" seemed like a good consensus. Whether that's with CreateConsoleScreenBuffer(..., CONSOLE_SNAPSHOT_BUFFER, ...)/SetConsoleActiveScreenBuffer or some custom VT sequence for "enter the alternate buffer (and initialize the contents of the alt buffer to the contents of the main buffer)" seemed like a minor semantic difference. Maybe something like \x1b[?9049h/l, since we've already used ?9001h/l in the past. I honestly don't know which would be easier to prototype, I know I'd probably start with the VT version, if that's something that would work for this.

If that's something that might work for Far, we can try and prototype something here.

alabuzhev commented 2 years ago

@zadjii-msft Thanks for getting back to this.

To summarise, this is what we do now:

                                     β”Œβ”€β”€β”€β”€β”€β”
                                     β”‚Startβ”‚
                                     β””β”€β”€β”¬β”€β”€β”˜
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Ίβ”€
   β”‚                                    β–Ό
   β”‚                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚                          β”‚  Take a snapshot  β”‚
   β”‚                          β”‚  of the viewport  β”‚
   β”‚                          β”‚(ReadConsoleOutput)β”‚
   β”‚                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   β”‚                                    β–Ό
   β”‚                              β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”
   β”‚                              β”‚Draw the UIβ”‚
   β”‚                              β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
   β”‚                                    β”œβ—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚                                    β–Ό                                  β”‚
   β”‚                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                        β”‚
   β”‚               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€Wait for user inputβ”œβ”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚
   β”‚               β”‚          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β”‚               β”‚
   β”‚               β–Ό                                       β–Ό               β”‚
   β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
   β”‚    β”‚User enters a commandβ”‚            β”‚User wants to see (a part of)β”‚ β”‚
   β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚   the output under the UI   β”‚ β”‚
   β”‚               β”‚                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
   β”‚               β–Ό                                       β–Ό               β”‚
   β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
   β”‚ β”‚ Write the saved snapshot β”‚            β”‚Rearrange the UI layers β”‚    β”‚
   β”‚ β”‚      to the console      β”‚            β”‚   internally so that   β”‚    β”‚
   β”‚ β”‚    Execute the process   β”‚            β”‚     the snapshot is    β”‚    β”‚
   β”‚ β”‚The process writes output β”‚            β”‚   (partially) visible  β”‚    β”‚
   β”‚ β”‚ to the console and exits β”‚            β”‚ Write the merged image β”‚    β”‚
   β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚    to the console      β”‚    β”‚
   β”‚               β”‚                         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
   β”‚               β”‚                                    β”‚                  β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

This worked nice for decades, but now those CHAR_INFOs can't represent all the information on the screen (because of new colours and text styles) and also there are issues with surrogates like the described above, so reading and writing them back is no longer lossless.

With opaque "snapshots":

                                     β”Œβ”€β”€β”€β”€β”€β”
                                     β”‚Startβ”‚
                                     β””β”€β”€β”¬β”€β”€β”˜
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Ίβ”€
   β”‚                                    β–Ό
   β”‚                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚                          β”‚  Take a snapshot  β”‚
   β”‚                          β”‚  of the viewport  β”‚
   β”‚                          β”‚(opaque, VT or API)β”‚
   β”‚                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   β”‚                                    β–Ό
   β”‚                              β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”
   β”‚                              β”‚Draw the UIβ”‚
   β”‚                              β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
   β”‚                                    β”œβ—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚                                    β–Ό                                  β”‚
   β”‚                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                        β”‚
   β”‚               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€Wait for user inputβ”œβ”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚
   β”‚               β”‚          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β”‚               β”‚
   β”‚               β–Ό                                       β–Ό               β”‚
   β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
   β”‚    β”‚User enters a commandβ”‚            β”‚User wants to see (a part of)β”‚ β”‚
   β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚   the output under the UI   β”‚ β”‚
   β”‚               β”‚                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
   β”‚               β–Ό                                       β–Ό               β”‚
   β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
   β”‚ β”‚Restore the saved snapshotβ”‚          β”‚ Restore the saved snapshot β”‚  β”‚
   β”‚ β”‚    (opaque, VT or API)   β”‚          β”‚     (opaque, VT or API)    β”‚  β”‚
   β”‚ β”‚    Execute the process   β”‚          β”‚Write UI layers on top of itβ”‚  β”‚
   β”‚ β”‚The process writes output β”‚          β”‚   but only where needed    β”‚  β”‚
   β”‚ β”‚ to the console and exits β”‚          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
   β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   β”‚               β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Essentially it's almost the same workflow, except for the bottom right corner, describing the case when we want to display both the output and the UI simultaneously (see the screenshot above). It is a little more complicated than the other branch because currently we don't write anything to the console directly, but into a CHAR_INFO-ish memory buffer instead and then flush it when needed, not caring what is inside. With the new approach we will need to do some extra work to preserve the (parts of) the snapshot we've just restored and write only precisely where needed, but that's not rocket science and I'm happy to try it.

j4james commented 2 years ago

I was just doing some research on the rectangular editing operations (DECCRA, DECFRA, DECERA, etc.), and it occurred to me that these functions may provide a solution to the layering problem when combined with the paging APIs discussed in #13892.

The key point to note is that the Copy Rectangular Area operation (DECCRA) takes a source and destination page number. So if you want to save an area of the screen, and restore it later, you should just be able to copy it to a background page, and then copy it back again when you want it restored.

I need to do some testing on terminals that actually implement these operations to see if they work the way I think they do, but they sound promising.

j4james commented 2 years ago

I've had a chance to test this now on a couple of different terminal emulators, and most work really well. MLTerm had issues restoring the "blank" parts of the screen, if you're tried to use an overlay immediately after startup, but once the first page had scrolled out of view it seemed to work fine.

Here's a short demo from one of the terminals I tested.

https://user-images.githubusercontent.com/4181424/189869500-075cd840-9c4e-43b0-ae46-2ae3c118913f.mp4

zadjii-msft commented 2 years ago

Wow that's a GREAT solution. That sounds like the perfect way to save off a section of the buffer and restore it. I suppose some subprocess could hijack the page that you stashed buffer contents into, but meh? That seems like something that's worth at least exploring more. Seems like a way easier way to try to do overlays in a shell.