Closed peter-bertok closed 1 year ago
This one is fascinating. I spy two bugs here. One is with the emoji input (this could just be PSReadline's fault), and the other is that some emoji are still too tiny!
Dumping either of the two files in the terminal doesn't render any data properly. These two files contain all the emoji that Windows provides.
CMD: use the more or type command to dump the content into the terminal. Shows garbage text, but doesn't crash.
PowerShell: use get-content to dump the data into the terminal. Nothing shows; the terminal app hangs and crashes.
WSL: use cat to dump the content into the terminal. Nothing shows; the terminal app hangs and crashes.
N.B. --> Crash doesn't terminate terminal app
Pasting emoji input is also an issue with cmd. In the screenshot below, I paste a string of smileys in, but they come out as invalid glyphs.
Hitting enter and up displays the correct input string though, so it is making it into the console buffer correctly.
For me later:
OutputCellView OutputCellIterator::s_GenerateView(const std::wstring_view view,
                                                  const TextAttribute attr,
                                                  const TextAttributeBehavior behavior)
{
    const auto glyph = Utf16Parser::ParseNext(view);
    DbcsAttribute dbcsAttr;
    if (IsGlyphFullWidth(glyph))
    {
        dbcsAttr.SetLeading();
    }
    return OutputCellView(glyph, dbcsAttr, attr, behavior);
}
As the two wchar_ts get written to the buffer by WriteCharsLegacy, we create an OutputCellIterator to write each half of the emoji. Unfortunately, we write each half one char at a time. Utf16Parser::ParseNext doesn't like that. It knows the first wchar_t is a leading surrogate, but can also tell there's no trailing surrogate, so it just returns a replacement character.
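To make the failure mode concrete, here is a minimal, self-contained sketch of a "parse the next glyph" routine. It is not the real Utf16Parser; ParseNextSketch, IsLeadingSurrogate, IsTrailingSurrogate, and DemoParse are hypothetical names, and char16_t stands in for the 16-bit wchar_t used on Windows. It only illustrates why handing over half of a surrogate pair yields U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration; not the console's Utf16Parser.
constexpr char16_t kReplacement = char16_t{ 0xFFFD };

constexpr bool IsLeadingSurrogate(char16_t ch) { return ch >= 0xD800 && ch <= 0xDBFF; }
constexpr bool IsTrailingSurrogate(char16_t ch) { return ch >= 0xDC00 && ch <= 0xDFFF; }

// Returns the first glyph in `view` (one or two UTF-16 code units),
// or U+FFFD when the view starts with an unpaired surrogate.
std::u16string ParseNextSketch(std::u16string_view view)
{
    if (view.empty())
    {
        return {};
    }
    const char16_t first = view[0];
    if (IsLeadingSurrogate(first))
    {
        if (view.size() >= 2 && IsTrailingSurrogate(view[1]))
        {
            return std::u16string(view.substr(0, 2)); // complete pair
        }
        return std::u16string(1, kReplacement); // lead with no partner
    }
    if (IsTrailingSurrogate(first))
    {
        return std::u16string(1, kReplacement); // trail with no lead
    }
    return std::u16string(1, first); // ordinary BMP code unit
}

// Compares writing the whole pair against writing one code unit at a time.
void DemoParse()
{
    const std::u16string smiley = u"\U0001F600"; // encoded as D83D DE00
    const std::u16string_view whole{ smiley };
    assert(ParseNextSketch(whole) == smiley);
    assert(ParseNextSketch(whole.substr(0, 1)) == std::u16string(1, kReplacement));
    assert(ParseNextSketch(whole.substr(1, 1)) == std::u16string(1, kReplacement));
}
```

Fed the full two-unit view, the sketch returns the emoji; fed each half separately, it returns a replacement character twice, which matches the pair of invalid glyphs seen in the buffer.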
The character does end up getting inserted into the cooked read data correctly, which is why hitting enter to submit the commandline in cmd works just fine. The data in the cooked read data is correct, but the text buffer has the wrong data.
Presumably, the cooked read is just writing the text buffer wrong. COOKED_READ_DATA::ProcessInput can only handle one wchar_t at a time.
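One possible direction, sketched here purely as an assumption and not as what the console does or will do, is to hold a leading surrogate back until its partner arrives, so that whatever writes to the text buffer only ever sees whole glyphs. SurrogateAccumulator and Feed are hypothetical names, and char16_t again stands in for the 16-bit wchar_t.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper; not part of the console code base.
class SurrogateAccumulator
{
public:
    // Feed one UTF-16 code unit. Returns the text that is ready to be
    // written: empty while a leading surrogate waits for its partner.
    std::u16string Feed(char16_t ch)
    {
        const bool isLead = ch >= 0xD800 && ch <= 0xDBFF;
        const bool isTrail = ch >= 0xDC00 && ch <= 0xDFFF;
        std::u16string out;

        if (_pending)
        {
            const char16_t lead = *_pending;
            _pending.reset();
            if (isTrail)
            {
                out.push_back(lead); // pair is complete, emit both units
                out.push_back(ch);
                return out;
            }
            out.push_back(kReplacement); // the stored lead was orphaned
        }

        if (isLead)
        {
            _pending = ch; // wait for the trailing surrogate
            return out;
        }
        out.push_back(isTrail ? kReplacement : ch);
        return out;
    }

private:
    static constexpr char16_t kReplacement = char16_t{ 0xFFFD };
    std::optional<char16_t> _pending;
};

// Feeds an emoji one code unit at a time, as the cooked read receives it.
void DemoAccumulate()
{
    SurrogateAccumulator acc;
    assert(acc.Feed(char16_t{ 0xD83D }).empty());            // nothing to write yet
    assert(acc.Feed(char16_t{ 0xDE00 }) == u"\U0001F600");   // whole glyph at once
}
```

The trade-off is that echo is delayed by one code unit for characters outside the BMP, and a lead that never gets its partner stays pending until the next key arrives.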
When you use the emoji picker to input the character, it first comes through ConversionAreaInfo::WriteText straight to _screenBuffer->Write to draw the composition buffer. Then, once the dialog is dismissed, the keys get sent to the input buffer in ConsoleImeInfo::_InsertConvertedString, where again the cooked read gets them one char at a time to display broken in the buffer.
EDIT: March 30th 2020
I've looked into this a bit, and this is one of those terrible rabbit-hole issues. Even if we do add support for simply typing/pasting emoji to COOKED_READ, that opens up a whole other can of bugs. For one, COOKED_READ should probably also be enlightened to support backspacing an emoji. Also, what happens for applications that are expecting UCS-2 input, not UTF-16? It's an unfortunately complex issue that we'll have to resolve on the console side of things.
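As a rough illustration of the backspace problem, the sketch below removes the last glyph from a UTF-16 edit line, erasing two code units when the line ends in a surrogate pair. BackspaceGlyph is a hypothetical helper, not console code, and it deliberately handles surrogate pairs only; emoji built from several code points (ZWJ sequences, variation selectors, skin-tone modifiers) would need grapheme-cluster logic on top of this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: erases the last glyph of `line` and returns how many
// UTF-16 code units were removed (0 if the line was already empty).
std::size_t BackspaceGlyph(std::u16string& line)
{
    if (line.empty())
    {
        return 0;
    }
    const std::size_t n = line.size();
    std::size_t count = 1;
    if (n >= 2)
    {
        const char16_t last = line[n - 1];
        const char16_t prev = line[n - 2];
        const bool lastIsTrail = last >= 0xDC00 && last <= 0xDFFF;
        const bool prevIsLead = prev >= 0xD800 && prev <= 0xDBFF;
        if (lastIsTrail && prevIsLead)
        {
            count = 2; // remove both halves of the surrogate pair
        }
    }
    line.erase(n - count);
    return count;
}

// One backspace removes the whole emoji, the next removes a single letter.
void DemoBackspace()
{
    std::u16string line = u"hi\U0001F600"; // 'h', 'i', D83D, DE00
    assert(BackspaceGlyph(line) == 2);
    assert(line == u"hi");
    assert(BackspaceGlyph(line) == 1);
    assert(line == u"h");
}
```

A client that treats its input as UCS-2 would still count the emoji as two characters, which is part of why the compatibility question above is not settled by a helper like this.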
This is now the "COOKED_READ (cmd.exe) doesn't properly support emoji input" issue, and I'm moving this to 21H1 as a "Feature", so we can try and prioritize for the next Windows release.
Environment
Steps to reproduce
Paste text containing complex Unicode characters such as emoji into a PowerShell tab as a string literal. Emoji will be displayed as "??" placeholders, but then display correctly when the literal is "output" by pressing enter.
Expected behavior
Unicode characters such as Emoji should be consistently displayed, including in string literals, input text, command-line arguments, etc...
Actual behavior
Inconsistent display: