Closed reli-msft closed 3 months ago
Running the following script in Node.JS shows a lot of problems in current text layout code:
console.log(`
Combining Diacritics | [a\u0300\u0304]
Polytonic Greek | [ᾊ] [Α\u0313\u0300\u0345]
Ideographic Variation | [東京都葛\u{E0101}飾区] [奈良県葛\u{E0100}城市]
Composite Hangul | [\u1100\u116A\u11B3]
Combining Dakuten | [か\u309A]
Emoji vs Text | [\u2602] [\u2602\uFE0E] [\u2602\uFE0F]
Emoji | [\u{1F469}]
Emoji with Skin Tone | [\u{1F469}\u{1F3FB}]
Emoji Variation and ZWJ | [\u{1F469}\u{1F3FB}\u200D\u{1F4BB}]
`)
Terminal results:
Chrome Dev Tools results:
Edge F12 results:
Word results:
Possibly also #2191
Let me add Firefox result here, so that people don't forget that there's a browser out there called "Firefox":
@skyline75489 my hero
@DHowett-MSFT my man. I'm seeing regression on current master https://github.com/microsoft/terminal/commit/306e75163929161b130872dcae890686da57790d
I've used git bisect
and found that ddcc06e911a18250c8544c7ea83011451e8256a2 is the first bad commit. I've got no idea about what #3474 even is..
I've tried this on my laptop and sadly the regression is the same:
Emoji is noticeably broken. Japanese characters are different, too.
@skyline75489 Thanks for the report! I'm going to move that regression to it's own thread (#3553), considering you've narrowed the regression down to a particular commit. Broadly our text rendering needs help with M chars->N glyphs scenarios like this issue describes, but we probably shouldn't have things getting worse on us.
This is heavily related to this issue: https://github.com/microsoft/terminal/issues/780
There's WriteCharsLegacy
and Terminal::_WriteBuffer
, and they should be unified into one function that writes char stream to the back buffer with proper itemization.
Without #780 we can still improve the current implementation like #3578 . I'm not sure whether that's worth the effort, though. It's more like scratching the surface.
@skyline75489
We need to change the way we write buffers to perform correct line wrap and cursor movement.
For WT there are two buffers being used: one for CONHOST
, which is the "Ground Truth", and one for WT, and the write subroutine is written differently.
So what's wrong with WriteCharsLegacy
? Well...
And to provide a proper text layout, cells should...
@reli-msft Gotta admit, the whole Unicode experience is harder to fix than I initially anticipated. I'm not sure the exact plan Console team have. But I'm positive that we are heading to the right direction.
@skyline75489 Everybody are underestimating how complex text could be.
Let me count all the features that a proper rich text box should support:
And what they can choose to support:
@reli-msft I'd trust you with the whole Unicode thingy myself. I'm a real rookie on this. You can tell from my PR ... Anyway let @DHowett-MSFT take us to where we should eventually be.
Latest results from master
@DHowett-MSFT Looks better... Sort of. Still needs to update our knowledge about Hangul Jamos' width
You know, with the completion of #16916 I believe we've finally cracked this nut.
The issue is found in
CustomTextLayout
:Instead of carefully process the
clusters
input,CustomTextLayout
drops nearly all the information fromclusters
, and only keeping how many columns a cluster occupy, and later it is expecting that the glyph quantity equals to the quantity of clusters. This may lead to incorrect rendering result for many cases (complex script, ligatures, etc.)Bug #610, #696 may be caused by this.