The text layout code is expecting the quantity of glyphs is equal to the cells allocated for each line

microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!

MIT License

95.41k stars 8.3k forks source link

The text layout code is expecting the quantity of glyphs is equal to the cells allocated for each line #3546

Closed reli-msft closed 3 months ago

reli-msft commented 4 years ago

The issue is found in CustomTextLayout:

Instead of carefully process the clusters input, CustomTextLayout drops nearly all the information from clusters, and only keeping how many columns a cluster occupy, and later it is expecting that the glyph quantity equals to the quantity of clusters. This may lead to incorrect rendering result for many cases (complex script, ligatures, etc.)

Bug #610, #696 may be caused by this.

reli-msft commented 4 years ago

Running the following script in Node.JS shows a lot of problems in current text layout code:

console.log(`
Combining Diacritics    | [a\u0300\u0304]
Polytonic Greek         | [ᾊ] [Α\u0313\u0300\u0345]
Ideographic Variation   | [東京都葛\u{E0101}飾区] [奈良県葛\u{E0100}城市]
Composite Hangul        | [\u1100\u116A\u11B3]
Combining Dakuten       | [か\u309A]
Emoji vs Text           | [\u2602] [\u2602\uFE0E] [\u2602\uFE0F]
Emoji                   | [\u{1F469}]
Emoji with Skin Tone    | [\u{1F469}\u{1F3FB}]
Emoji Variation and ZWJ | [\u{1F469}\u{1F3FB}\u200D\u{1F4BB}]
`)

Terminal results:

Chrome Dev Tools results:

Edge F12 results:

Word results:

skyline75489 commented 4 years ago

Possibly also #2191

skyline75489 commented 4 years ago

Let me add Firefox result here, so that people don't forget that there's a browser out there called "Firefox":

DHowett-MSFT commented 4 years ago

@skyline75489 my hero

skyline75489 commented 4 years ago

@DHowett-MSFT my man. I'm seeing regression on current master https://github.com/microsoft/terminal/commit/306e75163929161b130872dcae890686da57790d

skyline75489 commented 4 years ago

I've used git bisect and found that ddcc06e911a18250c8544c7ea83011451e8256a2 is the first bad commit. I've got no idea about what #3474 even is..

skyline75489 commented 4 years ago

I've tried this on my laptop and sadly the regression is the same:

Emoji is noticeably broken. Japanese characters are different, too.

zadjii-msft commented 4 years ago

@skyline75489 Thanks for the report! I'm going to move that regression to it's own thread (#3553), considering you've narrowed the regression down to a particular commit. Broadly our text rendering needs help with M chars->N glyphs scenarios like this issue describes, but we probably shouldn't have things getting worse on us.

reli-msft commented 4 years ago

This is heavily related to this issue: https://github.com/microsoft/terminal/issues/780 There's WriteCharsLegacy and Terminal::_WriteBuffer, and they should be unified into one function that writes char stream to the back buffer with proper itemization.

skyline75489 commented 4 years ago

Without #780 we can still improve the current implementation like #3578 . I'm not sure whether that's worth the effort, though. It's more like scratching the surface.

reli-msft commented 4 years ago

@skyline75489 We need to change the way we write buffers to perform correct line wrap and cursor movement. For WT there are two buffers being used: one for CONHOST, which is the "Ground Truth", and one for WT, and the write subroutine is written differently.

So what's wrong with WriteCharsLegacy? Well...

It does not process surrogate pairs properly.
Combining characters are not written into the "previous" cell. They are treated separately and the cursor is moved.

And to provide a proper text layout, cells should...

Have some metadata to guide the client that whether they should be shaped together with adjacent cells.
If the real width of glyphs is different from the desired width, whar should we do.

skyline75489 commented 4 years ago

@reli-msft Gotta admit, the whole Unicode experience is harder to fix than I initially anticipated. I'm not sure the exact plan Console team have. But I'm positive that we are heading to the right direction.

reli-msft commented 4 years ago

@skyline75489 Everybody are underestimating how complex text could be.

Let me count all the features that a proper rich text box should support:

Input methods, including...
- Composing IME
- Speech
- Handwriting
- Emoji picker
Complex text rendering, including...
- Bidi
- Shaping
- Complex-script-aware editing
- Complex-script-aware find, replace, etc.
Accessibility

And what they can choose to support:

Advanced typography, including...
- About line breaking...
  - Hyphenation
  - Optimized line break (Knuth-plass, etc.)
- About microtypography
  - OpenType features (ligatures, etc.)
  - OpenType variations
  - Multiple master fonts
  - Proper font fallback ← It is not a simple lookup-at-each-character process
- Advanced Middle East features, including...
  - Kashida
- Advanced Far East features, including...
  - Kinsoku Shori
  - Auto space insertion
  - Kumimoji
  - Warichu
  - Ruby
- Proper vertical layout, including...
  - Yoko-in-Tate
Inline objects and paragraph-like objects, including...
- Images
- Hyperlinks
- Math equations ← This is really hard.

skyline75489 commented 4 years ago

@reli-msft I'd trust you with the whole Unicode thingy myself. I'm a real rookie on this. You can tell from my PR ... Anyway let @DHowett-MSFT take us to where we should eventually be.

DHowett-MSFT commented 4 years ago