sublimehq / sublime_text

Issue tracker for Sublime Text
https://www.sublimetext.com

Visual and navigating issues with combining characters #2986

Open ahwayakchih opened 5 years ago

ahwayakchih commented 5 years ago

Description

When using a combining character (for example, U+1AB2) with some regular characters (for example, "@"), the combining character may be rendered separately, and the text caret appears to be rendered one character before its actual location. The offset accumulates for every such combining character, so the caret can be "shifted" to the left, when using RTL text direction, by 2 or more characters.

I did not test all the combining characters, but it happened with combining-infinity (U+1AB2) and combining-clockwise-ring-overlay (U+20D9) characters.

Steps to reproduce

  1. enter a non-letter, non-digit character, e.g., a double-quote or exclamation mark,
  2. enter another such character, e.g., "@",
  3. enter the combining infinity character (U+1AB2),
  4. enter any regular character, e.g., "a",
  5. click, or use a shortcut, to move the text caret to the end of the line.

For a quick test, copy and paste this into a new document:

✓: @᪲.
x: "@᪲"
✓:"a@᪲"
x: "@᪲""@᪲""@᪲""@᪲"
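The combining mark in the lines above is hard to see and may not survive every copy-and-paste path. As a hedged alternative, this Python sketch rebuilds the same four lines from explicit escapes, assuming the mark after each "@" is U+1AB2 COMBINING INFINITY as described in the steps, and prints the Unicode properties of both marks mentioned in this issue. `MARK` and `TEST_LINES` are illustrative names only.

```python
import unicodedata

# Assumption: the invisible mark after each "@" is U+1AB2 COMBINING INFINITY.
MARK = "\u1ab2"

TEST_LINES = [
    "\u2713: @" + MARK + ".",          # ✓ renders correctly
    'x: "@' + MARK + '"',              # x triggers the bug
    '\u2713:"a@' + MARK + '"',         # ✓ renders correctly
    "x: " + ('"@' + MARK + '"') * 4,   # x four repetitions
]

if __name__ == "__main__":
    # Show name, general category and canonical combining class of both marks.
    for ch in (MARK, "\u20d9"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
              f"category={unicodedata.category(ch)} "
              f"ccc={unicodedata.combining(ch)}")
    # Print the test text so it can be pasted into a new document.
    for line in TEST_LINES:
        print(line)
```

Writing the output to a file and opening it in the editor avoids depending on the clipboard.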

Expected behavior

Text should be rendered the same as if there were no non-letter, non-digit character before the "@" character.

Actual behavior

The combining character is rendered separately, i.e., not combined with the previous character, and a text caret located after such characters is shifted visually to the left (but editing still works as if the caret were at the correct place).

It does not need to be a double-quote or exclamation mark to trigger this bug. Many other characters of a "similar class" trigger it, e.g., another "@", a dollar sign, etc.

It seems to happen only when two non-letter, non-digit characters are next to each other. If I enter a regular letter or a digit between them, the problem disappears.

Adding a space somewhere after the problematic character seems to "clear" some calculations, and the rest of the line works OK. One space is needed for every such character earlier in the line.

Environment

wbond commented 5 years ago

We use the OS text rendering libraries to render glyphs on the screen. We use information about unicode characters to form graphemes, which in turn control the movement of the cursor.

That is to say, a combining character is "attached" to the previous non-combining character to form graphemes. We pass these graphemes to the OS text rendering library to get the visual width of the grapheme to control how far the cursor (and next glyph) should be advanced.
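The attach-to-previous rule described above can be sketched in Python. This is a simplified stand-in for Unicode extended grapheme clusters (UAX #29), not Sublime Text's actual implementation: it only attaches marks (general categories Mn, Mc, Me) to the preceding character and ignores emoji sequences, Hangul and other cases. The function names `simple_graphemes` and `caret_stops` are hypothetical.

```python
import unicodedata
from typing import List


def simple_graphemes(text: str) -> List[str]:
    """Attach each combining mark to the cluster before it (simplified UAX #29)."""
    clusters: List[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc", "Me")
        if is_mark and clusters:
            clusters[-1] += ch   # extend the previous grapheme
        else:
            clusters.append(ch)  # start a new grapheme
    return clusters


def caret_stops(text: str) -> List[int]:
    """Code-point offsets where the caret may rest: the cluster boundaries."""
    stops = [0]
    for cluster in simple_graphemes(text):
        stops.append(stops[-1] + len(cluster))
    return stops


if __name__ == "__main__":
    line = 'x: "@\u1ab2"'
    print(simple_graphemes(line))
    print(caret_stops(line))  # [0, 1, 2, 3, 4, 6, 7]
```

Under this model, "@" followed by U+1AB2 is a single cluster, so the caret never stops at offset 5 between them; each cluster would then be measured as one unit to advance the caret.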

In this case, my hunch is that the underlying Pango library we use on Linux isn't deciding to render the combining infinity with the preceding character, or the font being used doesn't contain the necessary glyphs, at least in certain circumstances.

I can tell you that on macOS 10.14, the CoreText text rendering library also does not like this combination. Here is what it looks like for me in Safari:

[Screenshot: Screen Shot 2019-09-16 at 9 57 32 AM]

In Sublime Text I see:

[Screenshot: Screen Shot 2019-09-16 at 9 58 11 AM]

You mentioned RTL in your comment, although it wasn't clear to me what you were trying to say. I can tell you that currently Sublime Text's layout engine does not handle RTL layout/rendering.

ahwayakchih commented 5 years ago

Oh, I'm sorry, I meant I'm using an LTR (left-to-right) language (I wasn't sure whether that might have any impact, so I just added it to the rest of the info).

Thanks for the explanation.

It's strange that the same pair of characters works OK if not preceded by certain specific characters. I guess the rendering library thinks it should combine more than 2 characters in such cases and then fails to do so?

Looks like your font does not have the combining infinity character, so it's probably even harder to see what I meant (and failed to describe; I don't know enough English words from the "text/fonts" category :).

This is how it looks (OK) in the Chromium browser here: [Screenshot from 2019-09-16 22-41-43]

And this is how it looks (2 OK, 2 Fail) in Sublime Text here: [Screenshot from 2019-09-16 22-43-23] Please notice how the caret is "shifted" 4 characters to the left, even though I have it set at the end of the line (the status text in the lower left corner says "Line 4, Column 20", which is OK).
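The reported column can be checked with a little arithmetic. Assuming line 4 is `x: ` followed by four copies of `"@` + U+1AB2 + `"`, and assuming the status bar counts one column per code point, a caret at the end of the line sits at column 20. The same line contains exactly 4 combining marks, which matches the "one character per combining mark" shift from the original description:

```python
import unicodedata

# Assumed content of line 4 from the quick test in the issue description.
line4 = "x: " + '"@\u1ab2"' * 4

code_points = len(line4)
# unicodedata.combining() is non-zero only for the U+1AB2 marks here.
marks = sum(1 for ch in line4 if unicodedata.combining(ch))

print(code_points)      # 19
print(code_points + 1)  # 20: 1-based column of a caret at the end of the line
print(marks)            # 4: one U+1AB2 per '"@' + mark + '"' group
```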

AFAIK Chrome uses HarfBuzz, so maybe Sublime could use it somehow? I don't know much about Pango, but the page I found (https://pango.gnome.org/) says it already uses HarfBuzz in some way. Not sure if that's up-to-date info.

ahwayakchih commented 5 years ago

@wbond just a little update: I tested the same text in Gedit (https://wiki.gnome.org/Apps/Gedit) and Builder (https://wiki.gnome.org/Apps/Builder). Both render it correctly (but I'm guessing they both use the same underlying "text editor" components). Maybe it's not a problem with the Linux/Gnome libraries, but only with some configuration options used to set them up? Or with the data you pass to them (I mean the "information about unicode characters to form graphemes" you wrote about)?

wbond commented 5 years ago

That is useful info. I will take a detailed look once I have some time. There must be some edge case or extra situation I am not thinking about.

My hunch right now is that it may have to do with token boundaries, that is, which characters are considered part of the same word. If the characters are considered part of separate words, we won't join them as a grapheme.
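To illustrate the token-boundary hunch, here is a toy Python model in which clusters are formed separately inside each token. The split offsets are passed in by hand and the function names are hypothetical; this does not reproduce Sublime Text's actual tokenizer, it only shows how a boundary between a base character and its combining mark turns the mark into a standalone cluster.

```python
import unicodedata
from typing import List, Sequence


def is_mark(ch: str) -> bool:
    """Treat general categories Mn, Mc and Me as combining marks."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def clusters_in(token: str) -> List[str]:
    """Attach marks to the previous character, but only within one token."""
    out: List[str] = []
    for ch in token:
        if is_mark(ch) and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out


def clusters_per_token(text: str, boundaries: Sequence[int]) -> List[str]:
    """Split `text` at hypothetical token offsets, then cluster each token.

    A mark that starts a token cannot join the base character that ended
    the previous token, so it becomes a cluster of its own.
    """
    cuts = [0, *sorted(boundaries), len(text)]
    result: List[str] = []
    for start, end in zip(cuts, cuts[1:]):
        result.extend(clusters_in(text[start:end]))
    return result


if __name__ == "__main__":
    text = '"@\u1ab2"'
    print(len(clusters_per_token(text, [])))   # 3: "@"+mark joined
    print(len(clusters_per_token(text, [2])))  # 4: mark split from "@"
```

With a split at offset 2, between "@" and the mark, three clusters become four, so the layout sees one extra standalone cluster for every mark that lands at the start of a token.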

ahwayakchih commented 5 years ago

Thanks, :crossed_fingers: :).