nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows
https://www.nvaccess.org/

Braille routing accuracy breaks in text with emoji #9034

Closed. LeonarddeR closed this issue 5 years ago.

LeonarddeR commented 5 years ago

Steps to reproduce:

  1. Open notepad
  2. Paste the following text into Notepad: Hello😉test😉cheese
  3. Using a braille display, route the cursor to the "c" in "cheese"

Actual behavior:

The cursor ends up at the "h" in "cheese".

Expected behavior:

The cursor ends up at the "c" in "cheese".

Explanation

When the cursor is routed with a braille display in a TextInfoRegion, NVDA counts the characters in the region's text before the routed position, creates a TextInfo at the start of the reading unit and moves it forward by that many characters. However, most controls, including NVDA's default offsets implementation (#8953), treat an emoji as one character, while it takes up two characters in Python 2 unicode strings (Windows builds of Python 2 store text as UTF-16 code units, so an emoji becomes a surrogate pair) and in liblouis' output. As noted in #8953, Gecko and Chrome treat emoji as two characters as well.
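The following is a minimal sketch in Python 3, not NVDA code, of the mismatch described above. It assumes that encoding to UTF-16-LE is a fair stand-in for how a narrow Python 2 build and liblouis count characters; `utf16_len` is a hypothetical helper. For the text from the steps to reproduce, the routing index counted in code units is 2 higher than the control's character offset, one extra unit for each emoji before the target.

```python
def utf16_len(text: str) -> int:
    """Hypothetical helper: number of UTF-16 code units needed to encode text.

    Characters outside the Basic Multilingual Plane (such as emoji) need a
    surrogate pair, i.e. two code units, just like in a narrow Python 2 build.
    """
    return len(text.encode("utf-16-le")) // 2


text = "Hello\U0001F609test\U0001F609cheese"

# Offset of the "c" in "cheese" when every emoji counts as one character,
# as in Notepad and NVDA's default offsets implementation.
control_offset = text.index("cheese")

# Offset of the same "c" when every emoji counts as two characters,
# as in Python 2 unicode strings and liblouis' output.
routing_offset = utf16_len(text[:control_offset])

print(control_offset, routing_offset)  # 11 13
```

Moving the TextInfo by `routing_offset` characters in a control that counts by `control_offset` therefore overshoots the intended character.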

Proposed fix

Python 3 would fix this, but it introduces other issues with offset-based TextInfos that also need to be addressed. Another fix would be to compile liblouis with UCS-4 support (#6695). I have a working branch that fixes this for controls such as Notepad, Word and UIA-based controls, but it recreates the bug in the opposite direction for Gecko and Chrome.
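A fix along these lines needs to translate the braille routing offset (in UTF-16 code units) into the number of characters the control moves by. The sketch below is hypothetical and is not the change merged in #9044: `utf16_offset_to_code_points`, `characters_to_move` and the `control_counts_code_units` flag are names invented for illustration. The flag shows why a single strategy recreates the bug for Gecko and Chrome, which already count emoji as two characters and so must not be converted.

```python
def utf16_offset_to_code_points(text: str, utf16_offset: int) -> int:
    """Convert an offset in UTF-16 code units within text to a code-point offset.

    Raises ValueError if the offset falls inside a surrogate pair or past
    the end of the text.
    """
    units = 0  # UTF-16 code units consumed before text[index]
    for index, char in enumerate(text):
        if units == utf16_offset:
            return index
        if units > utf16_offset:
            # The offset pointed at the second half of the previous emoji.
            raise ValueError("offset is not on a character boundary")
        units += 2 if ord(char) > 0xFFFF else 1
    if units == utf16_offset:
        return len(text)
    raise ValueError("offset is not on a character boundary")


def characters_to_move(region_text: str, routing_offset: int,
                       control_counts_code_units: bool) -> int:
    """Hypothetical: how far to move a TextInfo for a braille routing offset."""
    if control_counts_code_units:
        # Gecko/Chrome style: the control already counts emoji as two.
        return routing_offset
    # Notepad/Word/UIA style: the control counts each emoji as one.
    return utf16_offset_to_code_points(region_text, routing_offset)
```

For "Hello😉test😉cheese", a routing offset of 13 becomes a move of 11 characters in Notepad, but stays 13 in Gecko or Chrome.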

LeonarddeR commented 5 years ago

Fixed in #9044. We'll leave this as is in Python 2 builds for now.