jtdaugherty / vty

A high-level ncurses alternative written in Haskell
BSD 3-Clause "New" or "Revised" License

Incorrect Width Calculation for Characters with Variation Selector-16 #274

Open skiars opened 2 months ago

skiars commented 2 months ago

Description

I've noticed that the wcwidth / wcswidth functions in the vty library miscalculate the width of certain text, particularly text involving Variation Selector-16 (U+FE0F). For example, the emoji 🏞️ (National Park) is composed of U+1F3DE followed by U+FE0F. It should be rendered as a colorful double-width emoji, but the current width calculation does not reflect this.

Steps to Reproduce

  1. Use the wcswidth function to calculate the width of 🏞️ (which is U+1F3DE followed by U+FE0F).
  2. Observe that the calculated width does not match the expected width of 2.

Example Code

ghci> import Graphics.Text.Width
ghci> wcswidth "🏞️"  -- This should ideally return 2, but it doesn't

The wcswidth function should return 2 for 🏞️, since it should be treated as a double-width emoji. The current calculation instead yields 1 because it does not account for Variation Selector-16, which results in incorrect rendering and a misaligned cursor position in terminals.

Environment

Additional Context

Variation Selector-16 (U+FE0F) is used to indicate that the preceding character should be displayed as an emoji. Proper support for this selector is crucial for accurate width calculation of such Unicode sequences.
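
To make the mismatch concrete, here is a minimal sketch that does not use vty itself. `toyWidth` is a hypothetical stand-in for a per-character width function; its values are assumptions for illustration, not what `wcwidth` is guaranteed to return. `naiveWidth` sums per-character widths, while `vs16AwareWidth` peeks one character ahead and treats any base character followed by U+FE0F as two columns. That is a simplification: in Unicode only certain base characters have an emoji presentation sequence.

```haskell
-- Variation Selector-16: requests emoji presentation for the preceding character.
vs16 :: Char
vs16 = '\xFE0F'

-- Hypothetical per-character width (an assumption, not vty's real table):
-- the selector itself is zero-width, everything else is one column.
toyWidth :: Char -> Int
toyWidth c
  | c == vs16 = 0
  | otherwise = 1

-- Width without lookahead: every character is measured in isolation.
naiveWidth :: (Char -> Int) -> String -> Int
naiveWidth charWidth = sum . map charWidth

-- Width with one character of lookahead: a base character directly
-- followed by VS16 is counted as at least two columns, and the selector
-- is consumed together with its base.
vs16AwareWidth :: (Char -> Int) -> String -> Int
vs16AwareWidth charWidth = go
  where
    go (c : v : rest)
      | v == vs16 && c /= vs16 = max 2 (charWidth c) + go rest
    go (c : rest) = charWidth c + go rest
    go [] = 0
```

With these toy widths, `naiveWidth toyWidth "\x1F3DE\xFE0F"` gives 1 and `vs16AwareWidth toyWidth "\x1F3DE\xFE0F"` gives 2.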

For reference, this issue has been observed with Windows Terminal too, and here is some relevant information:

Would be great to discuss potential fixes or workarounds for this issue. Thank you for your attention and support!

jtdaugherty commented 2 months ago

Thanks for filing this!

This version of the problem is due to the fact that Vty does no lookahead when computing character width. There are other Unicode features that would also need to be considered to do proper lookahead as far as I am aware, such as zero-width joiners. I haven't investigated what it would take to do this properly, largely because I really don't want to re-implement various bits of the Unicode spec in vty. So if you know of a Haskell implementation that deals with this in an efficient way, I would love to know about it!
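
As a rough illustration of what lookahead involves, the sketch below groups a string into clusters before any width is assigned: a base character absorbs any following U+FE0F, and a zero-width joiner (U+200D) pulls the next character into the same cluster. The function name `clusters` and the rules are assumptions made for this example; real grapheme segmentation is defined by the Unicode text segmentation rules and covers many more cases.

```haskell
-- Code points that require looking past the current character.
zwj, vs16 :: Char
zwj = '\x200D'   -- zero-width joiner
vs16 = '\xFE0F'  -- Variation Selector-16

-- Split a string into simplified clusters. Each cluster starts with one
-- character and is extended by trailing VS16s and by ZWJ-joined characters.
clusters :: String -> [String]
clusters [] = []
clusters (c : rest) =
  let (attached, remaining) = extend rest
  in (c : attached) : clusters remaining
  where
    -- A selector attaches to the cluster being built.
    extend (v : xs)
      | v == vs16 = let (t, r) = extend xs in (v : t, r)
    -- A joiner attaches itself and the character after it.
    extend (j : n : xs)
      | j == zwj = let (t, r) = extend xs in (j : n : t, r)
    -- Anything else starts a new cluster.
    extend xs = ([], xs)
```

A width function could then assign one width per cluster rather than per character, which is where a maintained Unicode library would be needed to decide the correct value.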

The problem is deeper even than what is reported here, unfortunately. For posterity, there are some other older tickets that capture some of the issues:

Essentially, Vty could have a perfectly correct implementation of width calculation and then still disagree with some terminal emulators on the width of some Unicode sequences, depending on the implementation in those terminal emulators. We've run into this when trying to "fix" Vty in this way, only to have to back out the changes because Vty then came into greater disagreement with some terminal emulators on character widths, resulting in broken rendering and cursor placement. @glguy helped us develop a partial solution to this problem by interrogating the terminal to ask it how wide Unicode characters are, but that only worked for single-character tests that don't require lookahead, so in practice it doesn't work well enough to be a fully general solution.
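
One way such an interrogation can work in principle is to record the cursor column, print the text, and read the column again. This is an assumption for illustration and not necessarily how Vty's implementation does it. The sketch below covers only the pure part: parsing a Cursor Position Report of the form `ESC [ row ; col R` (the terminal's reply to the `ESC [ 6 n` query) and deriving a width from two reports. The names `parseCPR` and `measuredWidth` are hypothetical.

```haskell
import Data.Char (isDigit)

-- Parse a Cursor Position Report: ESC [ <row> ; <col> R
-- Returns (row, column) as reported by the terminal, both 1-based.
parseCPR :: String -> Maybe (Int, Int)
parseCPR ('\ESC' : '[' : rest) =
  case span isDigit rest of
    (rowDigits@(_ : _), ';' : rest') ->
      case span isDigit rest' of
        (colDigits@(_ : _), "R") -> Just (read rowDigits, read colDigits)
        _ -> Nothing
    _ -> Nothing
parseCPR _ = Nothing

-- Columns the cursor advanced between a report taken before printing
-- and one taken after. Gives Nothing if the cursor changed rows
-- (for example, because the text wrapped) or moved backwards.
measuredWidth :: (Int, Int) -> (Int, Int) -> Maybe Int
measuredWidth (r1, c1) (r2, c2)
  | r1 == r2 && c2 >= c1 = Just (c2 - c1)
  | otherwise = Nothing
```

Measuring a multi-code-point sequence this way would mean printing the whole sequence between the two queries, rather than one character at a time.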

Given those issues, I don't know what the best path forward is. At a minimum, it would be nice to have access to an implementation of width calculations that we don't need to maintain, leaving that to people who know the Unicode spec and its various versions much better than I do. That would at least give us a starting point for knowing what good looks like, and we could then see how well it works in practice with terminal emulators whose Unicode implementations might also be stale and/or incorrect when it comes to character widths.

jtdaugherty commented 2 months ago

(And, utf8proc was an attempt at exactly this: relying on what seemed to be a well-maintained library for dealing with some of these issues. Maybe that's still a good way to go, ultimately, but I don't recall whether that library would have helped with lookahead-related issues.)