ruby / error_highlight

The gem enhances Exception#message by adding a short explanation where the exception is raised
MIT License

Unicode characters #4

Open mame opened 3 years ago

mame commented 3 years ago

Currently, error_highlight does not handle Unicode characters well. There are two subissues.

  1. RubyVM::AbstractSyntaxTree::Node#first_column and #last_column seem to return the column in bytes, but String#match handles indexes in characters. We need to convert the column indexes.
  2. Some Unicode characters are displayed as two (or more?) columns in a terminal with a monospace font.

(1) is relatively simple, but (2) is a bit tough. It requires a table that tells how many columns each character occupies. Reline is known to have such a table. But because error_highlight is a built-in gem that is loaded when the Ruby process starts, it should not depend on Reline (unless we make Reline a special built-in gem). We need to discuss how to make the table available to error_highlight.
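For subissue (1), a minimal sketch of the conversion might look like the following. It assumes the source line is a UTF-8 string and that the byte column falls on a character boundary; `byte_column_to_char_column` is a hypothetical helper name, not part of error_highlight.

```ruby
# Hypothetical helper: convert a byte-based column into a character-based
# column for a single source line.
# Assumes `line` is UTF-8 and `byte_column` sits on a character boundary.
def byte_column_to_char_column(line, byte_column)
  # Take the bytes before the column and count them as characters.
  line.byteslice(0, byte_column).length
end

# "\u3042\u3044" is two Hiragana letters, 3 bytes each in UTF-8,
# so the "." that follows them starts at byte 6.
line = "\u3042\u3044.foo"
char_column = byte_column_to_char_column(line, 6)
puts line[char_column] # the character found at the converted column
```

The converted column can then be used with character-indexed methods such as String#[] or String#match.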

kddnewton commented 2 years ago

Hey @mame!

I hit this same thing with ripper when I was writing prettier. I ended up solving it by taking the source, splitting it up into multiple lines, and converting each into an object that responded to #[] so that I could get the right indices.
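A rough sketch of that idea, not the actual prettier code: each line is wrapped in an object whose #[] maps a byte column to a character column. The class name `LineIndex` and its layout are assumptions made for illustration.

```ruby
# Hypothetical per-line index: #[] maps a byte column to a character column.
class LineIndex
  def initialize(line)
    @char_columns = []
    line.each_char.with_index do |char, char_column|
      # Every byte of a multibyte character maps to the same character column.
      char.bytesize.times { @char_columns << char_column }
    end
    # Allow looking up the position just past the last byte of the line.
    @char_columns << line.length
  end

  def [](byte_column)
    @char_columns[byte_column]
  end
end

source = "x = 1\n\u3042 = x.foo\n"
lines = source.lines.map { |line| LineIndex.new(line) }

# On the second line, "\u3042" takes 3 bytes, so the "." sits at byte 7.
puts lines[1][7]
```

Building the table once per line keeps each lookup constant-time, which helps when many nodes on the same line need converting.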

Here are some links to the source:

I hope it's helpful!

mame commented 2 years ago

Thanks for the information. I think it addresses subissue (1) that I described. Yeah, it is solvable by converting the indices.

The tougher issue is (2). Unfortunately, some Unicode characters (mainly Chinese, Japanese, and Korean characters) are rendered as if they have two columns.

[image]

The image shows one Japanese letter that takes two columns in the terminal. To highlight the letter, we need to put two ^s under the line. To implement this, error_highlight needs a table that tells which characters take two (or more) columns.
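To illustrate what such a table would be used for, here is a sketch under strong assumptions. `WIDE_RANGES` is a tiny hand-picked subset of wide and fullwidth code point ranges, not the full Unicode data, and `display_width` and `underline` are hypothetical helpers. Ambiguous-width characters, combining characters, and emoji are ignored.

```ruby
# Illustrative subset only; the complete data lives in Unicode's
# EastAsianWidth.txt.
WIDE_RANGES = [
  0x3041..0x3096, # Hiragana letters
  0x30A1..0x30FA, # Katakana letters
  0x4E00..0x9FFF, # CJK Unified Ideographs
  0xAC00..0xD7A3, # Hangul syllables
  0xFF01..0xFF60  # Fullwidth forms
].freeze

# Number of terminal columns the string is assumed to occupy.
def display_width(str)
  str.each_char.sum do |char|
    WIDE_RANGES.any? { |range| range.cover?(char.ord) } ? 2 : 1
  end
end

# Build the "^" underline for the characters of `line` in `char_range`.
def underline(line, char_range)
  prefix = line[0...char_range.begin]
  target = line[char_range]
  " " * display_width(prefix) + "^" * display_width(target)
end

line = "x = \u3042.foo"
puts line
puts underline(line, 4..4) # two carets under the wide letter
```

Both the padding and the carets are measured in display columns, so a wide character before the highlighted span shifts the underline by two columns rather than one.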

Just FYI: To make matters worse, the column count may change depending on the font and the terminal. This issue is called East Asian Width:

Ambiguous width characters are all those characters that can occur as fullwidth characters in any of a number of East Asian legacy character encodings. They have a “resolved” width of either narrow or wide depending on the context of their use.

To be honest, I don't want to face this problem for now 😇

kddnewton commented 2 years ago

@mame I see, I think I understand the problem better now. In that case it would probably be nice for RubyVM::AbstractSyntaxTree::Node to have methods like {first,last}_character_column or something similar.