mattn / go-runewidth

wcwidth for golang
MIT License
609 stars 94 forks source link

Use unicode9 character widths #11

Closed joshuarubin closed 7 years ago

joshuarubin commented 7 years ago

Update runewidth to use unicode9 character width tables. This is the default in vim and neovim now, so should be safe to use in any terminal.

The Condition is no longer calculated using IsEastAsian() as terminals do not use locale to determine how wide to draw ambiguous with characters. Rather, they simply default to 1, and may offer an option to set to 2 (which is discouraged).

All tests still pass. The API is very close to the way it was, but not identical due to the change in Condition.

I recognize this is a fairly large diff, so I'd be happy to work with you in any way you feel best to get this merged.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+1.5%) to 93.75% when pulling 9d60074fcb7a1c672d48823ab219f632188c8cc9 on joshuarubin:master into 737072b4e32b7a5018b4a7125da8d12de90e8045 on mattn:master.

mattn commented 7 years ago

You seems to break some public APIs.

joshuarubin commented 7 years ago

You seems to break some public APIs

Yes, I said as much in the first post.

mattn commented 7 years ago

go-runewidth is depeneded on many packages. See https://godoc.org/github.com/mattn/go-runewidth?importers

So I don't want to change interface.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-4.4%) to 87.805% when pulling 75cb21812da07ce46c13a7681f7cd37604a76304 on joshuarubin:master into 737072b4e32b7a5018b4a7125da8d12de90e8045 on mattn:master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-3.4%) to 88.82% when pulling 75cb21812da07ce46c13a7681f7cd37604a76304 on joshuarubin:master into 737072b4e32b7a5018b4a7125da8d12de90e8045 on mattn:master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-3.4%) to 88.82% when pulling 75cb21812da07ce46c13a7681f7cd37604a76304 on joshuarubin:master into 737072b4e32b7a5018b4a7125da8d12de90e8045 on mattn:master.

joshuarubin commented 7 years ago

OK, I backed out the changes that modified the API. The only changes now are to use the unicode9 widths.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.05%) to 92.157% when pulling afa37cd08cb2a2cdecf274db34958a20a1f00607 on joshuarubin:master into 737072b4e32b7a5018b4a7125da8d12de90e8045 on mattn:master.

mattn commented 7 years ago

SGTM. I'll check this with some projects. please give me a time while.

mattn commented 7 years ago

I tried some projects which using go-runewidth. And seems good to me. BTW, what advantage for you? I'm thinking, to make sure, this change require a terminal which supported unicode9.

joshuarubin commented 7 years ago

The advantage is that wide characters (including emoji, but also for east asian languages) are actually listed as being wide. These tables are used in existing projects like vim who do not require a unicode9 terminal.

For terminals that don’t have unicode9 widths, what happens today is that emoji, for example, are considered, width 1, but drawn (due to font implementations) as width 2. As a result, characters following wide characters are often hidden, or cause the previous character to be hidden, or are simply overlaid. This is obviously wrong.

This change, for terminals that don’t have unicode9 widths, will cause a discrepancy between the terminal's width choice and go-runewidth. However, things that use go-runewidth to determine spacing (projects like tcell) will position characters appropriately, effectively overriding their terminal's (wrong) width tables and fixing rendering issues.

For terminals that have unicode9 widths, this pull request just brings the widths up-to-date so that go-runewidth and the terminal agree.

There should be no problem using these tables in any terminal regardless of their support for unicode9 character widths.

mattn commented 7 years ago

Current statuses for OS/Terminal support. For my information.

gnome-terminal (glib)

https://mail.gnome.org/archives/commits-list/2016-September/msg07491.html

iterm2

https://gitlab.com/gnachman/iterm2/wikis/unicodeversionswitching

xterm

http://invisible-island.net/xterm/xterm.log.html#xterm_325

joshuarubin commented 7 years ago

Any update on merging this?

mattn commented 7 years ago

Sorry my delay. I'm okay to merge. But I worry about someone may get wrong.

dmitshur commented 7 years ago

Updating to latest unicode standard seems like a good thing to do.

FWIW, I tested out the old version vs the new one a few characters, and I can see there's a change. It seems more consistent than before (although still not quite what I was expecting, but that's probably my terminal's fault).

Before:

image

After:

image

joshuarubin commented 7 years ago

This seems to have stalled again. Anything I can do to help?

mattn commented 7 years ago

@shurcooL @joshuarubin This PR will be great in CJK. But as far as I can see your image, terminal doesn't handle double width character correctly. If you are okay to merge, I'll do.

joshuarubin commented 7 years ago

yeah, I'd like to know which unicode characters those are, but my guess is that they are either private use characters or not considered wide (because they are not considered emoji or east asian wide). if not either of those, then it may just be that the font used doesn't follow the spec.

mattn commented 7 years ago

ping @nsf termbox-go use go-runewidth. how do you think?

dmitshur commented 7 years ago

yeah, I'd like to know which unicode characters those are

@joshuarubin, the five characters in that screenshot were x, ✅, ❌, ☑️, and ☑.

If you are okay to merge, I'll do.

@mattn, I have absolutely no objections, as long as this implements a newer Unicode specification. It makes sense to me for go-runewidth to use the latest available Unicode spec.

joshuarubin commented 7 years ago

It actually looks like all the widths are correct according to that test (the terminal is wrong though). The only tricky one is "☑️" which is U+2611 and U+FE0F.

Technically, U+2611 is considered an emoticon according to unicode9, but the convention is that characters below U+1f000 are considered single width as they predated the official emoticon definition (neovim).

So, plain U+2611 is drawn width 1 and calculated to be width 1 correctly.

However, U+2611 and U+FE0F (the latter being a variation selector) is drawn with column width 2 by the font, but calculated to be U+2611 which has width 1 and U+FE0F which is a non-spacing mark width 0.

It would be nice if this sequence of runes were calculated correctly, but at least this pull request is consistent with other projects like neovim and iTerm2.

nsf commented 7 years ago

@mattn Well, I think small hacks for weird runes are okay as long as they don't touch the others, if they cause some problems you can always revert the change, I don't really use CJK and termbox myself that much, it's all up to people who use these things.

joshuarubin commented 7 years ago

@nsf if you're referring to a hack to make U+2611 followed by U+FE0F calculated as 2 columns, I would have to disagree.

We want to be as consistent as possible with other projects. Rendering problems due to inconsistencies between column calculations of varying software is very hard to deal

For example, terminal+tmux+vim (and also the shell) all have to agree on character width or else there are going to be significant rendering issues.

As demonstrated above, these tables aren't perfect, but they are consistent with other projects which I think makes good sense to prefer.

nsf commented 7 years ago

sure, consistency is good

mattn commented 7 years ago

Thanks!

joshuarubin commented 7 years ago

Thanks for doing this!