Closed by rivo 3 years ago.
Here's my own implementation of the "string width" function which takes grapheme clusters into account:
https://github.com/rivo/tview/blob/8d5eba0c2f51d8ae971c5a470e354bbc2aae6777/util.go#L419
It's based on the assumption that the width of a grapheme cluster is the width of its first non-zero-width rune. That's just my guess, but it works fine for a bunch of examples I tried manually.
Maybe you want to use this implementation in your package. I think it would definitely improve the calculation of a string's width. You could then also get rid of the special zero-width-joiner handling, as it's all implicit in the uniseg package.
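A minimal sketch of the rule described above (the width of a grapheme cluster is the width of its first non-zero-width rune). To stay stdlib-only, it assumes the string has already been split into grapheme clusters (in practice that is uniseg's job) and takes the per-rune width function as a parameter. `clusterWidth`, `stringWidth` and the crude `toyRuneWidth` are hypothetical helpers for illustration, not the tview or runewidth API.

```go
package main

import (
	"fmt"
	"unicode"
)

// toyRuneWidth is a deliberately crude, hypothetical per-rune width:
// nonspacing/enclosing marks and format characters (e.g. ZWJ, variation
// selectors) are 0, runes at or above U+1F000 (most emoji, regional
// indicators) are 2, everything else is 1. A real implementation would use
// East Asian Width and emoji data instead.
func toyRuneWidth(r rune) int {
	switch {
	case unicode.In(r, unicode.Mn, unicode.Me, unicode.Cf):
		return 0
	case r >= 0x1F000:
		return 2
	default:
		return 1
	}
}

// clusterWidth assumes the width of a grapheme cluster equals the width of
// its first rune whose own width is non-zero.
func clusterWidth(cluster []rune, runeWidth func(rune) int) int {
	for _, r := range cluster {
		if w := runeWidth(r); w > 0 {
			return w
		}
	}
	return 0
}

// stringWidth sums cluster widths over a string that has already been
// segmented into grapheme clusters.
func stringWidth(clusters [][]rune, runeWidth func(rune) int) int {
	total := 0
	for _, c := range clusters {
		total += clusterWidth(c, runeWidth)
	}
	return total
}

func main() {
	// "e" + COMBINING ACUTE ACCENT, then a German flag (two regional
	// indicators), each given as one pre-segmented cluster.
	clusters := [][]rune{
		{'e', '\u0301'},
		{'\U0001F1E9', '\U0001F1EA'},
	}
	fmt.Println(stringWidth(clusters, toyRuneWidth)) // 3
}
```

Summing the same toy widths rune by rune would give 1 + 0 + 2 + 2 = 5 instead of 3, which is the kind of over-count the cluster-based rule avoids.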
Could you please send me a PR?
Hi! I'm not sure whether this issue is related, but I assume it is.
The characters {"←", "↖", "↑", "↗", "→", "↘", "↓", "↙"} are treated by my terminal as width 1, and everything works as it should. However, runewidth.StringWidth(char) returns [1 2 1 2 1 2 1 2] for them respectively, and that breaks the output:
// Character StringWidth uniseg.Graphemes
← 1 [2190]
↖ 2 [2196]
↑ 1 [2191]
↗ 2 [2197]
→ 1 [2192]
↘ 2 [2198]
↓ 1 [2193]
↙ 2 [2199]
The same happens for:
// Character StringWidth uniseg.Graphemes
■ 1 [25a0]
□ 1 [25a1]
▪ 2 [25aa]
▫ 2 [25ab]
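One plausible explanation for the pattern in these tables (an observation from Unicode's emoji-data.txt, not something established in this thread): every character reported as width 2 (U+2196..U+2199, U+25AA, U+25AB) has the Emoji property but defaults to text presentation, while the width-1 ones (U+2190..U+2193, U+25A0, U+25A1) are not emoji at all. The sketch below reproduces the code-point column and checks each character against a hypothetical, deliberately tiny table holding only those ranges; it is not the full property data.

```go
package main

import "fmt"

// textDefaultEmoji is a hypothetical, incomplete table: only the ranges
// relevant to the characters above, with Emoji=Yes and
// Emoji_Presentation=No in emoji-data.txt.
var textDefaultEmoji = [][2]rune{
	{0x2194, 0x2199},
	{0x25AA, 0x25AB},
}

func isTextDefaultEmoji(r rune) bool {
	for _, rg := range textDefaultEmoji {
		if r >= rg[0] && r <= rg[1] {
			return true
		}
	}
	return false
}

// codePoints formats the code points of s like the third column above:
// %x applied to a []rune prints each element in hex inside brackets.
func codePoints(s string) string {
	return fmt.Sprintf("%x", []rune(s))
}

func main() {
	for _, s := range []string{"←", "↖", "↑", "↗", "→", "↘", "↓", "↙", "■", "□", "▪", "▫"} {
		r := []rune(s)[0]
		fmt.Println(s, codePoints(s), isTextDefaultEmoji(r))
	}
}
```

In this sample, `isTextDefaultEmoji` is true exactly for the characters that received width 2.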
I hope this additional info will help.
My PHP package php-wcwidth (which is practically a dumb clone of Python's jquast/wcwidth) gets the widths of these characters right.
Thank you. Could you please show me a screenshot?
This is a screenshot taken in my environment.
this one?
same but larger
and from the terminal
What is your $LANG?
LANG=en_US.UTF-8
@joshuarubin Is 0x2194 displayed correctly as an emoji?
@mattn here's what I found out: these do not have emoji forms, but these do, and my terminal can print them both.
However, I'm unable to figure out how to print the emoji form from my code: printing from code gives ↔, and copy-pasting also gives ↔.
It seems that 2194 has to be followed by fe0f to print the emoji, so: 2194 fe0f.
UPD: from DerivedGeneralCategory.txt:
FE00..FE0F ; Mn # [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
Here's a short example that illustrates an issue with flags (or "regional indicators"):
The flag consists of two code points, which are processed separately by runewidth. But most modern systems will combine them into one flag emoji. This is part of a larger topic which I describe in more detail here: gdamore/tcell#264. It doesn't just affect flags but also characters in e.g. Arabic and Korean, where there are more sophisticated rules than "combining characters" and zero-width joiners (which you added with #20).
I don't know exactly how you calculate the widths of characters. I'm also not sure how you would handle flags, or some of the other rules described in the Unicode specification, but it would sure be nice, as printing these flags currently gives me trouble in tview. There have been multiple issues asking for better support for different languages and emojis, so it seems that quite a few people use the terminal with these characters. (Maybe my new package uniseg can help you here.)
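To make the flag case concrete: a flag is a pair of regional indicator symbols (U+1F1E6..U+1F1FF). The stdlib-only sketch below pairs consecutive regional indicators into one two-column cell instead of measuring each code point on its own. It is not uniseg's actual algorithm; `flagAwareWidth` is hypothetical, an unpaired regional indicator is assumed to be width 2, and every other rune is simply assumed to be width 1.

```go
package main

import "fmt"

func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// flagAwareWidth counts each pair of consecutive regional indicators
// (paired left to right from the start of a run) as a single flag of
// width 2. An unpaired regional indicator is also given width 2 (an
// assumption). All other runes count as width 1 in this sketch.
func flagAwareWidth(s string) int {
	runes := []rune(s)
	width := 0
	for i := 0; i < len(runes); i++ {
		if isRegionalIndicator(runes[i]) {
			width += 2
			if i+1 < len(runes) && isRegionalIndicator(runes[i+1]) {
				i++ // consume the second half of the pair
			}
			continue
		}
		width++
	}
	return width
}

func main() {
	de := "\U0001F1E9\U0001F1EA"           // regional indicators D + E: German flag
	fmt.Println(flagAwareWidth(de))       // 2
	fmt.Println(flagAwareWidth(de + de))  // 4
	fmt.Println(flagAwareWidth("a" + de)) // 3
}
```

Pairing from the start of each run matters: in a sequence of three regional indicators, only the first two form a flag.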