mattn / go-runewidth

wcwidth for golang
MIT License

Define width? #43

Open ghostsquad opened 4 years ago

ghostsquad commented 4 years ago

This is a question about how you are defining "width". I'm mostly looking for a solution that gives me character width in monospaced fonts. For example, in #39 and #36 the "width" of a flag would still be 2: although it is considered 1 character in modern renders, it still takes up the space of 2 normal characters.

dolmen commented 4 years ago

@ghostsquad rune has a clear definition in the Go specification: an integer value identifying a Unicode code point.

The doc for RuneWidth gives another hint: it points to https://www.unicode.org/reports/tr11/ which talks about cells.

Flag emojis, in contrast, are made of 2 runes/code points.

So this package is more about East Asian characters, not emojis.
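
To make the rune/code point distinction concrete, here is a minimal sketch using only the standard library. The helper name codePoints is made up for this illustration, and the flag is written with escapes for the two regional indicator symbols (B and Y) that, as far as I can tell, make up the flag in the question:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints lists the code points of s in U+XXXX notation, one per rune.
// (Hypothetical helper, not part of go-runewidth.)
func codePoints(s string) []string {
	out := make([]string, 0, utf8.RuneCountInString(s))
	for _, r := range s {
		out = append(out, fmt.Sprintf("U+%04X", r))
	}
	return out
}

func main() {
	// Regional indicators B + Y; many renderers draw the pair as one flag.
	flag := "\U0001F1E7\U0001F1FE"

	// Bytes, runes, and the individual code points.
	fmt.Println(len(flag), utf8.RuneCountInString(flag), codePoints(flag))
	// prints: 8 2 [U+1F1E7 U+1F1FE]
}
```

So a per-rune width function sees two separate code points here, whatever the renderer ends up drawing.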

dolmen commented 4 years ago

@ghostsquad uniseg.GraphemeClusterCount might interest you: it will tell you how multiple runes combine into a single grapheme. But that's not a complete solution to your problem (I suppose rendering in a terminal emulator): it will not tell you how much space is used to render that grapheme in a monospace font (especially as "monospace font" and "modern renders" are fuzzy).

mattn commented 4 years ago

@dolmen There is already a plan to use it.

See https://github.com/mattn/go-runewidth/pull/29

ghostsquad commented 4 years ago

@dolmen yep I already looked at uniseg, and it doesn't provide the right information

ghostsquad commented 4 years ago

You can kinda see some of the problems I'm trying to solve... it seems not even all monospaced fonts are created equal. In the GitHub code view, you can see that the right padding misaligns the text. But in the screenshot (of my terminal, using Fira Mono for Powerline), the right padding is needed.

❯ ./test
     rune width: 2
     rune count: 1
            len: 4
    grapheme ct: 1
   req left pad: 3
  req right pad: 0
[  🔄 AAA]
     rune width: 2
     rune count: 2
            len: 8
    grapheme ct: 1
   req left pad: 4
  req right pad: 1
[  🇧🇾  BBB]
     rune width: 2
     rune count: 2
            len: 6
    grapheme ct: 1
   req left pad: 4
  req right pad: 1
[  ℹ️  CCC]
     rune width: 1
     rune count: 1
            len: 3
    grapheme ct: 1
   req left pad: 4
  req right pad: 0
[   • DDD]

[  🔄 AAA]
[  🇧🇾  BBB]
[  ℹ️  CCC]
[   • DDD]
[image: screenshot of the terminal output]
package main

import (
    "fmt"
    "unicode/utf8"

    "github.com/mattn/go-runewidth"
    "github.com/rivo/uniseg"
)

func main() {
    fmt.Printf("%15s: %d\n", "rune width", runewidth.StringWidth("🔄"))
    fmt.Printf("%15s: %d\n", "rune count", utf8.RuneCountInString("🔄"))
    fmt.Printf("%15s: %d\n", "len", len("🔄"))
    fmt.Printf("%15s: %d\n", "grapheme ct", uniseg.GraphemeClusterCount("🔄"))
    fmt.Printf("%15s: %d\n", "req left pad", 3)
    fmt.Printf("%15s: %d\n", "req right pad", 0)
    fmt.Printf("[%*s", 3, "🔄")
    fmt.Printf(" AAA]\n")

    fmt.Printf("%15s: %d\n", "rune width", runewidth.StringWidth("🇧🇾"))
    fmt.Printf("%15s: %d\n", "rune count", utf8.RuneCountInString("🇧🇾"))
    fmt.Printf("%15s: %d\n", "len", len("🇧🇾"))
    fmt.Printf("%15s: %d\n", "grapheme ct", uniseg.GraphemeClusterCount("🇧🇾"))
    fmt.Printf("%15s: %d\n", "req left pad", 4)
    fmt.Printf("%15s: %d\n", "req right pad", 1)
    fmt.Printf("[%*s", 4, "🇧🇾")
    fmt.Printf("  BBB]\n")

    fmt.Printf("%15s: %d\n", "rune width", runewidth.StringWidth("ℹ️"))
    fmt.Printf("%15s: %d\n", "rune count", utf8.RuneCountInString("ℹ️"))
    fmt.Printf("%15s: %d\n", "len", len("ℹ️"))
    fmt.Printf("%15s: %d\n", "grapheme ct", uniseg.GraphemeClusterCount("ℹ️"))
    fmt.Printf("%15s: %d\n", "req left pad", 4)
    fmt.Printf("%15s: %d\n", "req right pad", 1)
    fmt.Printf("[%*s", 4, "ℹ️")
    fmt.Printf("  CCC]\n")

    fmt.Printf("%15s: %d\n", "rune width", runewidth.StringWidth("•"))
    fmt.Printf("%15s: %d\n", "rune count", utf8.RuneCountInString("•"))
    fmt.Printf("%15s: %d\n", "len", len("•"))
    fmt.Printf("%15s: %d\n", "grapheme ct", uniseg.GraphemeClusterCount("•"))
    fmt.Printf("%15s: %d\n", "req left pad", 4)
    fmt.Printf("%15s: %d\n", "req right pad", 0)
    fmt.Printf("[%*s", 4, "•")
    fmt.Printf(" DDD]\n")

    fmt.Println()

    fmt.Printf("[%*s AAA]\n", 3, "🔄")
    fmt.Printf("[%*s  BBB]\n", 4, "🇧🇾")
    fmt.Printf("[%*s  CCC]\n", 4, "ℹ️")
    fmt.Printf("[%*s DDD]\n", 4, "•")
}
ghostsquad commented 4 years ago

well, I might have landed on something interesting...

package main

import (
    "fmt"
    "strings"
    // "unicode/utf8"

    "github.com/mattn/go-runewidth"
)

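// valuePaddingPredictor pads val so that the trailing "AAA" lines up across
// rows (the value is aligned to 5 characters). The thresholds are empirical,
// tuned against one terminal and font.
func valuePaddingPredictor(val string) string {
    runeWidth := runewidth.StringWidth(val)
    // runeCount := utf8.RuneCountInString(val)
    stringLen := len(val) // length in bytes, not runes

    leftPad := 3
    rightPad := 1
    // Narrow (single-cell) values need one more column on the left.
    if runeWidth == 1 {
        leftPad++
    }

    // More than 4 bytes: in the samples below this means a multi-rune
    // sequence (a flag, or a symbol plus a variation selector). fmt counts
    // every rune toward the %*s width, so widen the field by one, and add
    // the extra trailing space that the terminal rendering needed.
    if stringLen > 4 {
        leftPad++
        rightPad++
    }

    return fmt.Sprintf("[%*s%sAAA]", leftPad, val, strings.Repeat(" ", rightPad))
}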

func main() {
    characters := []string{
        "🔄",
        "🇧🇾",
        "ℹ️",
        "💩",
        "x",
        "😀",
        "💚",
        "☁️",
        "•",
        "⨯",
        "✔️",
        "✓",
        "؏",
        "├",
        "⻨",
    }

    for _, c := range characters {
        fmt.Println(valuePaddingPredictor(c))
    }
}
[  🔄 AAA]
[  🇧🇾  AAA]
[  ℹ️  AAA]
[  💩 AAA]
[   x AAA]
[  😀 AAA]
[  💚 AAA]
[  ☁️  AAA]
[   • AAA]
[   ⨯ AAA]
[  ✔️  AAA]
[   ✓ AAA]
[   ؏ AAA]
[   ├ AAA]
[  ⻨ AAA]
[image]

this is probably good enough for what I need.

jquast commented 10 months ago

Hello,

I maintain the Python wcwidth library, and I recently wrote a specification that is of interest to this specific issue. I also wrote an automatic testing tool to assess any individual terminal emulator's compliance with the specification for Wide, Zero, ZWJ, and Emoji VS-16 character sequences.

I wrote an overview here: https://www.jeffquast.com/post/ucs-detect-test-results/

I especially want to point out the automatic test results for 20+ terminals: you will indeed find varying levels of Unicode version and feature support across terminals, so it is important to keep that in mind when trying to validate.