rivo / uniseg

Unicode Text Segmentation, Word Wrapping, and String Width Calculation in Go
MIT License

Variation Selectors incorrectly modify some StringWidths #50

Closed. rockorager closed this issue 8 months ago.

rockorager commented 8 months ago

When a variation selector follows a character that doesn't support it, the width should not be altered. Currently, uniseg reports a width of 2 for any grapheme cluster that contains a VS16 (U+FE0F), regardless of whether the first rune is an emoji or not.

Example

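```go
package main

import (
	"fmt"

	"github.com/rivo/uniseg"
)

func main() {
	// Prints 2, although "x" has no emoji presentation: the trailing
	// VS16 (U+FE0F) should be ignored and the width should be 1.
	fmt.Println(uniseg.StringWidth("x\uFE0F"))
}
```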

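From the Unicode Standard: only defined variation sequences are sanctioned for use; in all other cases, a variation selector cannot change the visual appearance of the preceding base character from what it would have had without the selector.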

Proposed solution

When encountering a VS16, uniseg should verify that the preceding rune is indeed an emoji before widening the grapheme cluster to 2.
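A minimal sketch of the proposed check, under several assumptions: `clusterWidth` and `hasEmojiVariation` are hypothetical names (this is not uniseg's actual implementation), the lookup table holds only a small illustrative subset of base characters that have emoji variation sequences, the base rune is assumed to be narrow (width 1), and every trailing rune is treated as zero-width. VS16 only widens the cluster when it immediately follows one of those emoji bases.

```go
package main

import "fmt"

const vs16 = '\uFE0F' // VARIATION SELECTOR-16 (emoji presentation)

// hasEmojiVariation is a hypothetical, deliberately incomplete subset of
// base characters that have a defined emoji variation sequence.
var hasEmojiVariation = map[rune]bool{
	'\u00A9': true, // COPYRIGHT SIGN
	'\u00AE': true, // REGISTERED SIGN
	'\u2600': true, // BLACK SUN WITH RAYS
	'\u263A': true, // WHITE SMILING FACE
	'\u2764': true, // HEAVY BLACK HEART
}

// clusterWidth returns the display width of one grapheme cluster,
// assuming its base rune is narrow and trailing runes are zero-width.
func clusterWidth(cluster []rune) int {
	if len(cluster) == 0 {
		return 0
	}
	width := 1 // assumption: the base rune is narrow
	for i := 1; i < len(cluster); i++ {
		// VS16 only matters as part of a valid emoji variation sequence,
		// i.e. directly after a base that supports it; otherwise ignore it.
		if cluster[i] == vs16 && hasEmojiVariation[cluster[i-1]] {
			width = 2
		}
	}
	return width
}

func main() {
	for _, s := range []string{"x\uFE0F", "\u2764", "\u2764\uFE0F"} {
		fmt.Printf("%+q -> %d\n", s, clusterWidth([]rune(s)))
	}
}
```

With this check, "x" followed by VS16 keeps width 1, while U+2764 followed by VS16 is reported as width 2. A complete version would need the full list of emoji variation sequences and real East Asian Width data for the base rune.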

rivo commented 8 months ago

True. I did not consider the combination of these selectors with non-emoji characters. The latest commit should fix this.

Thanks for letting me know.