rivo / uniseg

Unicode Text Segmentation, Word Wrapping, and String Width Calculation in Go
MIT License
570 stars 60 forks source link

Make runeWidth a public function #48

Closed mikelorant closed 5 months ago

mikelorant commented 6 months ago

There are many modules using uniseg for the StringWidth function but also could use a RuneWidth function. Could this function be made accessible so there is no need for incorrect implementations to be used.

Examples of another module providing both these functions. RuneWidth - https://github.com/mattn/go-runewidth/blob/master/runewidth.go#L307 StringWidth - https://github.com/mattn/go-runewidth/blob/master/runewidth.go#L322

As you can see, the StringWidth function relies of uniseg. Any application using go-runewidth for StringWidth could swap directly over to uniseg for the same capability. Comparing the benchmarks of the functions shows nearly 2x faster performance for uniseg.

Unfortunately, runeWidth is not public so I am unable to compare the performance. I do not see any negatives with making this function public. Is this something you would consider changing?

rivo commented 6 months ago

The implementation of a public RuneWidth method would simply look like this:

return StringWidth(string(r))

That's because a single code point (rune) is not always a complete character. That's the whole basis of the uniseg package. The README explains this in detail. One example given there is the country flag emoji. These emojis must always consist of two runes. What's the width of only one of these runes then? It's basically undefined because it's an incomplete grapheme cluster. The mattn/go-runewidth package gets this fundamentally wrong and I have tried for a long time to help them do it right but there was never much interest in following up on it.

The private runeWidth method has a different purpose. It only works in the context of the other methods of this package who call runeWidth multiple times to calculate a final width of a character. (It also takes a second argument which cannot simply be left out.)

My initial gut reaction was to say "no" because (1) such a method would not make sense and (2) you could always call StringWidth(rune(r)) directly if you really needed to.

But if it really helps users who want to switch over from mattn/go-runewidth, I will consider it.

mikelorant commented 5 months ago

Thanks for the clarification. I was actually mid reply when you posted your response and also realised many of the problems you just mentioned.

I withdraw my request for this function to be made public (and will close this issue) as it makes no sense at all after your clear explanation.

I'll direct my attention at looking how to get other packages to stop calculating rune width where they should instead be seeing the bigger picture of the string width.