rivo / tview

Terminal UI library with rich, interactive widgets — written in Golang
MIT License

WordWrap(…) exceeds line length with punctuation marks #828

Closed der-lyse closed 1 year ago

der-lyse commented 1 year ago

Hi, first of all, thank you very much for this awesome library! :-)

I noticed that WordWrap(…) exceeds the specified line length when a punctuation mark falls immediately after the point where a line reaches that length. It's a bit tricky to explain, so the following code illustrates it in more detail:

go.mod:

module main

go 1.19

require github.com/rivo/tview v0.0.0-20230307144320-cc10b288e304

main.go:

package main

import (
    "fmt"
    "github.com/rivo/tview"
)

func main() {
    for i, line := range tview.WordWrap("Text with punctuation mark.", 4) {
        fmt.Printf("#%d (%d): '%s'\n", i, len(line), line) // since it's all ASCII, len is a sufficient shortcut
    }
}

This program prints:

$ go run main.go 
#0 (4): 'Text'
#1 (4): 'with'
#2 (4): 'punc'
#3 (4): 'tuat'
#4 (3): 'ion'
#5 (5): 'mark.'

The last line is in fact five characters long, but the specified width was only four. I tested with period (.), comma (,), semicolon (;), colon (:), exclamation mark (!), question mark (?) and plus (+), and they all result in the same behavior. There might be more such characters. It doesn't matter that this input string ends with the period: the period could also be somewhere in the middle of the string, and the line length would be exceeded there, too.
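To make the overflow easy to spot without reading the output by eye, here is a minimal sketch of a check that reports which lines are too long. The helper name `overlong` is hypothetical and not part of tview, and it counts runes rather than terminal display width, which is an assumption that only holds for simple text like this ASCII example. The input lines are copied from the output above instead of being produced by calling tview.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// overlong is a hypothetical helper (not part of tview). It returns the
// indices of all lines whose rune count exceeds width. Counting runes is
// an assumption: it ignores wide and combining characters.
func overlong(lines []string, width int) []int {
	var bad []int
	for i, line := range lines {
		if utf8.RuneCountInString(line) > width {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	// Lines copied from the program output reported in this issue.
	lines := []string{"Text", "with", "punc", "tuat", "ion", "mark."}
	fmt.Println(overlong(lines, 4)) // prints [5]
}
```

Only index 5 ('mark.') is reported, which matches the observation above.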

Is this intended? I would have expected the period, or any of the other characters listed above, to be placed on its own line, just as happens when the character in that position is a letter, a digit, a dollar sign ($), a tilde (~), etc. So I hoped for this:

…
#4 (3): 'ion'
#5 (4): 'mark'
#6 (1): '.'

With the line length reduced to three, the example code works flawlessly: the last two lines are then 'mar' and 'k.', as I expect them to be. So I suspect that it has something to do with the punctuation mark coming immediately after the line length is maxed out. I didn't look at the code, though.
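For illustration, the expected behavior can be sketched with a small stand-alone wrapper. This is not how tview's WordWrap is implemented; `hardWrap` is a hypothetical name, and the sketch assumes that words are separated by whitespace, that width is measured in runes, and that punctuation is treated like any other character. Words longer than the width are simply cut into chunks.

```go
package main

import (
	"fmt"
	"strings"
)

// hardWrap is a hypothetical helper (not part of tview). It wraps text
// greedily at whitespace and hard-splits any word longer than width, so
// no returned line has more than width runes. Punctuation gets no
// special treatment. Width is counted in runes (an assumption).
func hardWrap(text string, width int) []string {
	if width < 1 {
		return nil
	}
	var lines []string
	current, curLen := "", 0 // line being built and its rune count
	for _, word := range strings.Fields(text) {
		runes := []rune(word)
		// The word fits on the current line after a single space.
		if curLen > 0 && curLen+1+len(runes) <= width {
			current += " " + word
			curLen += 1 + len(runes)
			continue
		}
		// Otherwise finish the current line and start a new one.
		if curLen > 0 {
			lines = append(lines, current)
		}
		// Cut words that are wider than the limit into full-width chunks.
		for len(runes) > width {
			lines = append(lines, string(runes[:width]))
			runes = runes[width:]
		}
		current, curLen = string(runes), len(runes)
	}
	if curLen > 0 {
		lines = append(lines, current)
	}
	return lines
}

func main() {
	for i, line := range hardWrap("Text with punctuation mark.", 4) {
		fmt.Printf("#%d (%d): '%s'\n", i, len([]rune(line)), line)
	}
}
```

With the input from this issue and a width of four, the sketch ends with 'ion', 'mark' and '.' on separate lines, which is the output hoped for above. A real implementation would also need to handle display width and language-specific break rules, which this sketch ignores.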

rivo commented 1 year ago

The WordWrap() function uses a very simple algorithm to determine possible line breaks. And as you found out, it also has bugs. I've been planning to rewrite this function for a while now. The new implementation will use uniseg's line breaking functionality which follows the Unicode standard and thus also works for languages other than English.

I hope I will get some time soon to do this. But I will likely not fix this "old" implementation anymore.

Two more notes:

der-lyse commented 1 year ago

Thank you very much! I'll have a look at them.