rivo / uniseg

Unicode Text Segmentation, Word Wrapping, and String Width Calculation in Go
MIT License
581 stars, 60 forks

More improved performance #5

Closed Code-Hex closed 3 years ago

Code-Hex commented 4 years ago

This PR improves performance when the GraphemeClusterCount function is called.

Benchmark code:

func BenchmarkCountBefore(b *testing.B) {
    for i := 0; i < b.N; i++ {
        for _, bcase := range testCases {
            g := beforeNewGraphemes(bcase.original)
            var n int
            for g.Next() {
                n++
            }
        }
    }
}

func beforeNewGraphemes(s string) *Graphemes {
    g := &Graphemes{}
    for index, codePoint := range s {
        g.codePoints = append(g.codePoints, codePoint)
        g.indices = append(g.indices, index)
    }
    g.indices = append(g.indices, len(s))
    g.Next()
    return g
}

func BenchmarkCountAfter(b *testing.B) {
    for i := 0; i < b.N; i++ {
        for _, bcase := range testCases {
            g := NewGraphemes(bcase.original)
            var n int
            for g.Next() {
                n++
            }
        }
    }
}

Result

go test -benchmem -run="^$" github.com/rivo/uniseg -bench .
goos: darwin
goarch: amd64
pkg: github.com/rivo/uniseg
BenchmarkCountBefore-8              5000            247797 ns/op           96392 B/op       3458 allocs/op
BenchmarkCountAfter-8              10000            199063 ns/op          105544 B/op       1865 allocs/op

Code-Hex commented 4 years ago

@rivo Review please~~~~ 🙏🙏 🙏 🙏 🙏 🙏 🙏

System-Glitch commented 4 years ago

Any news on this PR? It is a nice optimization.

dolmen commented 4 years ago

This could be improved further: use local variables for the slices and create the Graphemes object only after the loop.
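
A minimal sketch of that suggestion, not the code that was merged. The graphemes type below is a hypothetical stand-in with only the two fields visible in the benchmark code, the constructor name is made up for illustration, and the initial g.Next() call is left out because the real Next method is not shown here. Preallocating with len(s) is also an assumption.

```go
package main

import "fmt"

// graphemes is a hypothetical stand-in for the library's Graphemes type,
// reduced to the two fields that appear in the benchmark code.
type graphemes struct {
	codePoints []rune
	indices    []int
}

// newGraphemesLocal collects code points and byte indices in local slices
// and builds the struct only once the loop has finished.
func newGraphemesLocal(s string) *graphemes {
	// Assumption: capacity is taken from the byte length of s.
	codePoints := make([]rune, 0, len(s))
	indices := make([]int, 0, len(s)+1)
	for index, codePoint := range s {
		codePoints = append(codePoints, codePoint)
		indices = append(indices, index)
	}
	indices = append(indices, len(s)) // final entry marks the end of the string
	return &graphemes{codePoints: codePoints, indices: indices}
}

func main() {
	g := newGraphemesLocal("héllo")
	fmt.Println(len(g.codePoints), g.indices) // prints: 5 [0 1 3 4 5 6]
}
```

Appending to local slice variables avoids going through the struct pointer on every iteration, which is presumably the saving the comment has in mind. Whether it is measurable would need a benchmark.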

dolmen commented 4 years ago

This performance optimization uses more memory. To allocate the right amount (losing some speed), use utf8.RuneCountInString instead of len.
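
The trade-off can be sketched as follows. The decode helper and its signature are hypothetical and only illustrate the two ways of sizing the slices; len(s) counts bytes, while utf8.RuneCountInString(s) counts code points at the cost of an extra pass over the string.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decode splits s into code points and byte indices, preallocating both
// slices for n code points. The name and signature are hypothetical.
func decode(s string, n int) ([]rune, []int) {
	codePoints := make([]rune, 0, n)
	indices := make([]int, 0, n+1)
	for index, codePoint := range s {
		codePoints = append(codePoints, codePoint)
		indices = append(indices, index)
	}
	indices = append(indices, len(s)) // final entry marks the end of the string
	return codePoints, indices
}

func main() {
	s := "日本語のテキスト" // 8 code points, 3 bytes each

	byBytes, _ := decode(s, len(s))                    // sized by byte length
	byRunes, _ := decode(s, utf8.RuneCountInString(s)) // sized by rune count

	fmt.Println(cap(byBytes), cap(byRunes)) // prints: 24 8
}
```

For pure ASCII input both capacities are the same. The over-allocation only shows up with multi-byte text, where sizing by len reserves up to four times as many slots as needed.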

rivo commented 3 years ago

I merged #8 instead. Hope that's ok with you.