sergi / go-diff

Diff, match and patch text in Go
MIT License

fix DiffCleanupSemantic #108

Closed: pakohan closed this 3 years ago

pakohan commented 4 years ago

Currently, DiffCleanupSemantic only works correctly for ASCII characters. At our company we also handle Chinese characters, and DiffCleanupSemantic does not work there because len("新") != len([]rune("新")): len on a string counts bytes, not runes.
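A minimal sketch of the byte/rune mismatch described above. The helper names `byteLen` and `runeLen` are hypothetical, chosen for illustration only; they are not part of go-diff.

```go
package main

import "fmt"

// byteLen returns the number of bytes in s, which is what the
// built-in len reports for a string. (Hypothetical helper.)
func byteLen(s string) int {
	return len(s)
}

// runeLen returns the number of Unicode code points in s.
// (Hypothetical helper.)
func runeLen(s string) int {
	return len([]rune(s))
}

func main() {
	// For ASCII the two counts agree; for multi-byte UTF-8 text they diverge.
	for _, s := range []string{"a", "新", "a新b"} {
		fmt.Printf("%q: bytes=%d runes=%d\n", s, byteLen(s), runeLen(s))
	}
}
```

Any length comparison that mixes the two counts will behave correctly on ASCII input and go wrong as soon as a multi-byte character appears, which matches the symptom reported here.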

See: https://play.golang.org/p/oTXcxo0gH1R

The Python library behaves the same way as the fixed version.

sergi commented 4 years ago

Hey, thanks for this PR. I'd rather use utf8.RuneCountInString(s) than len([]rune(s)), though. The former doesn't create (and discard) a slice from the string.
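As a rough illustration of the suggestion, the sketch below counts runes both ways and checks that the results agree. The helper names `countRunes` and `countRunesViaSlice` are hypothetical and not taken from go-diff. Whether the conversion form actually allocates can depend on the Go compiler version, so treat the performance difference as something to benchmark rather than assume.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countRunes counts code points by scanning the string in place.
// (Hypothetical helper.)
func countRunes(s string) int {
	return utf8.RuneCountInString(s)
}

// countRunesViaSlice counts code points by converting to a rune slice
// first and taking its length. (Hypothetical helper.)
func countRunesViaSlice(s string) int {
	return len([]rune(s))
}

func main() {
	for _, s := range []string{"", "diff", "新", "a新b"} {
		// Both approaches report the same count; they differ only in
		// how the count is computed.
		fmt.Printf("%q: RuneCountInString=%d len([]rune)=%d\n",
			s, countRunes(s), countRunesViaSlice(s))
	}
}
```

Because both forms return the same value, swapping one for the other should not change the behaviour of the fix, only its cost.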

More details about it here: https://www.reddit.com/r/golang/comments/8d4eyf/utf8runecountinstring_is_13x_faster_than_lenrunes/dxk7flk/