sergi / go-diff

Diff, match and patch text in Go
MIT License
1.81k stars 207 forks source link

Fix the regressions introduced in the fix for #89 #120

Closed ShoshinNikita closed 10 months ago

ShoshinNikita commented 3 years ago

This PR partially reverts commits https://github.com/sergi/go-diff/commit/db1b095f5e7c905e196ff6bfd56189a41aa76309 - https://github.com/sergi/go-diff/commit/0a651d56613f9de4bed8b9c4769b776ef168bfca and fixes the issue #89 with a different approach. The current solution doesn't work well with (*DiffMatchPatch).diffMainRunes method because array indexes in the index string occupy multiple runes.

This fix is based on the previous approach. But elements of the lineArray with indexes from 0xD800 (55296) to 0xDFFF (57343) are skipped because runes in this range are invalid. It requires additional 32KB of memory ([2048]string{}) but allows us to safely encode line indexes in a string.

It doesn't completely fix #89 but increases the panic limit to ~1114111 lines (0x10FFFF is the maximum valid unicode code point). The complete fix will require a lot of changes. At the same time the current approach has a bug. So, I believe it's better to use this fix.

Fixes #115

ShoshinNikita commented 3 years ago

There's another possible fix: https://github.com/ShoshinNikita/go-diff/commit/c8591cf97f43b198b269258a0c3f3b1fc07990df. It is also based on the previous approach but uses map[rune]string instead of []string to store lines. It doesn't require additional 32KB of memory but breaks the backward compatibility for methods DiffLinesToChars, DiffLinesToRunes and DiffCharsToLines.

jlao commented 3 years ago

I'm also hitting this regression. Because the lineHash map is no longer shared, the diff assigns different chars/runes to lines that are the same.

findleyr commented 3 years ago

Hi! We (the gopls team) spoke to @sergi and will help review and test this PR. Specifically, I'll do my best to review the change, and we'll test the fix in gopls. Will probably need a couple days.

sergi commented 3 years ago

Thanks for helping @findleyr and team!!

findleyr commented 3 years ago

@ShoshinNikita

The complete fix will require a lot of changes.

By this you mean modifying the diff algorithm to operate on []int, rather than string, right?

ShoshinNikita commented 3 years ago

@findleyr

I am not very familiar with the codebase. I just could determine the cause of the regressions and fix #89 with another approach based on code in v1.1.0 (the second commit in this PR partially reverts the previous fix). So, the real diff is https://github.com/sergi/go-diff/compare/v1.1.0...d20955afb55ced3a19b28c588595ed52a47c2afe, lines 431-465 (or just the last commit).

I think using []int instead of string can fix the issue completely. But it will break the backward compatibility. However, as I said before, I am not familiar with the code. So, I could be wrong.

iambus commented 2 years ago

I would suggest you guys look at my PR #128. It fixes a serious panic issue. It may also fix the issue you were disucssing (I'm not sure).

kdarkhan commented 1 year ago

This PR can probably be closed now since my PR which fixes the same thing was merged yesterday https://github.com/sergi/go-diff/pull/136.