Closed ShoshinNikita closed 10 months ago
There's another possible fix: https://github.com/ShoshinNikita/go-diff/commit/c8591cf97f43b198b269258a0c3f3b1fc07990df. It is also based on the previous approach but uses map[rune]string
instead of []string
to store lines. It doesn't require additional 32KB of memory but breaks the backward compatibility for methods DiffLinesToChars
, DiffLinesToRunes
and DiffCharsToLines
.
I'm also hitting this regression. Because the lineHash map is no longer shared, the diff assigns different chars/runes to lines that are the same.
Hi! We (the gopls team) spoke to @sergi and will help review and test this PR. Specifically, I'll do my best to review the change, and we'll test the fix in gopls. Will probably need a couple days.
Thanks for helping @findleyr and team!!
@ShoshinNikita
The complete fix will require a lot of changes.
By this you mean modifying the diff algorithm to operate on []int, rather than string, right?
@findleyr
I am not very familiar with the codebase. I just could determine the cause of the regressions and fix #89 with another approach based on code in v1.1.0 (the second commit in this PR partially reverts the previous fix). So, the real diff is https://github.com/sergi/go-diff/compare/v1.1.0...d20955afb55ced3a19b28c588595ed52a47c2afe, lines 431-465 (or just the last commit).
I think using []int
instead of string
can fix the issue completely. But it will break the backward compatibility. However, as I said before, I am not familiar with the code. So, I could be wrong.
I would suggest you guys look at my PR #128. It fixes a serious panic issue. It may also fix the issue you were disucssing (I'm not sure).
This PR can probably be closed now since my PR which fixes the same thing was merged yesterday https://github.com/sergi/go-diff/pull/136.
This PR partially reverts commits https://github.com/sergi/go-diff/commit/db1b095f5e7c905e196ff6bfd56189a41aa76309 - https://github.com/sergi/go-diff/commit/0a651d56613f9de4bed8b9c4769b776ef168bfca and fixes the issue #89 with a different approach. The current solution doesn't work well with
(*DiffMatchPatch).diffMainRunes
method because array indexes in the index string occupy multiple runes.This fix is based on the previous approach. But elements of the
lineArray
with indexes from0xD800
(55296) to0xDFFF
(57343) are skipped because runes in this range are invalid. It requires additional 32KB of memory ([2048]string{}
) but allows us to safely encode line indexes in a string.It doesn't completely fix #89 but increases the panic limit to ~1114111 lines (
0x10FFFF
is the maximum valid unicode code point). The complete fix will require a lot of changes. At the same time the current approach has a bug. So, I believe it's better to use this fix.Fixes #115