Open schroederc opened 1 year ago
Here's a rough test case:
// Package name is illustrative; the test lives outside the diffmatchpatch package.
package diffmatchpatch_test

import (
	"testing"

	"github.com/sergi/go-diff/diffmatchpatch"
)

func TestLineDiff(t *testing.T) {
	t.Run("Chars", func(t *testing.T) {
		before := `1
2
3
4
5
6
7
8
9
`
		after := `10
`
		dmp := diffmatchpatch.New()
		// Encode each line as an index, diff the encoded texts, then hydrate back to lines.
		txt1, txt2, lines := dmp.DiffLinesToChars(before, after)
		diff := dmp.DiffMain(txt1, txt2, false)
		diff = dmp.DiffCharsToLines(diff, lines)
		// Rebuild both inputs from the diff; a correct diff reproduces them exactly.
		var foundBefore, foundAfter string
		for _, d := range diff {
			switch d.Type {
			case diffmatchpatch.DiffEqual:
				foundBefore += d.Text
				foundAfter += d.Text
			case diffmatchpatch.DiffDelete:
				foundBefore += d.Text
			case diffmatchpatch.DiffInsert:
				foundAfter += d.Text
			}
		}
		if foundBefore != before {
			t.Errorf("Expected before %q; found %q", before, foundBefore)
		}
		if foundAfter != after {
			t.Errorf("Expected after %q; found %q", after, foundAfter)
		}
	})
	t.Run("Runes", func(t *testing.T) {
		before := `1
2
3
4
5
6
7
8
9
`
		after := `10
`
		dmp := diffmatchpatch.New()
		// Same round trip as above, using the rune-based variants.
		txt1, txt2, lines := dmp.DiffLinesToRunes(before, after)
		diff := dmp.DiffMainRunes(txt1, txt2, false)
		diff = dmp.DiffCharsToLines(diff, lines)
		var foundBefore, foundAfter string
		for _, d := range diff {
			switch d.Type {
			case diffmatchpatch.DiffEqual:
				foundBefore += d.Text
				foundAfter += d.Text
			case diffmatchpatch.DiffDelete:
				foundBefore += d.Text
			case diffmatchpatch.DiffInsert:
				foundAfter += d.Text
			}
		}
		if foundBefore != before {
			t.Errorf("Expected before %q; found %q", before, foundBefore)
		}
		if foundAfter != after {
			t.Errorf("Expected after %q; found %q", after, foundAfter)
		}
	})
}
I sent out https://github.com/sergi/go-diff/pull/141 to revert the implementation to the limited, but not incorrect, rune approach. Some more fundamental changes would probably be preferable if someone has more time to work on it.
Ran into this as well. I don't need to diff large chunks (context: https://github.com/sergi/go-diff/issues/89#issuecomment-591376325), so I downgraded to v1.1.0, which still seems to be widely used to this day (e.g. by go-git).
Is there any progress on this issue?
https://github.com/sergi/go-diff/commit/db1b095f5e7c905e196ff6bfd56189a41aa76309 introduces a bug in its change from diffLinesToRunesMunge to diffLinesToStringsMunge. Since each line is now represented by one or more ASCII characters, the diffing algorithm can split a hashed line in the middle of its representation, whereas the rune-indexed lines used before were indivisible. For instance, DiffLinesToChars could return hashed strings that begin with 42, and DiffMain may then split that leading 42 across diff operations. The resulting diff after hydration is completely wrong.
This affects users of the DiffLinesTo* APIs as well as any user that passes true for checklines in DiffMain or DiffMainRunes.
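The splitting problem described above can be illustrated without the library. The following is a minimal, standard-library-only sketch, not the go-diff implementation: it assumes, purely for illustration, that a multi-character encoding looks like comma-separated decimal line indices, and the helper names (lineID, encodeAsDecimal, encodeAsRunes, commonPrefixLen) are hypothetical. It compares how many leading symbols the two encoded texts share under each encoding, using the inputs from the test case in this issue.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lineID returns a stable 1-based index for line, assigning a new one
// the first time the line is seen.
func lineID(line string, index map[string]int) int {
	id, ok := index[line]
	if !ok {
		id = len(index) + 1
		index[line] = id
	}
	return id
}

// encodeAsDecimal represents each line by its decimal index, joined by commas.
// ASSUMPTION: this is an illustrative stand-in for "one or more ASCII
// characters per line", not the library's actual encoding.
func encodeAsDecimal(lines []string, index map[string]int) string {
	parts := make([]string, 0, len(lines))
	for _, line := range lines {
		parts = append(parts, strconv.Itoa(lineID(line, index)))
	}
	return strings.Join(parts, ",")
}

// encodeAsRunes represents each line by exactly one rune, so a
// character-level diff can never cut a line's representation in half.
func encodeAsRunes(lines []string, index map[string]int) []rune {
	out := make([]rune, 0, len(lines))
	for _, line := range lines {
		out = append(out, rune(lineID(line, index)))
	}
	return out
}

// commonPrefixLen counts the leading runes shared by a and b, which is
// the first thing a character-level diff would treat as "equal".
func commonPrefixLen(a, b []rune) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	before := []string{"1\n", "2\n", "3\n", "4\n", "5\n", "6\n", "7\n", "8\n", "9\n"}
	after := []string{"10\n"}

	// Multi-character encoding: index 10 starts with the same character as index 1.
	decIndex := map[string]int{}
	d1 := encodeAsDecimal(before, decIndex)
	d2 := encodeAsDecimal(after, decIndex)
	fmt.Printf("decimal: %q vs %q, shared prefix: %d\n",
		d1, d2, commonPrefixLen([]rune(d1), []rune(d2)))

	// One rune per line: distinct lines never share a partial representation.
	runeIndex := map[string]int{}
	r1 := encodeAsRunes(before, runeIndex)
	r2 := encodeAsRunes(after, runeIndex)
	fmt.Printf("runes: shared prefix: %d\n", commonPrefixLen(r1, r2))
}
```

Under the decimal encoding the two texts share a one-character prefix even though the inputs have no line in common, so a character-level diff could keep that character as equal and cut the index 10 in two. With one rune per line the shared prefix is empty, which matches the observation that the rune-indexed lines were indivisible.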