mmanela / diffplex

DiffPlex is a .NET Standard 1.0+ C# library to generate textual diffs.
Apache License 2.0

When two strings end with the same words, CreateWordDiffs includes the first common word as an insertion and deletion #91

Closed by tombogle 2 years ago

tombogle commented 2 years ago

DiffPlex.Differ.Instance.CreateWordDiffs("Am I a substring?", " a substring?", false, false, new[] { ' ' }) yields a result with a single DiffBlock: DeleteCountA = 3, DeleteStartA = 0, InsertCountB = 1, InsertStartB = 0.

I think the correct result should be DeleteCountA = 2, DeleteStartA = 0, InsertCountB = 0, InsertStartB = 0.

tombogle commented 2 years ago

If I remove the leading space from the second argument, I get an even more bizarre result: DiffPlex.Differ.Instance.CreateWordDiffs("Am I a substring?", "a substring?", false, false, new[] { ' ' }) yields a result with a single DiffBlock: DeleteCountA = 4, DeleteStartA = 0, InsertCountB = 0, InsertStartB = 0.

mmanela commented 2 years ago

Thanks for the details. Part of this is by design (although it's a bit confusing) and part of it is a bug that you found.

The way the chunking works in this case is that it will break up the words by those delimiters (in this case a space), but it will still include the delimiters as "words" in the diff. That is why the count is higher than you expect: it includes the spaces. The delimiters just tell it how to align the words, but you may still want to see whether the delimiters differ. A future change could be to add a flag to ignore them.
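To make the counting concrete, here is a minimal sketch of that kind of chunking. DelimiterChunker and Chunk are hypothetical names used only for illustration. This is not DiffPlex's actual chunker, and treating every delimiter character as its own chunk is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT DiffPlex's actual chunker: it only illustrates
// splitting on delimiters while keeping each delimiter as its own chunk.
public static class DelimiterChunker
{
    public static List<string> Chunk(string text, char[] separators)
    {
        var chunks = new List<string>();
        int wordStart = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (Array.IndexOf(separators, text[i]) < 0)
                continue; // ordinary character, still inside a word

            // Emit the word that ended just before this delimiter, if any.
            if (i > wordStart)
                chunks.Add(text.Substring(wordStart, i - wordStart));

            // Emit the delimiter itself as a one-character chunk.
            chunks.Add(text[i].ToString());
            wordStart = i + 1;
        }

        // Emit the trailing word if the text does not end with a delimiter.
        if (wordStart < text.Length)
            chunks.Add(text.Substring(wordStart));

        return chunks;
    }

    public static void Main()
    {
        char[] separators = { ' ' };
        foreach (var input in new[] { "Am I a substring?", " a substring?", "a substring?" })
        {
            List<string> chunks = Chunk(input, separators);
            Console.WriteLine($"{chunks.Count} chunks: [{string.Join("|", chunks)}]");
        }
    }
}
```

Under this assumption, "Am I a substring?" becomes 7 chunks, " a substring?" becomes 4, and "a substring?" becomes 3. The 3 chunks of "a substring?" match the last 3 of the 7, leaving 4 leading chunks to delete, which agrees with the DeleteCountA = 4 reported above. By the same reasoning, the input with the leading space would be expected to give DeleteCountA = 3 and InsertCountB = 0, so the reported InsertCountB = 1 is presumably the part attributable to the bug.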

As for the bug, you did find an issue where the string starts with a delimiter. I pushed a fix (ff1f16ba68442f6eed660de6f2fb2ccdc1d64d95) for this.

tombogle commented 2 years ago

Thanks for the explanation and the fix.