mmanela / diffplex

DiffPlex is a .NET Standard 1.0+ C# library to generate textual diffs.
Apache License 2.0

Side by side display issue #41

Open chris91010 opened 5 years ago

chris91010 commented 5 years ago

I have been looking at the results that the side by side diff model returns. I have noticed that the position of modifications / imaginary lines on both sides appears incorrect if you have deleted text followed immediately by modified text.

Example: Line 2 from the left-hand side has been deleted on the right, and line 3 has been modified.

This is line 1 | This is line 1
This is line 2 | This is line 3 With Modification
This is line 3 | This is line 4
This is line 4 |

The model result for new text shows that line 1 was unchanged, line 2 was modified (not deleted), line 3 is marked as imaginary and line 4 is unchanged. It appears that in this scenario it should show line 2 as the imaginary (deleted line) and line 3 as the modified line. This is how the result is provided on other tests, when a delete is not directly followed by a modification.

Any thoughts on the above would be appreciated.

Maximres commented 5 years ago

Is there any solution to this problem? @chris91010

mmanela commented 3 years ago

Thanks for reporting and describing the issue. The result I see is slightly different. I see it reported as this:

[image]

In this result, I do agree that ideally the imaginary line should be next to "This is line 2".

Right now DiffPlex takes an iterative approach. It first does "line diffing" and sees that lines 2 and 3 of the left text are deleted and position 2 on the right is inserted. Then it performs "word diffing" to compare the words of the "aligned" lines. That is the key issue: it does not use any intelligence in the aligning. It just takes the first matching position on both sides (in this case, position 2). You can see this logic here: https://github.com/mmanela/diffplex/blob/886608026e23db61dc81bddf854f119b8022faff/DiffPlex/DiffBuilder/SideBySideDiffBuilder.cs#L146
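To make the effect concrete, here is a minimal, language-neutral sketch in Python of "pair the changed lines in order". It is not DiffPlex's C# code; `pair_in_order` and `IMAGINARY` are hypothetical names, and the sketch only assumes that a deleted block and an inserted block are paired index by index, with the shorter side padded by imaginary lines.

```python
from itertools import zip_longest

# Hypothetical marker for a padding ("imaginary") line with no counterpart.
IMAGINARY = None


def pair_in_order(deleted, inserted):
    """Pair the i-th deleted line with the i-th inserted line.

    The shorter block is padded at the end with IMAGINARY entries,
    regardless of how similar the paired lines actually are.
    """
    return list(zip_longest(deleted, inserted, fillvalue=IMAGINARY))


# The changed block from the example in this issue.
deleted = ["This is line 2", "This is line 3"]
inserted = ["This is line 3 With Modification"]

rows = pair_in_order(deleted, inserted)
for old, new in rows:
    print(f"{old or '<imaginary>'} | {new or '<imaginary>'}")
```

With the example from this issue, "This is line 2" ends up paired with "This is line 3 With Modification" and the imaginary line lands next to "This is line 3", which is the misplacement described above.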

To improve this, that code could be "smarter" and run a fitness test to decide which alignment is better based on maximum similarity.
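One possible shape for such a fitness test, sketched in Python rather than in DiffPlex's C#: try every order-preserving placement of the shorter block against the longer one and keep the placement with the highest total similarity. Everything here is an assumption for illustration. `align_by_similarity` and `similarity` are hypothetical names, and `difflib.SequenceMatcher.ratio()` stands in for whatever similarity measure the library would actually use.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Character-level similarity in [0, 1] (stand-in fitness measure)."""
    return SequenceMatcher(None, a, b).ratio()


def align_by_similarity(deleted, inserted):
    """Return (old, new) rows; None marks an imaginary line.

    Tries every order-preserving placement of the shorter block inside
    the longer one and keeps the placement with the highest total score.
    """
    swap = len(deleted) < len(inserted)
    longer, shorter = (inserted, deleted) if swap else (deleted, inserted)

    best_slots, best_score = (), -1.0
    for slots in combinations(range(len(longer)), len(shorter)):
        score = sum(similarity(longer[i], s) for i, s in zip(slots, shorter))
        if score > best_score:  # ties keep the earliest placement
            best_slots, best_score = slots, score

    placed = dict(zip(best_slots, shorter))
    rows = [(line, placed.get(i)) for i, line in enumerate(longer)]
    # Rows were built as (longer, shorter); flip them when old is the shorter side.
    return [(b, a) for a, b in rows] if swap else rows


result = align_by_similarity(
    ["This is line 2", "This is line 3"],
    ["This is line 3 With Modification"],
)
for old, new in result:
    print(f"{old or '<imaginary>'} | {new or '<imaginary>'}")
```

For the example in this issue, the imaginary line now sits next to "This is line 2" and "This is line 3" is paired with its modified version. The exhaustive search is only reasonable for small changed blocks; a dynamic-programming alignment would be the natural replacement for large ones.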