Closed. GoogleCodeExporter closed this issue 8 years ago.
Behaviour confirmed. As you pointed out, the two options are tied as the
shortest possible solution. In this case the algorithm just happens to find
the ugly one first.
If you want output that is easy to read, the intent is to pass the raw diff
to diff_cleanupSemantic(), which has some reasonably effective heuristics to
clean up the results. However, in this case it fails to detect the commonality.
It is worth noting that if one reverses text1 and text2, it behaves perfectly.
The issue appears to be an oversight in diff_cleanupSemantic(). Currently it
checks for this case:
# Find any overlaps between deletions and insertions.
# e.g: <del>abcxxx</del><ins>xxxdef</ins>
# -> <del>abc</del>xxx<ins>def</ins>
The cases of
<del>xxxabc</del><ins>xxxdef</ins>
and
<del>abcxxx</del><ins>defxxx</ins>
are already handled by the diff algorithm itself, which strips any common
prefix and suffix before diffing.
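For illustration, the common prefix/suffix handling mentioned above can be sketched roughly as follows. This is a minimal stand-alone sketch, not the library's actual code; the function name trim_common_affixes is hypothetical.

```python
def trim_common_affixes(text1, text2):
    """Split two strings into (common_prefix, middle1, middle2, common_suffix).

    Hypothetical helper for illustration only; not part of the library.
    """
    limit = min(len(text1), len(text2))

    # Count matching characters from the start.
    prefix = 0
    while prefix < limit and text1[prefix] == text2[prefix]:
        prefix += 1

    # Count matching characters from the end, without re-using the prefix.
    limit -= prefix
    suffix = 0
    while suffix < limit and text1[-1 - suffix] == text2[-1 - suffix]:
        suffix += 1

    return (
        text1[:prefix],
        text1[prefix:len(text1) - suffix],
        text2[prefix:len(text2) - suffix],
        text1[len(text1) - suffix:],
    )
```

With the two cases above, "xxxabc" vs "xxxdef" yields the shared prefix "xxx", and "abcxxx" vs "defxxx" yields the shared suffix "xxx", leaving only "abc" and "def" to be diffed.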
What's missing is this heuristic:
<del>xxxabc</del><ins>defxxx</ins>
-> <ins>def</ins>xxx<del>abc</del>
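To make the two overlap directions concrete, here is a rough Python sketch. It is an assumption-laden illustration, not the fix that was actually committed: the helper names common_overlap and split_overlap are hypothetical, and any minimum-overlap threshold the real cleanup might apply is left out. The (op, text) tuples and the constant values follow the Python port's diff format.

```python
# Operation codes, matching the values used by the Python port.
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def common_overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def split_overlap(deletion, insertion):
    """Rewrite an adjacent delete/insert pair around any shared overlap.

    Hypothetical helper: checks both directions and keeps the longer one.
    """
    # Existing heuristic: end of the deletion matches start of the insertion.
    fwd = common_overlap(deletion, insertion)
    # Missing heuristic: start of the deletion matches end of the insertion.
    rev = common_overlap(insertion, deletion)

    if fwd == 0 and rev == 0:
        result = [(DIFF_DELETE, deletion), (DIFF_INSERT, insertion)]
    elif fwd >= rev:
        result = [
            (DIFF_DELETE, deletion[:-fwd]),
            (DIFF_EQUAL, insertion[:fwd]),
            (DIFF_INSERT, insertion[fwd:]),
        ]
    else:
        result = [
            (DIFF_INSERT, insertion[:-rev]),
            (DIFF_EQUAL, deletion[:rev]),
            (DIFF_DELETE, deletion[rev:]),
        ]
    # Drop any pieces that ended up empty.
    return [(op, text) for op, text in result if text]
```

Calling split_overlap("abcxxx", "xxxdef") gives the delete/equal/insert ordering of the existing heuristic, while split_overlap("xxxabc", "defxxx") gives the insert/equal/delete ordering described above.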
Working on it...
[Nice bug report BTW. You know your stuff.]
Original comment by neil.fra...@gmail.com
on 3 Nov 2011 at 6:13
Code complete. Out for review...
Original comment by neil.fra...@gmail.com
on 4 Nov 2011 at 2:06
...complete. Thanks for pointing out this sub-optimal output!
Original comment by neil.fra...@gmail.com
on 9 Nov 2011 at 9:33
Original issue reported on code.google.com by
m...@dixon.se
on 3 Nov 2011 at 1:19