tsboh / google-diff-match-patch

Automatically exported from code.google.com/p/google-diff-match-patch
Apache License 2.0
0 stars 0 forks source link

Mistake in diff_cleanupMerge algorithm in diff_match_patch.cpp #31

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
In diff_match_patch::diff_cleanupMerge the 'if (count_delete != 0 && 
count_insert != 0)' is always true because it is proceeded by 'while 
(count_delete-- > 0)' & 'while (count_insert-- > 0)'.
I think the test was supposed to be before the variables were decremented.

Luckily (or not) this only results in consuming extra cycles.

This same bug appears in the Java version, but not in the C#, JavaScript, Lua 
or Python versions.

Patch for C++ attached.

Original issue reported on code.google.com by chris...@gmail.com on 4 Nov 2010 at 1:36

Attachments:

GoogleCodeExporter commented 8 years ago
Nice catch!  It looks like that bug has been in the code since the initial Java 
version in 2007, and was later faithfully ported to C++.  This function has 
comprehensive unit testing, but no amount of testing would have caught that.

While in that function I also noticed that we are rebuilding [Equality, 
Insertion, Equality] diffs for no reason.  So I've added a shortcut for 
single-edits sandwiched between two equalities.

The patch has gone out for overnight review.

Original comment by neil.fra...@gmail.com on 5 Nov 2010 at 7:38

GoogleCodeExporter commented 8 years ago
The changes were commited to Subversion a few hours ago.
  http://code.google.com/p/google-diff-match-patch/source/detail?r=72
Thanks for caching this.

There's another small optimization coming on Monday for C# and JavaScript, so I 
won't bother updating the download package until then.

Original comment by neil.fra...@gmail.com on 6 Nov 2010 at 6:21

GoogleCodeExporter commented 8 years ago
Neil,

Unfortunately your change in r72 appears to cause test "diff_main: Overlap #1." 
to fail.

It's the 'if (count_delete + count_insert > 1)' which is not equivalent to
the old 'if (count_delete != 0 || count_insert != 0)'.

Original comment by chris...@gmail.com on 6 Nov 2010 at 6:01

GoogleCodeExporter commented 8 years ago
I cannot replicate the error you report.  All unit tests pass in all languages:

diff_main: Overlap #1. OK
diff_main: Overlap #2. OK
diff_main: Overlap #3. OK
[...]
All tests passed.
Total time: 1297 ms
Done.

Could you double-check that the error is real?

Original comment by neil.fra...@gmail.com on 8 Nov 2010 at 8:36

GoogleCodeExporter commented 8 years ago
Neil,

I'm terribly sorry.  You are absolutely correct - r72 does NOT cause a test 
failure.

Let me explain: I'd made a change that asserted that `text_delete` & 
`text_insert` were already empty at the bottom of the `case` statement & so 
there was no need for `text_delete = ""; text_insert = "";`.  This was true 
before r72.  Unfortunately I neglected to test my debug build with that change 
from r72 so the asserts weren't triggered.

Why did I make that change & why did I not suspect it?
Well, it's because I don't have/can't use Qt & I've ported the C++ 
diff-match-patch to use the standard C++ library, so it is just one of many 
changes.  I hope to submit my C++ version (cpp-std) to you soon for possible 
inclusion in diff-match-patch.

In fact I'm quite pleased that that change in r72 does work as it provides a 
nice performance improvement.
I've spent quite some time optimising my C++ conversion & it is now up to 4 
times as fast as the initial conversion.
And, of course it passes all the tests.  I've also added a couple of switches 
to the test program to provide some performance metrics.
Additionally, I've added another switch (currently not for submission) that 
processes a diff/patch file to produce HTML.  It's based on the JavaScript 
version in svnX <svnx.googlecode.com> .
[It converts a 91,500 line diff (from a user) to 29.2MB of HTML in as little as 
14 secs, even on my ageing machine.]

My conversion is pretty much complete, I just need to do some tidying up & test 
it with one more compiler.
I'll send you more details when it's ready.

Thanks for a very useful resource.

Regards,

Chris

Original comment by chris...@gmail.com on 9 Nov 2010 at 6:50