google / diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Apache License 2.0

Diff cleanup semantic not working as expected #104

Status: Open (opened by ash-lionell 3 years ago)

ash-lionell commented 3 years ago

For the following input:

This text is bold, underlined, italicized, Arial and has a different color and size.

This text is not bold, not underlined, not italicized, Calibri and has the same color and size.

The diff cleanup returns:

[Diff(EQUAL,"This text is "), Diff(DELETE,"not "), Diff(EQUAL,"bold,"), Diff(DELETE," not"), Diff(EQUAL," underlined,"), Diff(DELETE," not"), Diff(EQUAL," italicized, "), Diff(DELETE,"Calibri"), Diff(INSERT,"Arial"), Diff(EQUAL," and has "), Diff(DELETE,"the same"), Diff(INSERT,"a different"), Diff(EQUAL," color and size.")]

This is the expected result.

But for some other inputs, such as:

This is a sample text.

This is a sample test.

The diff cleanup returns:

[Diff(EQUAL,"This is a sample te"), Diff(DELETE,"x"), Diff(INSERT,"s"), Diff(EQUAL,"t.")]

Instead, I expected the cleanup to merge the edits inside the last word (text/test) and report the whole word as a single DELETE/INSERT pair.

I observed this behavior in both the Java and JavaScript versions.

mark1bean commented 1 year ago

I can confirm this issue. Here's my example in JavaScript:

var text1 = 'I ate a red apple.';
var text2 = 'I ate a green apple.';

var dmp = new diff_match_patch();
var diffs = dmp.diff_main(text1, text2);
dmp.diff_cleanupSemantic(diffs);

Result:

EQUAL   "I ate a "
INSERT  "g"
EQUAL   "re"
DELETE  "d"
INSERT  "en"
EQUAL   " apple."

Expected result:

EQUAL   "I ate a "
INSERT  "green"
DELETE  "red"
EQUAL   " apple."
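
One possible way to get the word-level result above is to post-process the diff list after diff_cleanupSemantic. The sketch below is not part of diff-match-patch: expandEditsToWords is a hypothetical helper, and it assumes each diff can be read as an [op, text] pair with the op codes -1 (DELETE), 0 (EQUAL) and 1 (INSERT) used by the JavaScript version. It widens an edit only when the edit boundary falls inside a word, and it treats only ASCII letters, digits and underscores (\w) as word characters.

```javascript
// Op codes as used by the JavaScript version of diff-match-patch.
const DIFF_DELETE = -1;
const DIFF_INSERT = 1;
const DIFF_EQUAL = 0;

// Hypothetical helper (an assumption, not a library function):
// widens each edit to the boundaries of the word it cuts through.
function expandEditsToWords(diffs) {
  // Pass 1: group the diffs into alternating {eq} and {del, ins} items.
  const items = [];
  for (let k = 0; k < diffs.length; k++) {
    const op = diffs[k][0];
    const text = diffs[k][1];
    if (!text) continue;
    const last = items[items.length - 1];
    if (op === DIFF_EQUAL) {
      if (last && last.eq !== undefined) last.eq += text;
      else items.push({ eq: text });
    } else {
      let edit = last;
      if (!edit || edit.eq !== undefined) {
        edit = { del: '', ins: '' };
        items.push(edit);
      }
      if (op === DIFF_DELETE) edit.del += text;
      else edit.ins += text;
    }
  }

  // Pass 2: pull neighbouring word characters into each edit.
  for (let i = 0; i < items.length; i++) {
    const edit = items[i];
    if (edit.eq !== undefined) continue;

    // Left side: only if the edit starts in the middle of a word.
    const prev = items[i - 1];
    if (prev && (/^\w/.test(edit.del) || /^\w/.test(edit.ins))) {
      const tail = prev.eq.match(/\w*$/)[0];
      prev.eq = prev.eq.slice(0, prev.eq.length - tail.length);
      edit.del = tail + edit.del;
      edit.ins = tail + edit.ins;
    }

    // Right side: only while the edit ends in the middle of a word.
    while (i + 1 < items.length &&
           (/\w$/.test(edit.del) || /\w$/.test(edit.ins))) {
      const next = items[i + 1];
      const head = next.eq.match(/^\w*/)[0];
      if (!head) break;
      next.eq = next.eq.slice(head.length);
      edit.del += head;
      edit.ins += head;
      if (next.eq) break;
      // The equality was absorbed completely: drop it and merge the
      // edit that followed it into the current one.
      const following = items[i + 2];
      items.splice(i + 1, following ? 2 : 1);
      if (following) {
        edit.del += following.del;
        edit.ins += following.ins;
      }
    }
  }

  // Pass 3: flatten back into [op, text] pairs, skipping empty pieces.
  const result = [];
  for (const item of items) {
    if (item.eq !== undefined) {
      if (item.eq) result.push([DIFF_EQUAL, item.eq]);
    } else {
      if (item.del) result.push([DIFF_DELETE, item.del]);
      if (item.ins) result.push([DIFF_INSERT, item.ins]);
    }
  }
  return result;
}

// The diff reported above for 'I ate a red apple.' / 'I ate a green apple.'
const appleDiffs = [
  [DIFF_EQUAL, 'I ate a '], [DIFF_INSERT, 'g'], [DIFF_EQUAL, 're'],
  [DIFF_DELETE, 'd'], [DIFF_INSERT, 'en'], [DIFF_EQUAL, ' apple.'],
];
console.log(JSON.stringify(expandEditsToWords(appleDiffs)));
// prints [[0,"I ate a "],[-1,"red"],[1,"green"],[0," apple."]]
```

The helper emits DELETE before INSERT within each pair, so the order differs from the listing above, but the content is the same. Edits that already start and end on word boundaries, such as the deleted "not " in the first report, pass through unchanged.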

ttpho commented 1 year ago

@NeilFraser Please support this issue. 🙇🏻