tangentialism / google-diff-match-patch

Automatically exported from code.google.com/p/google-diff-match-patch
Apache License 2.0

Inconsistent Patches #69

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Neil, I'm a huge fan of your work, and regularly stalk your projects because 
you are always tackling (more than) interesting problems and are fearless in 
solving them. Thank you so much.

At: 
http://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_patch.html
(JS)

GIVEN:
t1a: "Hello world!"
t2a: "Hellfo world!"
t1a: "Helloa world!"
result: "Hellfoa world!"

THEN:
t1a: "Hello world!"
t2a: "sHellfo world!"
t1a: "Helloa world!"
result: "sHellofa world!"
expected: "sHellfoa world!"

The 'f' unexpectedly moves from 'fo ' (src) to 'ofa ' (destination), rather 
than staying in place, 'fo ' to 'foa '. This does not happen in the first 
example, but it does in the second, because the leading 's' has shifted the 
'f' to the same index as the 'a' in the source being patched against. Agreed, 
when similar strings differ at the same index, the change already in the 
source being patched should take priority at that index, as the following 
examples show. The previous example, however, does not follow that rule.
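The index claim for the first two examples can be checked with a few lines of plain JavaScript. This is only a sketch: the variable names are illustrative and not part of diff_match_patch, and the third input is labelled as the text being patched (the second 't1a' line in each example).

```javascript
// Text the patch is applied to (third line of each example above).
const patchTarget = "Helloa world!";

// Position of the 'a' that is already present in the text being patched.
const aIndex = patchTarget.indexOf("a"); // 5

// Position of the inserted 'f' in the edited text of each example.
const fIndexFirst = "Hellfo world!".indexOf("f");   // 4
const fIndexSecond = "sHellfo world!".indexOf("f"); // 5

console.log({ aIndex, fIndexFirst, fIndexSecond });
// First example: 'f' (4) sits before 'a' (5).
// Second example: the leading 's' moves 'f' to 5, the same index as 'a'.
```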

t1a: "Hello world!"
t2a: "Hellos world!"
t1a: "Helloa world!"
result: "Helloas world!"
('s' and 'a' at same index, as expected, 'a' retains its index while 's' is 
shifted)

t1a: "Hello world!"
t2a: "Hellsfo world!"
t1a: "Helloa world!"
result: "Hellsfoa world!"
(placing the 's' with the 'f' retains the expected result, in contrast to the 
wrong result [scroll up] when the 's' and 'f' are apart)

t1a: "Hello world!"
t2a: "Helslfo world!"
t1a: "Helloa world!"
result: "Helslfoa world!"
(in fact, when split only by a distance of 1, the 's' and 'f' still retain 
expected position, but everything greater than 1 returns the wrong result)
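For anyone who wants to re-run these cases outside the demo page, here is a minimal sketch of a harness. The case data is copied from the examples above ('reported' is what the demo returned, 'expected' is what I expected). The names `reportedCases` and `runCases` are made up for illustration, and the patch function is passed in so the harness itself does not depend on the library. The commented wiring at the bottom assumes the demo builds patches from the first two texts and applies them to the third, using `patch_make` and `patch_apply`.

```javascript
// Cases from this report. t1b is the text the patch is applied to.
const reportedCases = [
  { t1a: "Hello world!", t2a: "Hellfo world!",  t1b: "Helloa world!",
    reported: "Hellfoa world!",  expected: "Hellfoa world!" },
  { t1a: "Hello world!", t2a: "sHellfo world!", t1b: "Helloa world!",
    reported: "sHellofa world!", expected: "sHellfoa world!" },
  { t1a: "Hello world!", t2a: "Hellos world!",  t1b: "Helloa world!",
    reported: "Helloas world!",  expected: "Helloas world!" },
  { t1a: "Hello world!", t2a: "Hellsfo world!", t1b: "Helloa world!",
    reported: "Hellsfoa world!", expected: "Hellsfoa world!" },
  { t1a: "Hello world!", t2a: "Helslfo world!", t1b: "Helloa world!",
    reported: "Helslfoa world!", expected: "Helslfoa world!" },
];

// patchFn(t1a, t2a, t1b) must return the patched string.
function runCases(patchFn) {
  return reportedCases.map((c) => {
    const actual = patchFn(c.t1a, c.t2a, c.t1b);
    return { t2a: c.t2a, actual, matchesExpected: actual === c.expected };
  });
}

// Possible wiring, assuming diff_match_patch is already loaded on the page:
// const dmp = new diff_match_patch();
// const viaDmp = (t1a, t2a, t1b) =>
//   dmp.patch_apply(dmp.patch_make(t1a, t2a), t1b)[0];
// console.table(runCases(viaDmp));
```

Results may differ from the ones reported here depending on the library version and on settings such as `Match_Distance` and `Patch_Margin`.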

This seems to be an odd anomaly with the patch operation, which has given me 
some grief while I've been writing my own (distributed/p2p) HTML content sync 
algorithm (although much less grief than when I had been previously using my 
own diff algo).
Note: this 'issue' is an observational one, and I do not know whether the 
behavior is inherent to the patching process, or what side effects would 
result from a 'fix'. I am more interested in hearing your philosophy on the 
matter. I have only briefly dived into your code, and will dig deeper soon to 
'fix' it myself (at least for my consistency needs, or else write my own 
algo). When I do, I'll run the tests, check for side effects, and post the 
results here.

Thanks again! I'd love to hear your thoughts on it (it'd make my day, as a 
geeky fan).

Original issue reported on code.google.com by Aqu...@gmail.com on 15 May 2012 at 10:57

GoogleCodeExporter commented 9 years ago
*All the examples should have used this format:
t1a:
t2a:
t1b:
result:

(t1a is shown twice, in all of them, when it should be 't1b' instead)

Original comment by Aqu...@gmail.com on 15 May 2012 at 11:01