google / diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Apache License 2.0
7.25k stars 1.09k forks source link

patch_fromText and lines breaks issue in Python #157

Open weeger opened 2 months ago

weeger commented 2 months ago

Hello,

I am looking to utilize this fantastic library for applying patches to a file, but I'm encountering issues with broken line breaks. I have attempted to use the patch_make function to compare the behavior, and it operates effectively when the diff is generated by dmp. Could someone please guide me in identifying the potential issues in my diff file?

The wrong result :

"express": "^4.18.2",    "lodash": "^4.17.21",

The expected one :

 "express": "^4.18.2",
 "lodash": "^4.17.21",

This is my simple test code :

from diff_match_patch import diff_match_patch

original_json = '''{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "puppeteer": "^21.7.0"
  }
}'''

expected_json = '''{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}'''

# The unidiff string, which is an exact clone of what dmp generates for the same changes
diff_str = '''@@ -82,16 +82,42 @@
 .18.2",
+    "lodash": "^4.17.21",
     "pup'''

print("____________ BROKEN EXAMPLE")
dmp = diff_match_patch()
patches = dmp.patch_fromText(diff_str)
patch_text = dmp.patch_toText(patches)
new_text, _ = dmp.patch_apply(patches, original_json)
print(patch_text)
print(new_text)

print("____________ WORKING EXAMPLE")
patches = dmp.patch_make(original_json, expected_json)
patch_text = dmp.patch_toText(patches)
new_text, _ = dmp.patch_apply(patches, original_json)
print(patch_text)
print(new_text)

And the result

____________ BROKEN EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,
+    %22lodash%22: %22%5E4.17.21%22,
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}
____________ WORKING EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,%0A
+    %22lodash%22: %22%5E4.17.21%22,%0A
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}

We can clearely see that dmp adds a "%0A" (an encoded \n) in the generated patch, which is missing when creating it from text.

Version info :

diff-match-patch 20230430
system Ubuntu
python 3.10.12
dmsnell commented 2 months ago

@weeger did you confirm if this works as expected in the other languages? in JavaScript, Java, Objective C, or any others?

weeger commented 2 months ago

Same problem, just tested in Javascript :

// Import the diff-match-patch library
const DiffMatchPatch = require('diff-match-patch');
const dmp = new DiffMatchPatch();

// Original JSON string
const originalJson = `{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "puppeteer": "^21.7.0"
  }
}`;

// Expected JSON string with a new dependency added
const expectedJson = `{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}`;

// Unidiff string representing the change
const diffStr = '@@ -82,16 +82,42 @@\n .18.2",\n+    "lodash": "^4.17.21",\n     "pup';

console.log("____________ BROKEN EXAMPLE");
// Converting the diff string to patches
let patches = dmp.patch_fromText(diffStr);
// Converting patches back to a diff string for display
let patchText = dmp.patch_toText(patches);
// Applying patches to the original JSON string
let [newText, results] = dmp.patch_apply(patches, originalJson);
console.log(patchText);
console.log(newText);

console.log("____________ WORKING EXAMPLE");
// Creating patches directly from original and expected JSON strings
patches = dmp.patch_make(originalJson, expectedJson);
// Converting patches to a diff string for display
patchText = dmp.patch_toText(patches);
// Applying these patches to the original JSON
[newText, results] = dmp.patch_apply(patches, originalJson);
console.log(patchText);
console.log(newText);

Result

____________ BROKEN EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,
+    %22lodash%22: %22%5E4.17.21%22,
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",    "puppeteer": "^21.7.0"
  }
}
____________ WORKING EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,%0A
+    %22lodash%22: %22%5E4.17.21%22,%0A
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}
dmsnell commented 2 months ago

there one thing that stands out: in the broken example we have a patch file where the newline is percent-encoded as %0A. it's becoming part of the diff, but the patch tool already works on the line-level.

makes me wonder if there's something about the diff output that patch takes that makes an assumption about a single trailing newline vs. many. as in, maybe diff-match-patch should strip a single trailing newline from its output before percent-encoding it.

Emasoft commented 2 months ago

@dmsnell I just encountered this issue myself. This bug is troublesome. Do you plan to fix it and publish the fix in your improved fork? ( https://github.com/dmsnell/diff-match-patch )? That would be great. Can't believe Google abandoned this project... 😔

dmsnell commented 2 months ago

@Emasoft it's doubtful I'll fix this myself on any timeline someone else wants, but I'm willing to work with people who propose fixes and get it merged. I don't have the capacity do everything myself.

17351269 commented 1 month ago

Lang