Closed jvalrog closed 3 months ago
Hmmmm...
I would need to see the texts being diffed and carefully work through what happened, but this sounds wrong to me. To be clear, it is normal for the diff of y vs x to not simply be the same as the diff of x vs y but with additions turned into deletions and deletions into additions; very roughly speaking, when faced with a tie between diffs of equal cost, the Myers diff algorithm prefers the option that puts deletions as early as possible, and this creates an asymmetry where the diff depends on your order of parameters. For instance, if you diff abcd
against acbd
with diffChars
, the b
will deleted and a new b
inserted after the c
, but if you reverse the order of parameters, it'll instead be the c
that gets deleted and reinserted.
What you're showing is more than just the expected kind of asymmetry, though - it's a change in the overall count of deleted and added tokens. I don't think that should happen. The stats from the first diff imply that the newBackup.data
has two fewer lines in it than oldBackup.data
, while the stats from the second diff imply that the two files contain the same number of lines!
I wonder if this odd output somehow relates to the bug alluded to in the release notes for the not-yet-released jsdiff 6.0.0:
diffLines with ignoreWhitespace: true will no longer ignore the insertion or deletion of entire extra lines of whitespace at the end of the text. Previously, these would not show up as insertions or deletions, as a side effect of a hack in the base diffing algorithm meant to help ignore whitespace in diffWords.
I'd appreciate it if you could share the actual content of the files that you're diffing (or, if they've got confidential stuff in them, some kind of slimmed down or censored example that exhibits the same behaviour) so I can try to figure out what's going on.
sure! I'll remove any sensible information and share it later
Was able to reproduce it, I've attached a zip with 2 backups and a simple test.js script.
The execution shows this:
612
608
609
612
[removed link]
could you reproduce the problem?
Aha! It took me a long time to spot it, but the bug is in your test script rather than jsdiff, here. Each change object has a .count
property indicating how many added/removed/unchanged tokens it represents, so to count the number of changes you need to sum the .count properties, not just count the change objects as you're doing. Here's a fixed version of the script:
fs = require('fs');
diff = require('./lib');
backup1File = fs.readFileSync('/home/mark/Downloads/test/backup1.txt');
backup2File = fs.readFileSync('/home/mark/Downloads/test/backup2.txt');
backup1 = backup1File.toString();
backup2 = backup2File.toString();
changes = diff.diffLines(backup1, backup2);
console.log(
changes.filter((c) => c.added).reduce((total, c) => total + c.count, 0)
);
console.log(
changes.filter((c) => c.removed).reduce((total, c) => total + c.count, 0)
);
changes = diff.diffLines(backup2, backup1);
console.log(
changes.filter((c) => c.added).reduce((total, c) => total + c.count, 0)
);
console.log(
changes.filter((c) => c.removed).reduce((total, c) => total + c.count, 0)
);
Output now has the expected symmetry:
mark@fractal:~/jsdiff$ node test.js
2263
2165
2165
2263
thank you! I missed that point then, my bad.
We are calling diffLines like this:
result is:
added: 465 deleted: 467
if we change the order of the parameters we get:
added: 466 deleted: 466
Is this normal?