kpdecker / jsdiff

A javascript text differencing implementation.
BSD 3-Clause "New" or "Revised" License
7.69k stars 491 forks source link

Adding "merge" flag to all diffMethods improve diff readability #528

Open EllAchE opened 2 weeks ago

EllAchE commented 2 weeks ago

I'd like to suggest adding a "merge" option to the default diff methods to make diffs more readable/useful. Here's a diffWords & diffChars example visualizing current behavior vs my proposed addition:

image

With the flag, additions and removals are now "merged" and separated by unchanged parts. Here's the code used to build these examples

  function coloredSpan(text: string, color: string) {
    return `<${color === "red" ? "s" : "span"} style="color:${color}">${text}</${color === "red" ? "s" : "span"}>`;
  }

  let outputString = "";
  let changes = Diff.diffWords(oldState, newState);

  if (shouldMerge) {
    changes = mergeChanges(changes)
  }

  changes.forEach((part: any) => {
    outputString += coloredSpan(part.value, part.added ? "green" : part.removed ? "red" : "black");
  });

And the code to build the merged diff array

function mergeChanges(changes: Diff.Change[]) {
  // create accumulators for the added and removed text. Once a neutral part is encountered, merge the diffs and reset the accumulators
  let addedText = "";
  let addedCount = 0;
  let removedText = "";
  let removedCount = 0;
  let mergedChanges = [];

  for (const part of changes) {
    if (part?.added) {
      addedText += part.value;
      addedCount += part.count ?? 0;
    } else if (part?.removed) {
      removedText += part.value;
      removedCount += part.count ?? 0;
    } else if (part.value.length <= 5) {
      // we ignore small unchanged segments (<= 4 characters), which catches most whitespace too
      addedText += part.value;
      removedText += part.value;
    } else {
      // if the part is not added or removed, merge the added and removed text and append to the diff alongside neutral text
      mergedChanges.push({ value: removedText, removed: true, count: removedCount });
      mergedChanges.push({ value: addedText, added: true, count: addedCount });
      mergedChanges.push(part);

      addedText = "";
      addedCount = 0;
      removedText = "";
      removedCount = 0;
    }
  }

  // after exiting the loop we might have ended with some added or removed text that needs to be appended
  if (addedText) {
    mergedChanges.push({ value: addedText, added: true, count: addedCount });
  }
  if (removedText) {
    mergedChanges.push({ value: removedText, removed: true, count: removedCount });
  }

  return mergedChanges;
}

Notes: The current implementation is for discussion, I realize iterating the diff array again & making a copy are suboptimal. Potentially a lower-level change, i.e. adding a new algorithm for the diffs depending on the flag would be the best long term decision.

The conditions under which unchanged separators are ignored would also need to be reconsidered, i.e. the <=5 doesn't make sense diffChars.

If this is already supported without extending the base class happy to be pointed in that direction! Or if this feature is already under consideration.