kpdecker / jsdiff

A JavaScript text differencing implementation.
BSD 3-Clause "New" or "Revised" License

Markdown comparison includes headings unnecessarily #452

Open adaboese opened 7 months ago

adaboese commented 7 months ago
import { diffSentences } from 'diff';

it('diffs markdown (2)', () => {
  const a =
    '## AIMD generated content\n\nJust for fun, here is the same article scrambled using AIMD anti-AI detection techniques:';
  const b =
    "## AIMD generated content\n\nJust for fun, here is the same article scrambled using AIMD anti-AI detection techniques. If you're curious about building depth on related topics and establishing yourself as an authority, check out our detailed guide on [Understanding Topical Authority](https://aimd.app/blog/2023-12-26-revolutionizing-the-marketing-hierarchy-why-topical-authority-is-the-new-currency).";

  const diff = diffSentences(a, b);
});

Produces:

[
  {
    count: 1,
    added: undefined,
    removed: true,
    value: '## AIMD generated content\n' +
      '\n' +
      'Just for fun, here is the same article scrambled using AIMD anti-AI detection techniques:'
  },
  {
    count: 4,
    added: true,
    removed: undefined,
    value: '## AIMD generated content\n' +
      '\n' +
      "Just for fun, here is the same article scrambled using AIMD anti-AI detection techniques. If you're curious about building depth on related topics and establishing yourself as an authority, check out our detailed guide on [Understanding Topical Authority](https://aimd.app/blog/2023-12-26-revolutionizing-the-marketing-hierarchy-why-topical-authority-is-the-new-currency)."
  }
]

It is not clear why the unchanged heading ## AIMD generated content\n is marked as removed and then added again.
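
One plausible explanation, sketched under an assumption: if the sentence diff tokenizes its input with a split along the lines of the regular expression below (the pattern, `SENTENCE_SPLIT`, and `tokenizeSentences` are illustrative here, not confirmed from the jsdiff source), then `a` contains no `.`, `!`, or `?` and stays a single token, heading included. `b` splits into four tokens, none of which equals that single token, so nothing is left to report as unchanged. The token counts agree with the `count: 1` and `count: 4` values in the output above.

```javascript
// Assumption: sentence tokens come from a split like this one. A "sentence" is a
// run on one line that ends in ., ! or ? and is followed by whitespace or end of input.
const SENTENCE_SPLIT = /(\S.+?[.!?])(?=\s+|$)/;

// Hypothetical helper: split into tokens and drop empty strings.
function tokenizeSentences(text) {
  return text.split(SENTENCE_SPLIT).filter((token) => token !== '');
}

const a =
  '## AIMD generated content\n\nJust for fun, here is the same article scrambled using AIMD anti-AI detection techniques:';
const b =
  "## AIMD generated content\n\nJust for fun, here is the same article scrambled using AIMD anti-AI detection techniques. If you're curious about building depth on related topics and establishing yourself as an authority, check out our detailed guide on [Understanding Topical Authority](https://aimd.app/blog/2023-12-26-revolutionizing-the-marketing-hierarchy-why-topical-authority-is-the-new-currency).";

const tokensA = tokenizeSentences(a);
const tokensB = tokenizeSentences(b);

console.log(tokensA.length); // 1: the heading and the paragraph form one token
console.log(tokensB.length); // 4: heading, first sentence, a space, second sentence
console.log(tokensB.includes(tokensA[0])); // false: no token is shared
```

Under this assumption the heading is not compared on its own in `a`; it is part of one large token that has no counterpart in `b`.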

adaboese commented 7 months ago

Interestingly, if I paste the same content into http://incaseofstairs.com/jsdiff/, I cannot replicate the issue.

## AIMD generated content

Just for fun, here is the same article scrambled using AIMD anti-AI detection techniques:
## AIMD generated content

Just for fun, here is the same article scrambled using AIMD anti-AI detection techniques. If you're curious about building depth on related topics and establishing yourself as an authority, check out our detailed guide on [Understanding Topical Authority](https://aimd.app/blog/2023-12-26-revolutionizing-the-marketing-hierarchy-why-topical-authority-is-the-new-currency).

adaboese commented 7 months ago

@kpdecker, does http://incaseofstairs.com/jsdiff/ do something different from what's in my code?

adaboese commented 7 months ago

Oh, it is not using a sentence diff. It uses a word diff. I will evaluate that.

adaboese commented 7 months ago

It looks like switching to the word-diff algorithm comes with tradeoffs, though.

it.only('recognizes different sentences', () => {
  const a = 'This is a sentence. This is another sentence.';
  const b = 'This is a sentence. And this is a different sen.';
  const abDiff = diff(a, b);

  expect(abDiff).toStrictEqual([
    {
      count: 17,
      value: 'This is a sentence. This is ',
    },
    {
      added: true,
      count: 9,
      removed: undefined,
      value: 'a different',
    },
    {
      count: 9,
      value: ' sentence.',
    },
  ]);
});

This produces a fairly unreadable diff.

[
  { count: 9, value: 'This is a sentence. ' },
  { count: 1, added: undefined, removed: true, value: 'This' },
  { count: 1, added: true, removed: undefined, value: 'And' },
  { count: 1, value: ' ' },
  { count: 2, added: true, removed: undefined, value: 'this ' },
  { count: 2, value: 'is ' },
  { count: 1, added: undefined, removed: true, value: 'another' },
  { count: 1, added: true, removed: undefined, value: 'a' },
  { count: 1, value: ' ' },
  { count: 1, added: undefined, removed: true, value: 'sentence' },
  { count: 3, added: true, removed: undefined, value: 'different sen' },
  { count: 1, value: '.' }
]

adaboese commented 7 months ago

Based on my tests, https://github.com/google/diff-match-patch produces the best results when used with diff_cleanupSemantic. It looks like I could continue using jsdiff if I just extracted the logic of diff_cleanupSemantic.

adaboese commented 7 months ago

Hm. That won't work as easily as I thought it would. It references its own constructor and a bunch of prototype functions. It would be nice if jsdiff supported equivalent logic, as that package does not appear to be actively maintained.
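
As a rough illustration of the kind of post-processing meant here, the sketch below folds short unchanged chunks that sit between two edits into the edits around them. It is a much cruder heuristic than diff_cleanupSemantic and not a port of it. `mergeShortEqualities` and `maxEqualLength` are hypothetical names, not part of jsdiff, and the input is the word-level output quoted earlier with the `count` and `undefined` fields dropped.

```javascript
// Hypothetical helper, NOT part of jsdiff and NOT a port of diff_cleanupSemantic.
// Unchanged chunks of at most maxEqualLength characters that sit between two
// edits are folded into both the removed and the added text.
function mergeShortEqualities(changes, maxEqualLength = 3) {
  const isEdit = (c) => Boolean(c && (c.added || c.removed));
  const result = [];
  let removed = '';
  let added = '';
  let inRun = false;

  // Emit the accumulated run as at most one removal and one addition.
  const flush = () => {
    if (removed) result.push({ removed: true, value: removed });
    if (added) result.push({ added: true, value: added });
    removed = '';
    added = '';
    inRun = false;
  };

  changes.forEach((change, i) => {
    if (change.removed) {
      removed += change.value;
      inRun = true;
    } else if (change.added) {
      added += change.value;
      inRun = true;
    } else if (inRun && change.value.length <= maxEqualLength && isEdit(changes[i + 1])) {
      // Short unchanged text between two edits: keep it on both sides.
      removed += change.value;
      added += change.value;
    } else {
      flush();
      result.push({ value: change.value });
    }
  });
  flush();
  return result;
}

// The word-level diff from the comment above (count fields omitted).
const wordDiff = [
  { value: 'This is a sentence. ' },
  { removed: true, value: 'This' },
  { added: true, value: 'And' },
  { value: ' ' },
  { added: true, value: 'this ' },
  { value: 'is ' },
  { removed: true, value: 'another' },
  { added: true, value: 'a' },
  { value: ' ' },
  { removed: true, value: 'sentence' },
  { added: true, value: 'different sen' },
  { value: '.' },
];

const merged = mergeShortEqualities(wordDiff);
console.log(merged);
```

For this input the result is four chunks: the unchanged first sentence, one removal ('This is another sentence'), one addition ('And this is a different sen'), and the unchanged final period. The threshold of 3 is arbitrary, and a larger value trades precision for fewer, longer edits.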