kpdecker / jsdiff

A javascript text differencing implementation.
BSD 3-Clause "New" or "Revised" License

diffWords doesn't consistently ignore whitespace, despite the README saying it should #436

Closed: ExplodingCabbage closed this issue 4 months ago

ExplodingCabbage commented 7 months ago

The docs say:

Diff.diffWords(oldStr, newStr[, options]) - diffs two blocks of text, comparing word by word, ignoring whitespace.

Here are some examples of this working as described:

> diff.diffWords("foo bar", "foo          bar")
[ { value: 'foo          bar', count: 3 } ]
> diff.diffWords("foo, bar", "foo,          bar")
[ { value: 'foo,          bar', count: 4 } ]

But here are a couple of examples of it not working as described:

> diff.diffWords("foo bar", "foo\n\nbar")
[
  { count: 2, value: 'foo\n' },
  { count: 1, added: true, removed: undefined, value: '\n' },
  { count: 1, value: 'bar' }
]
> diff.diffWords("( foo )", "(foo)")
[
  { count: 1, value: '(' },
  { count: 1, added: undefined, removed: true, value: ' ' },
  { count: 1, value: 'foo' },
  { count: 1, added: undefined, removed: true, value: ' ' },
  { count: 1, value: ')' }
]

The first failure case is due to changes from https://github.com/kpdecker/jsdiff/pull/217, but the second is independent of those changes.
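To make concrete what "ignoring whitespace" would mean for the four inputs above, here is a minimal sketch of a whitespace-insensitive comparison. It does not use jsdiff and is not how jsdiff tokenizes; `tokensIgnoringWhitespace` and `sameIgnoringWhitespace` are hypothetical helpers, and the tokenizing regex (ASCII word runs plus single punctuation characters) is an assumption made for illustration only.

```javascript
// Hypothetical helper, NOT part of jsdiff: split text into word and
// punctuation tokens, discarding every whitespace character.
function tokensIgnoringWhitespace(text) {
  // \w+ matches a run of ASCII word characters;
  // [^\w\s] matches one character that is neither word nor whitespace.
  return text.match(/\w+|[^\w\s]/g) || [];
}

// Hypothetical helper: true when both strings yield the same token sequence.
function sameIgnoringWhitespace(oldStr, newStr) {
  const a = tokensIgnoringWhitespace(oldStr);
  const b = tokensIgnoringWhitespace(newStr);
  return a.length === b.length && a.every((tok, i) => tok === b[i]);
}

console.log(sameIgnoringWhitespace("foo bar", "foo          bar"));   // true
console.log(sameIgnoringWhitespace("foo, bar", "foo,          bar")); // true
console.log(sameIgnoringWhitespace("foo bar", "foo\n\nbar"));         // true
console.log(sameIgnoringWhitespace("( foo )", "(foo)"));              // true
console.log(sameIgnoringWhitespace("foo bar", "foo baz"));            // false
```

Under this reading, all four pairs from the examples above are equal, so a diff that truly ignored whitespace would report no additions or removals for any of them.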

ExplodingCabbage commented 7 months ago

An even more profound example of whitespace being EXTREMELY SIGNIFICANT to diffWords, in a way that produces frankly absurd results:

https://github.com/kpdecker/jsdiff/issues/160#issuecomment-1866099640