kpdecker / jsdiff

A JavaScript text differencing implementation.
BSD 3-Clause "New" or "Revised" License

wordDiff.tokenize includes empty strings in its list of tokens #437

ExplodingCabbage closed this issue 4 months ago

ExplodingCabbage commented 7 months ago

This seems wrong:

> wd = require('./lib/diff/word')
{
  diffWords: [Function: diffWords],
  diffWordsWithSpace: [Function: diffWordsWithSpace],
  wordDiff: {
    equals: [Function (anonymous)],
    tokenize: [Function (anonymous)],
    options: { ignoreWhitespace: true }
  }
}
> wd.wordDiff.tokenize("(  foo  ),          bar")
[ '', '(', '', '  ', 'foo', '  ', '', ')', ',', '          ', 'bar' ]

I'm honestly not sure why this doesn't completely break the main diffing algorithm when we use diffWords; we must have logic somewhere that filters out the empty strings. But even so, it seems like it'd be nicer if that happened at the .tokenize stage...
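
For illustration, here is a minimal sketch of one way empty tokens like these can arise. It assumes the tokenizer is built on String.prototype.split with a capturing group. The regex below is a simplified stand-in, not the one in lib/diff/word, and tokenizeRaw and tokenizeClean are hypothetical names. When two delimiter matches are adjacent, or a match sits at the very start of the string, split emits the empty substring between them, so filtering inside the tokenizer would look roughly like this:

```javascript
// Simplified stand-in for a word tokenizer (assumption: NOT jsdiff's real regex).
// The capturing group makes split() keep the delimiters as tokens.
function tokenizeRaw(value) {
  return value.split(/(\s+|[()])/);
}

// Same tokenizer, but with empty strings dropped at the tokenize stage.
function tokenizeClean(value) {
  return tokenizeRaw(value).filter((token) => token !== '');
}

const input = '(  foo  ), bar';

// Empty strings appear wherever two delimiter matches touch
// (and before a delimiter at index 0).
const raw = tokenizeRaw(input);
console.log(raw);
// [ '', '(', '', '  ', 'foo', '  ', '', ')', ',', ' ', 'bar' ]

const clean = tokenizeClean(input);
console.log(clean);
// [ '(', '  ', 'foo', '  ', ')', ',', ' ', 'bar' ]
```

Dropping the empty strings loses no information here: joining the filtered tokens still reproduces the original input exactly.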