kpdecker / jsdiff

A javascript text differencing implementation.
BSD 3-Clause "New" or "Revised" License

Stop treating stuff like vertical tabs as line breaks when dealing with unified diffs #435

Closed: ExplodingCabbage closed this 7 months ago

ExplodingCabbage commented 7 months ago

Tools like the Unix diff and patch CLI tools treat \v, \f, etc. as ordinary characters, but jsdiff currently treats them as line breaks. That creates an unfortunate incompatibility: if you generate a unified diff with diff -u foo bar, and some of the lines in files foo and bar contain a \v or a similar character, jsdiff will parse the patch incorrectly.
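As a rough illustration of the mismatch described above, here is a minimal sketch comparing two ways of splitting a patch into lines. The helper names `splitBroad` and `splitNarrow` and the exact regexes are assumptions made for this example, not jsdiff's actual code; the sample patch mirrors the one used in the test.

```javascript
// Hypothetical helpers for illustration only; not jsdiff's actual API.
const patch = [
  '--- foo',
  '+++ bar',
  '@@ -1,4 +1,4 @@',
  ' foo',
  '-bar\vbar',
  '+barry\vbarry',
  ' baz',
  ' qux',
  '\\ No newline at end of file',
].join('\n');

// Broad splitter (assumed): treats \v, \f, \r and \x85 as line breaks too.
function splitBroad(text) {
  return text.split(/\r\n|[\n\v\f\r\x85]/);
}

// Narrow splitter (assumed): only \n ends a line, as diff and patch expect.
function splitNarrow(text) {
  return text.split('\n');
}

console.log(splitBroad(patch).length);  // 11: each \v produced an extra "line"
console.log(splitNarrow(patch).length); // 9: matches what diff -u wrote
```

With the broad splitter, the fragments "bar" and "barry" that follow each \v become lines of their own, without the leading " ", "+" or "-" that a hunk line needs.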

To see the problem, check out the first commit on this PR (where I've added the test but not yet fixed jsdiff's behaviour to make the test pass) and run yarn test. You'll get this failure:

      1) should treat vertical tabs like ordinary characters

  179 passing (137ms)
  1 failing

  1) patch/parse
       #parse
         should treat vertical tabs like ordinary characters:

      AssertionError: expected [ Array(1) ] to deeply equal [ Array(1) ]
      + expected - actual

           "hunks": [
             {
               "linedelimiters": [
                 "\n"
      -          "\u000b"
      +          "\n"
      +          "\n"
      +          "\n"
      +          "\n"
      +          "\n"
               ]
               "lines": [
                 " foo"
      -          "-bar"
      +          "-bar\u000bbar"
      +          "+barry\u000bbarry"
      +          " baz"
      +          " qux"
      +          "\\ No newline at end of file"
               ]
               "newLines": 4
               "newStart": 1
               "oldLines": 4

      at Context.<anonymous> (test/patch/parse.js:647:19)
      at process.processImmediate (node:internal/timers:478:21)

Notice that it's not just the line delimiters that are wrong: the array of lines in the hunk is also too short, because the parser chokes on the \v character. All the lines after the \v have simply disappeared from our hunk!
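One plausible way the lines can vanish is sketched below. It assumes a hunk reader that stops at the first line not starting with " ", "+", "-" or "\"; the function `readHunkLines` is hypothetical and written only for this illustration, so jsdiff's real parser may differ in detail.

```javascript
// Hypothetical hunk reader for illustration only; not jsdiff's real parser.
function readHunkLines(lines) {
  const hunk = [];
  for (const line of lines) {
    // Hunk body lines start with ' ', '+', '-' or '\'; anything else ends the hunk.
    if (!/^[ +\\-]/.test(line)) break;
    hunk.push(line);
  }
  return hunk;
}

const body =
  ' foo\n-bar\vbar\n+barry\vbarry\n baz\n qux\n\\ No newline at end of file';

// Splitting on \v as well as \n leaves a stray "bar" fragment that ends the hunk early.
const broad = readHunkLines(body.split(/\n|\v/));
// Splitting on \n alone keeps every hunk line intact.
const narrow = readHunkLines(body.split('\n'));

console.log(broad);         // [ ' foo', '-bar' ]
console.log(narrow.length); // 6
```

Under these assumptions the truncated result matches the "actual" side of the failing assertion, where the hunk ends after "-bar".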