sergi / go-diff

Diff, match and patch text in Go
MIT License
1.81k stars 207 forks source link

diffmatchpatch: invalid diff output with CRLF line endings #115

Open stamblerre opened 3 years ago

stamblerre commented 3 years ago

I believe the fix for https://github.com/sergi/go-diff/issues/89 introduced regressions for files with \r\n line endings. This reproduces as of commit https://github.com/sergi/go-diff/commit/db1b095f5e7c905e196ff6bfd56189a41aa76309.

The following repro program illustrates the issue. It compares two strings, whose only differences are line endings. (This was caught by a gopls test in https://golang.org/cl/275439).

package main

import (
    "fmt"
    "strings"

    "github.com/sergi/go-diff/diffmatchpatch"
)

func main() {
    const before = `package main

import (
    "fmt"
)

/*
func upload(c echo.Context) error {
    if err := r.ParseForm(); err != nil {
        fmt.Fprintf(w, "ParseForm() err: %v", err)
        return
    }
    fmt.Fprintf(w, "POST request successful")
    path_ver := r.FormValue("path_ver")
    ukclin_ver := r.FormValue("ukclin_ver")

    fmt.Fprintf(w, "Name = %s\n", path_ver)
    fmt.Fprintf(w, "Address = %s\n", ukclin_ver)
}
*/
`
    after := strings.ReplaceAll(before, "\n", "\r\n")
    diffs := diffmatchpatch.New().DiffMain(before, after, true)
    fmt.Println(diffmatchpatch.New().DiffPrettyText(diffs))
}

Before https://github.com/sergi/go-diff/commit/db1b095f5e7c905e196ff6bfd56189a41aa76309, the diffs produced are correct. After, they are:

package main
                fmt.Fprintf(w, "ParseForm() err: %v", err)

                fmt.Fprintf(w, "ParseForm() err: %v", err)
import (
import (
)

/*
import (
        "fmt"
import (
)
import (
/*
import (
func upload(c echo.Context) error {
import (
        if err := r.ParseForm(); err != nil {
import (
                fmt.Fprintf(w, "ParseForm() err: %v", err)
import (
                return
        "fmt"

                fmt.Fprintf(w, "ParseForm() err: %v", err)
        "fmt"

        "fmt"
import (
        "fmt"
        "fmt"

)
        "fmt.Fprintf(w, "Address = %s\n", ukclin_ver
)
}
*/

which is clearly invalid.

findleyr commented 3 years ago

We've just discovered (in https://golang.org/issue/45648) that this probably affects code without CRLF line endings as well. We suspect that this is the cause of an uptick of user-reported formatting problems in gopls, starting around December, due to users installing gopls with go get -u.

Thank you @ShoshinNikita for working on a fix. Is there anything we can do to help get that reviewed and released?

stamblerre commented 3 years ago

I can confirm that this patch fixes the repro case provided in #115.

findleyr commented 3 years ago

Thanks Rebecca.

Rebecca and I will be pairing on reviewing the change itself on Monday.

mattysweeps commented 3 years ago

What's the status of this?

vsoch commented 2 years ago

I just ran into this too - about an hour of debugging because I thought I was crazy - I would write "the same" strings into a test program and they would be fine, but those from an environment were not. This must not be fixed because I used latest, so for others that need a quick solution what worked for me is to just strip any kind of newline:

import strings

// Strip any kind of newline from the string
func Strip(line string) string {
    return strings.NewReplacer(
        "\r\n", "",
        "\r", "",
        "\n", "",
        "\v", "",
        "\f", "",
        "\u0085", "",
        "\u2028", "",
        "\u2029", "",
    ).Replace(line)
}

And basically parse the strings with that before providing to the library here.