yudai / gojsondiff

Go JSON Diff

Fix array diffs #45

Open GGabriele opened 2 years ago

As highlighted in various issues (https://github.com/yudai/gojsondiff/issues/24, https://github.com/yudai/gojsondiff/issues/30), the library currently fails to compute correct and reliable diffs for arrays, specifically whenever the "left" array is shorter than the "right" one:

case 1: len(left) < len(right) (bogus diffs)

$ cat before.json
{
   "array": [
      "x-posting-ip"
   ]
 }

$ cat after.json
{
   "array": [
      "accept-encoding",
      "x-forwarded-for"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "x-posting-ip"
+    0: "accept-encoding"
+    0: "accept-encoding"
   ]
 }

case 1 (continuation): len(left) < len(right) (unreliable diffs)

$ cat before.json
{
   "array": [
      "blabla"
   ]
 }

$ cat after.json
{
   "array": [
      "accept-encoding",
      "x-forwarded-for"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
+    0: "accept-encoding"
   ]
 }

case 2: len(left) == len(right) (OK; after.json is unchanged)

$ cat before.json
{
   "array": [
      "blabla",
      "blabla"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "blabla",
+    0: "accept-encoding",
-    1: "blabla"
+    1: "x-forwarded-for"
   ]
 }

case 3: len(left) > len(right) (the diffs are correct, but a duplicate element lingers at the bottom of the array; after.json is unchanged)

$ cat before.json
{
   "array": [
      "blabla",
      "blabla",
      "blabla"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "blabla",
+    0: "accept-encoding",
-    0: "blabla",
-    1: "blabla"
+    1: "x-forwarded-for"
     2: "blabla"                   // shouldn't be here
   ]
 }

The two issues in case 1 are the most serious, while the one in case 3 is mostly visually confusing.

I believe these issues have two causes:

  1. when looping through the maybe matrices, the loop misses the last element of each row/column
  2. the way string similarity and its score are calculated is not reliable
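To illustrate cause (1): if a loop over a left×right similarity matrix stops one index short on each axis, the last left element and the last right element are never considered as match candidates. With one left element and two right elements (as in case 1), such a loop visits no cells at all. The sketch below is hypothetical; the function names are not gojsondiff's, and it only contrasts truncated bounds with full-range bounds when searching for the best-matching pair.

```go
package main

import "fmt"

// bestPairTruncated mimics the suspected bug (hypothetical helper): the bounds
// stop one short, so the last row and the last column are never visited.
func bestPairTruncated(sim [][]float64) (int, int, float64) {
	bi, bj, best := -1, -1, -1.0
	for i := 0; i < len(sim)-1; i++ {
		for j := 0; j < len(sim[i])-1; j++ {
			if sim[i][j] > best {
				bi, bj, best = i, j, sim[i][j]
			}
		}
	}
	return bi, bj, best
}

// bestPair visits every cell of the matrix, including the last row and column.
func bestPair(sim [][]float64) (int, int, float64) {
	bi, bj, best := -1, -1, -1.0
	for i := range sim {
		for j := range sim[i] {
			if sim[i][j] > best {
				bi, bj, best = i, j, sim[i][j]
			}
		}
	}
	return bi, bj, best
}

func main() {
	// Rows are left elements, columns are right elements: 1 left, 2 right, as in case 1.
	sim := [][]float64{{0.2, 0.7}}
	fmt.Println(bestPairTruncated(sim)) // -1 -1 -1: the 1x2 matrix is skipped entirely
	fmt.Println(bestPair(sim))          // 0 1 0.7
}
```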

What I'm proposing here is a simple fix for (1), plus using strutil to calculate string similarity for (2).
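strutil (presumably the github.com/adrg/strutil package) offers several string metrics, and this description doesn't say which one the PR uses. As a standard-library-only illustration of what a per-string similarity score in [0, 1] looks like, here is a normalized Levenshtein similarity: 1 minus the edit distance divided by the longer string's length. This is an assumed stand-in, not strutil's API and not necessarily the PR's metric.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b (two-row DP).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // prev now holds the row just computed
	}
	return prev[len(rb)]
}

// similarity maps edit distance to [0, 1]; 1 means identical strings.
func similarity(a, b string) float64 {
	maxLen := len([]rune(a))
	if lb := len([]rune(b)); lb > maxLen {
		maxLen = lb
	}
	if maxLen == 0 {
		return 1 // two empty strings are identical
	}
	return 1 - float64(levenshtein(a, b))/float64(maxLen)
}

func main() {
	fmt.Printf("%.3f\n", similarity("kitten", "sitting")) // 0.571
	fmt.Printf("%.3f\n", similarity("x-posting-ip", "accept-encoding"))
	fmt.Printf("%.3f\n", similarity("x-posting-ip", "x-forwarded-for"))
}
```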

With these changes, the case 1 examples produce the following diffs:

$ cat before.json
{
   "array": [
      "x-posting-ip"
   ]
 }

$ cat after.json
{
   "array": [
      "accept-encoding",
      "x-forwarded-for"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "x-posting-ip"
+    0: "accept-encoding"
+    1: "x-forwarded-for"
   ]
 }

$ cat before.json
{
   "array": [
      "blabla"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "blabla"
+    0: "accept-encoding"
+    1: "x-forwarded-for"
   ]
 }
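For comparison, a purely positional (index-by-index) array diff yields the same lines as the fixed output above for these inputs. The sketch below is a hypothetical helper, not how gojsondiff computes array deltas (it also scores element similarity to find matches), and it prints a simplified line format that only mimics the ASCII output shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// positionalDiff compares two string arrays index by index (hypothetical helper).
// An index present on both sides with different values yields a "-" then a "+" line;
// extra right-hand elements are additions, extra left-hand elements are deletions.
// Equal elements produce no output.
func positionalDiff(left, right []string) []string {
	var out []string
	n := len(left)
	if len(right) > n {
		n = len(right)
	}
	for i := 0; i < n; i++ {
		inLeft, inRight := i < len(left), i < len(right)
		if inLeft && inRight && left[i] == right[i] {
			continue
		}
		if inLeft {
			out = append(out, fmt.Sprintf("-    %d: %q", i, left[i]))
		}
		if inRight {
			out = append(out, fmt.Sprintf("+    %d: %q", i, right[i]))
		}
	}
	return out
}

func main() {
	before := []string{"blabla"}
	after := []string{"accept-encoding", "x-forwarded-for"}
	fmt.Println(strings.Join(positionalDiff(before, after), "\n"))
}
```

For case 3's three-element input, the same function emits `-    2: "blabla"` instead of leaving the element in place. A positional diff cannot recognize elements that merely shifted (an insertion at index 0 makes every later index differ), which is why similarity-based matching is used at all.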

The changes don't break the existing test cases:

$ go test -race ./...
# gojsondiff/jd
jd/main.go:42:4: Println arg list ends with redundant newline
# gojsondiff/jp
jp/main.go:24:4: Println arg list ends with redundant newline
ok      gojsondiff  1.919s
ok      gojsondiff/formatter    1.104s

The code is quite complex, so I may be missing something obvious; please let me know if that's the case!

I'm also introducing go.mod in the same PR, which results in a large number of changes. The actual code changes are minimal and confined to gojsondiff.go.