yudai / gojsondiff

Go JSON Diff
Other
535 stars 81 forks source link

confused Similarity func when compare two array #38

Open djangocc opened 4 years ago

djangocc commented 4 years ago

I know it's hard to locate two exactly index which supposed to be compared in an array and i can understand that the similarity() is a way to make the similarity of two nodes quantifiable in the json, but I don't think your code way to compare two json objects is feasible. In your code, the complement of the Object delta similarity function is to divide the sum of all the sub deltas similarity by the count of the sum deltas, the formula is like sum(sub deltas similarity)/count(sub deltas). but when compare two objects, the field which is exactly the same will not generate the delta object. so in some way, it's not a fair similarity way. I think the fair similarity formula is like (sum(sub deltas similarity)+count(same fields))/(count(sub deltas)+count(same fields)) if you can agree with me, I will try to send the pr. here is some example: left json: [{"a":1,"b":2,"c":3}] right json:[{"a":2,"b":2,"c":4},{"a":1,"b":2,"c":3,"d":4}]

and the delta result is like { "placeholder": [ 0: { - "a": 1, + "a": 2, - "b": 2, + "b": 3, - "c": 3 + "c": 4 } + 0: { + "a": 2, + "b": 3, + "c": 4 + } ] }

there are two bugs in the above example I think. one is the first object {"a":2,"b":2,"c":4} in the right json appear twice in the deltas, which means that object has been compared twice, it may be a bug. bug code is on below

var x, y int
for x, y = 0, 0; x <= sizeX-2 && y <= sizeY-2; {
    current := dpTable[x][y]
    nextX := dpTable[x+1][y]
    nextY := dpTable[x][y+1]
    xValidLength := len(left) - maxInvalidLength + y
    yValidLength := len(right) - maxInvalidLength + x
    if x+1 < xValidLength && current == nextX {
        freeLeft = append(freeLeft, left[x])
        x++
    } else if y+1 < yValidLength && current == nextY {
        freeRight = append(freeRight, right[y])
        y++
    } else {
        resultDeltas = append(resultDeltas, deltaTable[x][y])
        x++
        y++
    }
}
for ; x < sizeX-1; x++ {
    freeLeft = append(freeLeft, left[x-1]) //here 
}
for ; y < sizeY-1; y++ {
    freeRight = append(freeRight, right[y-1]) //here
}

the x-1 or y-1 index has been appended to the result Deltas, so they are not supposed to be appended into the deleted ary or added ary again , so they should be fixed to freeLeft = append(freeLeft, left[x]) and when i fixed the first bug, the result becomes to { "placeholder": [ 0: { - "a": 1, + "a": 2, - "b": 2, + "b": 3, - "c": 3 + "c": 4 } + 1: { + "a": 1, + "b": 2, + "c": 3, + "d": 4 + } ] } in this case, I think the first element in the left json array should be compared to the second element in the right json array, cause they're much similar. the reason is the similarity() way may be not that fair.