onmyway133 / DeepDiff

🦀Amazingly incredible extraordinary lightning fast diffing in Swift
https://onmyway133.com/apps/
Other
2.05k stars 145 forks source link

Using HashValue as table key is unreliable #13

Closed WCByrne closed 6 years ago

WCByrne commented 6 years ago

I've been testing a variation of your Heckel implementation in a macOS CollectionView project and found an issue. Using hashValue as the key in the table is not reliable. I've made this same mistake myself in early versions of the data structures used in that collection view implementation. My basic solution for now is to replace with [T.Element:TableEntry] and the issue hasn't come up again.

From the Apple Docs

A hash value, provided by a type’s hashValue property, is an integer that is the same for any two instances that compare equally. That is, for two instances a and b of the same type, if a == b, then a.hashValue == b.hashValue. The reverse is not true: Two instances with equal hash values are not necessarily equal to each other.

onmyway133 commented 6 years ago

@WCByrne Hi, thanks for the suggestion. I will check if I can make the changes. Besides, do you have a test that fails because of this?

WCByrne commented 6 years ago

I do not. I was running into this issue while running a demo app (in the Collection View project) which uses Core Data. When I tried to reproduce the issue in a test using mock objects (not NSManagedObject) but making the same changes to the arrays, it succeeded. I finally noticed that the edits produced from the diff were different given the same source and target arrays. I'm not sure what NSManagedObject uses for it's hashValue but it is apparently less unique that you might expect.

As I mentioned, I have made the same assumption and used hashValue for keys in some of my data structures and often ran into this unexpected behavior.

andrebraga commented 6 years ago

It's more likely that the usage of NSManagedObjects hashes is unreliable because if I remember correctly they're derived from the objectID. If you obtain permanent IDs prior to diffing then it's likely that the algorithm is sound.

Otherwise a weak to weak map table might work. Haven't really looked into the implementation.

onmyway133 commented 6 years ago

@WCByrne Hi, sorry for late response. I've tried using the whole Hashable object as the key, but I don't see if it makes any difference. Dictionary will use hashValue of Hashable object as the key, so it performs the same as when we use just hashValue. I will change if there's any issues/falling tests

WCByrne commented 6 years ago

👍 hashValue is unique most of the time and will probably work for most applications but is not "correct" as explained by the description of hash value from the docs quoted above.

ktraunmueller commented 6 years ago

Hashable is not sufficient as a type constraint on elements in order to compute a diff between two collections. Two different objects with different properties may have the same hash value, as pointed out above.

In order to compute insertions, deletions, and moves, you will need at least an Equatable type constraint on the element type.

And if you want to detect (in-place) updates, you will also need some concept of identity. Maybe an "Identifiable" protocol, that uses some form of id to answer the question "are these two objects the same individual? (possibly with different properties)".

onmyway133 commented 6 years ago

@ktraunmueller Hi, you're right. Maybe this issue may interest you https://github.com/onmyway133/DeepDiff/issues/16

ktraunmueller commented 6 years ago

Thanks for the quick response. So, DeepDiff does not generally give correct results, because the requirements imposed on the element type are too weak, right?

onmyway133 commented 6 years ago

@ktraunmueller It is based on Hashable and Equatable, so I think it works quite good. Do you have a test that fails?

ktraunmueller commented 6 years ago

Sorry, I forgot that a Hashable is an Equatable... 🤦‍♂️

onmyway133 commented 6 years ago

@ktraunmueller No worried, I learned that the hard way too, you can read my story here https://github.com/onmyway133/blog/issues/122 😅

ahti commented 6 years ago

I've tried using the whole Hashable object as the key, but I don't see if it makes any difference. Dictionary will use hashValue of Hashable object as the key, so it performs the same as when we use just hashValue. I will change if there's any issues/falling tests

This is incorrect. You can verify easily by throwing this code in a playground:

struct Lol: Hashable {
    let str: String

    var hashValue: Int {
        return 0
    }
}

var d: [Lol: String] = [:]

d[Lol(str: "hello")] = "asdf"
d[Lol(str: "bye")] = "fdsa"

assert(d.count == 2)

Equal hash values do not mean equal objects, and DeepDiff should work with diffs containing things with colliding hash values, which from what I read here (haven't looked at the code itself) is not the case right now.