riak-ripple / ripple

A rich Ruby modeling layer for Riak, Basho's distributed database
Other
618 stars 152 forks source link

Quick fix on 1.8.7 for Hash#hash #283

Closed myabc closed 12 years ago

myabc commented 12 years ago

Dug into this issue – on 1.8.7 (and not on 1.9.3), the following two Hashes compute to the same #hash code:

{"street"=>"123 Somewhere St", "notes"=>[{"text"=>"Bob lives here"}], "kind"=>nil, :notes=>[{"text"=>"Bob lives here"}]}
=> 821842552 

{"street"=>"123 Somewhere St", "notes"=>[{"text"=>"Jill lives here"}], "kind"=>nil, :notes=>[{"text"=>"Jill lives here"}]}
=> 821842552
seancribbs commented 12 years ago

@myabc Email me your address and t-shirt size (unless you want a Basho mug instead of a shirt).

dkubb commented 12 years ago

What about self.class.hash ^ parent.hash ^ serializable_hash.hash? The typical pattern I use in other libs is to xor the hash of the exact objects involved in the #eql? test.

A bit nicer #hash implementation that doesn't create unnecessary Array objects might be:

def hash
  hash  = self.class.hash ^ _parent_document.class.hash ^ serializable_hash.hash
  hash ^= _parent_document.key if _parent_document.respond_to?(:key)
  hash
end
seancribbs commented 12 years ago

@dkubb Interesting, why hash the class though? Also, I don't see how involving the parent document is necessary woops the existing impl does

dkubb commented 12 years ago

@seancribbs The reason we involved self.class in the mix is that superclass and descendant instances that have otherwise identical internal state will hash to the same value. Technically that shouldn't matter as long as superclass_instance.eql?(descendant_instance) doesn't return true, but I prefer to remove the ambiguity and have them return different hash values.

Yeah, and I have no idea why the parent document is involved in hashing, I was mostly just using the existing #hash method as a reference. I didn't look, but I presumed that #eql? used it when comparing. In general you want to have #eql? and #hash using the same objects. The #eql? method can do more work with more objects, but usually I like to keep them symmetrical for simplicity and consistency.

seancribbs commented 12 years ago

@dkubb Thanks for the clarification. I like the lightweight way yours works, that said, it doesn't resolve the hashing bug on 1.8. I think we should combine approaches.

seancribbs commented 12 years ago

By the way, both of you will get shirts/mugs/stickers. Just email me your full name, address to sean@basho.com.

seancribbs commented 12 years ago

Thanks for the contributions, guys. I'm going with a hybrid of the two approaches, which seems to pass the tests.