usethesource / capsule

The Capsule Hash Trie Collections Library
BSD 2-Clause "Simplified" License
404 stars 27 forks source link

HashCode Accident #9

Closed norswap closed 8 years ago

norswap commented 8 years ago

I think the second line here should say result = prime * result + (dataMap())

https://github.com/usethesource/capsule/blob/master/src/main/java/io/usethesource/capsule/TrieSet_5Bits.java#L1273-L1274

Cheers!

msteindorfer commented 8 years ago

Thanks Nicolas for pointing that out. I'll fix this oversight shortly.

It's indeed an oversight that could cause more collisions, but does not influence correctness. That said, the hashCode of the nodes themselves is currently not used, only hashCode on the collections themselves (invoking different code).

norswap commented 8 years ago

Don't worry about it, I just thought I'd report it since I'd spotted it :)

By the way, is the rascal code generating the library available somewhere?

norswap commented 8 years ago

Found another one:

https://github.com/usethesource/capsule/blob/master/src/main/java/io/usethesource/capsule/TrieMap_5Bits.java#L1805-L1806

Pretty sure it should be theOtherKey.hashCode() instead of keyHash on that line.

msteindorfer commented 8 years ago

Thanks for digging into implementation details. However, in this case you're not right.

The lines you mentioned create a (temporary) node that either gets unboxed when returning from recursion, or it becomes the new root node. I'm passing 0 as shift to the updated method, thus I'm only interested in the first 5 bits of the hash code. Because all elements in that node share the same common prefix, it is not necessary recalculate the hash code, I can simply take keyHash.

In fact, because in this instance it's a hash collision node, it could even use the instance variable hash that memoizes the full 32-bit hash code for all elements in this node. Remember, hash collision => all 32-bit of the hash code (prefix) are the same for all collision elements.

Anyways, thanks for reporting, I appreciate that.

norswap commented 8 years ago

You're right, dumb mistake on my part!

msteindorfer commented 8 years ago

It's not a dumb mistake, it rather shows it should document what's going on. In fact it is an optimization avoiding the recalculation of the hash code.