mattbierner / hamt

Javascript Hash Array Mapped Trie
MIT License
251 stars 16 forks source link

Collision being stored in wrong node (value set() incorrectly get())? #3

Closed raymond-w-ko closed 10 years ago

raymond-w-ko commented 10 years ago

Hello, first thank you very much for providing this easy to understand implementation of HAMT in Javascript. I am trying to port your implementation for Lua and this is a very good reference.

However, I am experiencing a bug while coding some correctness tests. At first I though it was a porting/translation error, however I ran the same data set with your hamt.js with node.js and discovered the exact same error with the same key as my Lua test. Basically, I immediately set() a key value and when I try to get() it back I get a null.

You can get my test data and replicate this by cloning my repo at: https://github.com/raymond-w-ko/hamt.lua/tree/master/references and running ./node test.js

Curiously, your hamt_plus implementation does not have this error, and did notice some backport fixes. So maybe some change is missing?

raymond-w-ko commented 10 years ago

I think the problem occurs when Collision.prototype.modify() is called with a key which hashes to a different value than that Collision's hash value.

Like I put the following at beginning of Collision.prototype.modify(), but this is thrown when the problematic key is added ("27815778").

if (h !== this.hash) throw "tried to add a key whose hash is different than the Collision's hash.";

From my own data dumps below, the Collision node does have valid value, but this one should not be added to this Collision node.

Curiously the lower bits of the hashes are nearly the same:

29117953 ==  1BC4E01
164384257 == 9CC4E01
key: 85035710 -> hash: 29117953
key: 27815778 -> hash: 164384257
key: 14756006 -> hash: 29117953
--------------------
hash NOT matches
Collision:lookup() hash:164384257 key:27815778
__unnamed__ = {
   ["children"] = {
      [1] = {
         ["hash"] = 29117953;
         ["key"] = "14756006";
      };
      [2] = {
         ["hash"] = 29117953;
         ["key"] = "85035710";
      };
   };
   ["hash"] = 29117953;
};

arg hash: 164384257
key: 27815778
internal hash: 29117953
mattbierner commented 10 years ago

Glad that you found the library useful and thanks for reporting this, but damn is that a big test data file!

This seems to reproduce the bug as well:

var h = hamt.empty;
h = hamt.setHash(0, 'a', 1, h);
h = hamt.setHash(0, 'b', 2, h);
h = hamt.setHash(1, 'c', 3, h);

console.log(hamt.getHash(0, 'a', h));
console.log(hamt.getHash(0, 'b', h));
console.log(hamt.getHash(1, 'c', h));

outputs:

1
2
null

I am investigating further.

I believe the root cause is that the code assumes collision nodes would always exist at conceptual leaves of the hash tree. But, as an optimization, branches on the real tree are expand only when needed. This results in collision nodes being stored throughout all levels of the conceptual hash tree, but always in the leaves of the real tree. When a node is inserted along this unexpanded branch, it ends up being put into the collision list.

Instead, we need to insert a new internal branch node and set the collision node and inserted node as children of the branch (similar to what mergeLeaves does for Leaf nodes).

mattbierner commented 10 years ago

Seems my modify tests were also broken. (tests/modify.js collsion) They should have caught this issue but were actually checking for the incorrect behavior.

mattbierner commented 10 years ago

Tested against your data again and everything seems to be passing now.

See the change log for hamt.kep for the fix complete. It was as I suspected. Collision nodes were not expanded correctly when inserting into a section further down the unexpanded branch. I don't know why hamt_plus did not have this issue too, but I will investigate further there.

Thanks again for reporting this.