usethesource / capsule

The Capsule Hash Trie Collections Library
BSD 2-Clause "Simplified" License
404 stars 27 forks source link

Question about `dataMap` for singleton node #13

Closed 97jaz closed 7 years ago

97jaz commented 7 years ago

Is this logic for selecting the newDataMap for a new singleton node correct?

Specifically, in the case where shift is not 0, newDataMap is set to bitpos(mask(keyHash, 0)). But keyHash is the hash of the key that's being removed, right? (And the data map should reflect the key that remains.)

Or am I misunderstanding what's going on in that code?

msteindorfer commented 7 years ago

The logic you referred to is correct; it's an optimization that avoids recalculating a fraction of the hash-code, because its already known. I answered the same question in issue #9 as well:

The lines you mentioned create a (temporary) node that either gets unboxed when returning from recursion, or it becomes the new root node. I'm passing 0 as shift to the updated method, thus I'm only interested in the first 5 bits of the hash code. Because all elements in that node share the same common prefix, it is not necessary recalculate the hash code, I can simply take keyHash.

Here's a visualization, if we assume that we found the element to be removed at shift 15 :

... - mask(keyHashOfRemoved,   15)
                                  \ 
                                   -  ... - mask(_, 10) - mask(_, 5) - mask(_, 0)
                                  / 
... - mask(keyHashOfRemaining, 15)

You can use keyHashOfRemoved or keyHashOfRemaining for substituting _, because all mask values for shifts lower than 15 (and thus also for the case of shift 0) equal.

In case you're going to reimplement the logic, you could either explicitly recover keyHashOfRemaining to make the intent clear (at some performance cost), or properly document this optimization in the code (something I should do as well).

PaulKlint commented 7 years ago

Maybe add a comment -- like the above text -- in the code to clarify this?

97jaz commented 7 years ago

Because all elements in that node share the same common prefix [...]

Ah, of course. Thank you, @msteindorfer -- I should have realized that.