Open sjakobi opened 2 years ago
@treeowl Does my explanation above make sense to you, or do you have a different understanding?
I wonder whether "sub-hash" might be a better name, since the term "key" usually refers to a k
type thing instead of its hash.
Simply 'Index' might be another good name.
It would be good to have a proper explanation of how a HashMap
tree can be navigated by following the "index path" represented by a hash. The invariant on correctly located leaf nodes could then reference this explanation.
I think your explanation is good. I don't know which word to use. index
might be simplest in context.
type Index = Word
indexPath :: Hash -> [Index]
indexPath = go 0
where
go s !h | s >= finiteBitSize @Word 0 = []
| otherwise = index h s : go (nextShift s) h
The problem with index
is that it might be confused with an array index.
The problem with
index
is that it might be confused with an array index.
That's not entirely wrong though. The existing index
function does compute the array index for the array in a Full
node:
Right, but the index is not the subkey.
Right, but the index is not the subkey.
Yes, it is?! It's just two names for the same concept: A 5-bit chunk of a Hash
that can be used to index into a Bitmap
or a Full
node's array.
You have to combine the the subkey with the mask to get the array index, don't you?
You have to combine the the subkey with the mask to get the array index, don't you?
I think you're referring to BitmapIndexed
nodes now, where the array indices are computed with sparseIndex
using the Bitmap stored in the node.
For a Full
node, all you do is right-shift the hash by the Shift
corresponding to the desired level and use the subkeyMask
to get the lowest 5 bits. Then you have the index into the array.
I'll try to document all this stuff properly and let you review.
Yeah, that's what I was talking about for sure. I think the subkey thing makes sense as a unifying principle there. But of course getting rid of Full
would be better. I doubt I'll have time to work on that in the next few weeks though.
Even if we get rid of Full
, this sub-key / index concept will remain essential to understand the HashMap
internals.
Yes. I feel like we're talking past each other somehow? I had the impression you wanted to rename subkey to index. Was that a bogus misinterpretation on my part?
I had the impression you wanted to rename subkey to index. Was that a bogus misinterpretation on my part?
No. I think this renaming (e.g. bitsPerSubKey
-> bitsPerIndex
) makes sense, as long as its sufficiently clear that an Index
can not be used directly for the array stored in a BitmapIndexed
node.
If making this distinction seems too hard, I'd settle for "sub-hash", e.g. bitsPerSubHash
. In that case we should probably also rename the existing index
function to subHash
.
I think subhash is clearer than index. I don't personally have trouble with subkey either. My main confusion is over which direction the bits go in a BitmapIndexed
, since the bitwise operations don't have any examples or description of what they actually do to the bits.
My main confusion is over which direction the bits go in a
BitmapIndexed
, since the bitwise operations don't have any examples or description of what they actually do to the bits.
I did add some examples in #396. Feel free to add more or to point out what's missing!
There are two constants named using this term, but I can't find any other use of it in Bagwell's paper or the context of HAMTs.
https://github.com/haskell-unordered-containers/unordered-containers/blob/4da2c201a274f7f3306117b479c51aef7af61129/Data/HashMap/Internal.hs#L2361-L2362
https://github.com/haskell-unordered-containers/unordered-containers/blob/4da2c201a274f7f3306117b479c51aef7af61129/Data/HashMap/Internal.hs#L2367-L2368
From this context I suspect that a subkey is a 5-bit part of a hash. I.e. a 64-bit or 32-bit hash can be understood as a sequence of subkeys. For example:
Subsequent subkeys of a given hash are used to navigate the tree at subsequent levels: