I've been trying to understand some of the implementation details of this, but I think there is some contextual knowledge that I lack. I'll try to ask those questions here in the hope of finding answers and contributing some improvements, or at least code comments, for future readers of this code.
I am confused about why `ConsumableBuffer` starts reading from the last byte as opposed to the first byte: https://github.com/ipfs/js-hamt-sharding/blob/44b6e6b20872200a16342a14592ed79b9660d996/src/consumable-buffer.ts#L28-L31
What is even more confusing is that while it reads bytes from the right side, it reads bits from the left side, and the way it keeps track of `_currentBitPos` would make you think bits are also read from the right side. As it turns out that is deceiving, because it actually reverses the start position just before reading the bits: https://github.com/ipfs/js-hamt-sharding/blob/44b6e6b20872200a16342a14592ed79b9660d996/src/consumable-buffer.ts#L47-L49
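To illustrate what I mean, here is a minimal sketch of the reading order as I understand it (my own reconstruction, not the actual implementation): bytes are consumed starting from the end of the array, while the bits inside each byte are consumed from the most significant end.

```js
// Sketch of my reading of ConsumableBuffer - illustration only, not the real code.
// Bytes are consumed from the end of the array; bits within a byte from the top.
function takeBits (bytes, state, count) {
  let result = 0
  while (count > 0 && state.bytePos >= 0) {
    const byte = bytes[state.bytePos]
    const available = state.bitPos + 1            // bits left in the current byte
    const taking = Math.min(available, count)
    const shift = available - taking              // keep the highest `taking` bits
    const value = (byte >> shift) & ((1 << taking) - 1)
    result = (result << taking) | value
    state.bitPos -= taking
    count -= taking
    if (state.bitPos < 0) {                       // step to the *previous* byte
      state.bitPos = 7
      state.bytePos--
    }
  }
  return result
}

// For [0xAB, 0xCD] the first take(8) yields 0xCD - the last byte, read MSB-first.
const state = { bytePos: 1, bitPos: 7 }
console.log(takeBits(Uint8Array.from([0xab, 0xcd]), state, 8).toString(16)) // 'cd'
```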
Perhaps the most confusing thing about all of this is the fact that js-ipfs-unixfs reverses the hash bytes, which makes me wonder whether that would have been unnecessary if `ConsumableBuffer` just read bytes from left to right instead: https://github.com/ipfs/js-ipfs-unixfs/blob/a5b9cad0cddd391e7dd93904bc4d3366f8e79202/packages/ipfs-unixfs-importer/src/options.js#L8-L16
```js
async function hamtHashFn (buf) {
  return (await murmur3128.encode(buf))
    // Murmur3 outputs 128 bit but, accidentally, IPFS Go's
    // implementation only uses the first 64, so we must do the same
    // for parity..
    .slice(0, 8)
    // Invert buffer because that's how Go impl does it
    .reverse()
}
```
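To make the effect of that snippet concrete, here is a toy example with a made-up 16-byte digest (not a real murmur3 output):

```js
// Made-up 16-byte digest purely for illustration - not an actual murmur3-128 output.
const digest = Uint8Array.from([
  0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
  0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10
])

// Keep only the first 64 bits, then invert the byte order, as hamtHashFn does.
const key = digest.slice(0, 8).reverse()
console.log(key) // Uint8Array [ 8, 7, 6, 5, 4, 3, 2, 1 ]
```

If I'm reading both pieces correctly, the two inversions cancel out: `ConsumableBuffer` starts at the last byte of the reversed key, which is the first byte of the original digest, so the importer effectively consumes the digest in its natural order anyway.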
`InfiniteHash` seems to represent a hash of some seed bytes, in the following pattern: if I understand its code correctly, the (default) hash will return 8 bytes, after which a new hash is generated from the seed bytes & the depth, producing another 8 bytes, etc.: https://github.com/ipfs/js-hamt-sharding/blob/44b6e6b20872200a16342a14592ed79b9660d996/src/consumable-hash.ts#L80-L89
I suspect that after roughly 256 * 8 bytes things will start to repeat (because the depth byte wraps around: `Uint8Array.from([256])` would be `[0]`). I'm not sure if that is intentional or a bug, but it is a scenario that will not occur in practice, so it probably does not matter.
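To check my understanding of the pattern, here is a sketch of what I believe `InfiniteHash` does; the generator and its names are mine, not the actual code:

```js
// Sketch of my understanding only - not the actual InfiniteHash implementation.
// `hashFn` would be an 8-byte hash like the hamtHashFn shown above.
async function * infiniteDigest (seed, hashFn) {
  // the first 8 bytes come from hashing the seed itself
  yield await hashFn(seed)

  // whenever more bits are needed, append the depth as a single byte and
  // hash again; the depth byte wraps at 256, so the inputs eventually repeat
  for (let depth = 1; ; depth++) {
    const input = new Uint8Array(seed.length + 1)
    input.set(seed, 0)
    input[seed.length] = depth & 0xff   // wraps back to 0 at depth 256
    yield await hashFn(input)
  }
}
```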
Is this overall hashing behavior described somewhere? I can't seem to find anything similar in the Go code base, and in fact I got the impression that Go will fail with an out-of-bounds error.
Then again, could it be that none of our tests run into conflicting keys, and that Go and JS might not align when that occurs?
I'm under the impression that `_findPlace` basically traverses the trie by deriving an index from 8 bits of the key hash; if a bucket is encountered at that index, the next 8 bits of the key hash are used to descend one level deeper, etc. And since our hashes are deterministically derived from key + depth, we will eventually encounter an index without a bucket in it: https://github.com/ipfs/js-hamt-sharding/blob/44b6e6b20872200a16342a14592ed79b9660d996/src/bucket.ts#L154-L170
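In other words, my mental model of the traversal is roughly the following sketch (my own pseudocode, not the actual `_findPlace`):

```js
// My mental model of the traversal - a sketch, not the real _findPlace.
// A "bucket" here is a 256-slot array whose slots hold nothing, a leaf entry,
// or a child bucket; `takeBits(8)` stands for consuming the next 8 bits of the
// InfiniteHash for this key.
function findPlace (rootBucket, takeBits) {
  let bucket = rootBucket
  while (true) {
    const index = takeBits(8)            // slot index from the next 8 hash bits
    const slot = bucket.children[index]

    if (slot !== undefined && slot.isBucket) {
      bucket = slot                      // a child bucket: descend one level deeper
      continue
    }

    // an empty slot or a leaf entry: this is the place for the key
    return { bucket, index, existing: slot }
  }
}
```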
Is this accurate? And if so, I feel like the whole hash-cycling thing probably needs to be part of the unixfs spec.