achingbrain closed this 1 year ago
Yeah, it shouldn't be "bytes". This is a typical problem with HAMT descriptions, we're dealing with different units and the UnixFS one makes it even harder by hex stringifying and then depending on the prefix length of the hex string!
- `bits` is at the base: the number of bits in the hash to chew off at each level of the HAMT. It could be less than a byte, but is 8 by default in the UnixFS HAMT.
- `fanout`, as encoded in the UnixFS data field, is the arity of the HAMT, the maximum number of children for each node. It is calculated from `bits` as `Math.pow(2, bits)`: the number of different values that numbers represented by `bits` bits can give you.
- `bytes` is trickier to fit in here, but it would be something like `Math.ceil(bits / 8)`, which is arguably not very meaningful and probably too confusing.
- `prefixLength`, or `padLength`, etc. is more helpful for UnixFS HAMTs: the number of characters needed for a hex representation of the prefix bytes. Either like it's done in here, `(fanout - 1).toString(16).length` (where `fanout` is `bucket.tableSize()`), or `Math.ceil(Math.log2(fanout) / 8 * 2)`, or `Math.ceil(bits / 8 * 2)`.

Made my suggested changes in #357 - going with "bits". The nice thing about using bits is that you don't have to worry about divisibility: you can't really do a fanout that's not a power of 2, so start low and power up from the bit count.
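The unit relationships above can be sketched in plain JavaScript. This is a minimal sketch assuming the UnixFS default of `bits = 8`; the three prefix-length formulas listed above should all agree:

```javascript
// Sketch of the HAMT units described above (assumption: bits = 8,
// the UnixFS default, so fanout is a power of two).
const bits = 8

// Arity of the HAMT: how many children each node can have.
const fanout = Math.pow(2, bits) // 256

// "bytes" is possible but not very meaningful.
const bytes = Math.ceil(bits / 8) // 1

// Three equivalent ways to compute the hex prefix length:
const a = (fanout - 1).toString(16).length
const b = Math.ceil(Math.log2(fanout) / 8 * 2)
const c = Math.ceil(bits / 8 * 2)

console.log(fanout, bytes, a, b, c) // 256 1 2 2 2
```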
It was `bytes` because the shard split threshold option next to it was in bytes, so it was a (flawed) attempt to keep the units consistent, but you're quite right, `bits` is more correct. Thanks all.
Adds a `shardFanoutBytes` option to the importer to allow configuring the number of bytes used for the HAMT prefix, also a test.
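To illustrate what the configurable prefix does, here is a hedged sketch of turning a bucket position into the hex prefix that shard child names carry. The `shardPrefix` helper is hypothetical (not the importer's actual API) and assumes the default `bits = 8`:

```javascript
// Sketch (assumptions: fanout is a power of two; `pos` is the bucket
// index taken from the hash at one HAMT level).
const bits = 8                                      // UnixFS default
const fanout = Math.pow(2, bits)                    // 256
const padLength = (fanout - 1).toString(16).length  // 2 hex chars

// Hypothetical helper: hex-encode a bucket position, zero-padded to
// the prefix length, as used in UnixFS shard child names.
function shardPrefix (pos) {
  return pos.toString(16).toUpperCase().padStart(padLength, '0')
}

console.log(shardPrefix(10))  // '0A'
console.log(shardPrefix(255)) // 'FF'
```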