Open haimgel opened 1 year ago
I think this is connected to how abao can verify slices of data. https://github.com/n0-computer/abao
But https://github.com/lukechampine/blake3/issues/17#issuecomment-1548498871
if I have Blake3 digests of parts of a file, I should be able to calculate the digest of the whole
kinda. A BLAKE3 digest is the root of a Merkle tree, but the root hash and the interior node hashes use different flags. (This was an intentional design decision.) So if you have two files A and B, there's no way to get BLAKE3(A || B)
from BLAKE3(A)
and BLAKE3(B)
.
As pcfreak said, you probably want to use Bao for this. The Bao encoding contains the interior node hashes, which lets you reconstruct the Merkle root.
@lukechampine, thank you for your response!
Unfortunately, I cannot use Bao since Bao requires scanning the whole file in sequence, and I need to avoid it (the parts could be processed out of sequence, on different machines, etc.) I can, however, get some other aspect of an internal state of Blake3 for each part separately. I wonder if the complete blake3 digest can be constructed from just the chaining values of each part?
Ah. Well, as long as each part is aligned to 1024-byte boundaries of the original file, it should be possible to compute the chaining value(s) of each part and combine them into the total root. This package doesn't provide a direct way to do that, but you could achieve this by calling BaoEncode
on each part and manually extracting the relevant chaining values. (This will be somewhat wasteful though, as the Bao file will contain many more chaining values than the ones you actually need.)
If I were you, I think I would fork this package and add a Stack
method that returns the internal stack
field. Then write a function that takes a set of stacks and computes their combined root. That way, you can calculate the chaining values for each part independently, and merge their stacks together at the end.
To make your life a lot easier, ensure that the number of chunks in each part is a power of 2. That way, each part will only have a single chaining value, and you can merge them using the algorithm in mergeSubtreesGeneric
.
@lukechampine, thank you, this is very helpful! I'll try these suggestions.
@lukechampine, I managed to get it to work, I also had to make the "counter" work right for parts other than the first one.
Would you be interested in a PR that implements a wrapper around Hasher that minimally exposes some internal state and functionality of the Hasher, similar to what the Rust crate has in their "guts" module? They do it for a very similar purpose.
no promises, but if you open a PR I'll definitely take a look!
@haimgel fyi, I've added a guts
package now :)
Thank you so much for this!
From my reading of Blake3 paper, and my understanding of Merkle trees in general, I think this should be doable in theory: if I have Blake3 digests of parts of a file, I should be able to calculate the digest of the whole (by building the tree of digests and then getting the root). But I don't think this module's API exposes enough to make it work, or am I missing something?