lukechampine / blake3

An AVX-512 accelerated implementation of the BLAKE3 cryptographic hash function
MIT License

NewFromDerivedKey vs NewDeriveKey #1

Closed · oconnor663 closed 4 years ago

oconnor663 commented 4 years ago

Well done getting this up so quickly! Any feedback on how the reference implementation could be improved? Any parts that were confusing? Did you happen to find Section 5.1.2 in the spec that talks about what's going on with add_chunk_chaining_value? :)

Some thoughts about this comment:

NewFromDerivedKey returns a Hasher whose key was derived from the supplied context string.

That's true, but it's also kind of an implementation detail. The intention isn't really for users to know about or think about this "context key". Instead, the word "key" in the name derive_key/new_derive_key refers to the output of the function. The intention is that the user starts with a context string and some key material, and the output is a context-specific derived subkey.

In that sense, it might make more sense to use the name NewDeriveKey or something like that? What do you think?

lukechampine commented 4 years ago

I must confess that I ported the code before I finished reading the spec. 😅 The concepts are mostly familiar to me, though; I implemented a similar "stack based" Merkle tree hasher here. (I think my nodeHash method there is doing the same thing as add_chunk_chaining_value.)

I didn't have any major difficulties porting, except for NewFromDerivedKey, coincidentally. My implementation was correct, but my test was using the wrong context string: I used "BLAKE3 2019-12-27 16:29:52 example context", from the optimized Rust tests, instead of "BLAKE3 2019-12-27 16:29:52 test vectors context" from the C tests (the reference implementation folder does not contain tests). The context string should definitely be made explicit in the test vectors JSON object, like the key is.
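
Concretely, the test-vector object might gain an explicit field alongside the key; here's a hypothetical Go struct for unmarshaling such a format (the field names are my own invention, not an official schema):

type testVectors struct {
	Key     string `json:"key"`
	Context string `json:"context_string"` // currently implicit in the C tests
	Cases   []struct {
		InputLen  int    `json:"input_len"`
		Hash      string `json:"hash"`
		KeyedHash string `json:"keyed_hash"`
		DeriveKey string `json:"derive_key"`
	} `json:"cases"`
}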

I see what you mean about the name. I didn't really grok derive_key until just now; I understood that it was hashing a string to produce a keyed hash, but I didn't understand that the keyed hash should only be used to derive further keys. Given the potential for confusion, I almost want to avoid exposing the "intermediate" Hasher at all. Something like:

func DeriveKey(derivedKey []byte, ctx string, srcKey []byte) {
    h := /* ...construct Hasher using ctx... */
    h.Write(srcKey)
    h.XOF().Read(derivedKey)
}

The only problem I see with this is that you can't write an unbounded stream of data to the intermediate Hasher; srcKey has to fit in memory. But that doesn't seem like a particularly severe restriction. Unless I misunderstand the use case, srcKey should never need to be larger than 32 bytes or so.
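
To illustrate the caller's side, here's a hedged sketch assuming the proposed DeriveKey lands in this package with the signature above; the context string is invented for the example:

package main

import (
	"fmt"

	"lukechampine.com/blake3" // assumes the proposed DeriveKey is exported here
)

func main() {
	srcKey := []byte("high-entropy key material")
	subKey := make([]byte, 32)
	// Context strings should be hardcoded, globally unique, and
	// application-specific; this one is purely illustrative.
	blake3.DeriveKey(subKey, "example.com 2019-12-27 session tokens v1", srcKey)
	fmt.Printf("%x\n", subKey)
}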

Another API question I've been thinking about: what do you think of allowing custom flags? There are 32 flag bits, but only 7 are used. The reason I ask is that I'd like to set "leaf" and "interior node" flag bits when hashing up a Merkle tree, in order to prevent collisions without having to prepend a byte to the input. (Alternatively: if I understand correctly, a BLAKE3 hash is a Merkle root, so I could just use that; but the leaves of the BLAKE3 hash are the wrong size, so I'd need to parameterize that somehow.)

lukechampine commented 4 years ago

btw, I believe we're already doing "verified streaming" in Sia, albeit much less efficiently: we request random-access sections of a file from untrusted "hosts", who provide the data along with a Merkle range proof of its integrity. Isn't it fun how you can use bit-twiddling hacks for this stuff? :)

On that note, should BLAKE3 implementations provide proof-related functionality? It would impose extra burden on maintainers, but the alternative is that application developers would have to reimplement most of the BLAKE3 guts in order to hash things up properly, which doesn't seem right.

oconnor663 commented 4 years ago

I think my nodeHash method there is doing the same thing as add_chunk_chaining_value.

Woah it sure is! That's...kind of an amazing coincidence. It definitely wasn't the first version of the algorithm I tried.

I used "BLAKE3 2019-12-27 16:29:52 example context", from the optimized Rust tests

That's a good point. I've changed the unit test string to "BLAKE3 2019-12-27 16:13:59 example context (not the test vector one)", so hopefully no one else has the same problem.

I didn't really grok derive_key until just now; I understood that it was hashing a string to produce a keyed hash, but I didn't understand that the keyed hash should only be used to derive further keys.

I wouldn't go so far as to make that a hard rule. There's nothing harmful per se about using derive_key to hash something that's not a secret key, though of course the output won't be secret either in that case. That said, I can't really think of a good use case. (What I do worry about with derive_key is people upholding the requirement that the context string should be globally unique, application-specific, and hardcoded.)

Unless I misunderstand the use case, srcKey should never need to be larger than 32 bytes or so.

That's true in the majority of cases, for sure. Though sometimes KDFs are used to distill a larger input with uneven entropy (maybe a really predictable prefix or suffix, but plenty of random bytes in the middle) into a uniformly random key. There are also things like Diffie-Hellman shares where the size depends on the algorithm you're using.

Another API question I've been thinking about: what do you think of allowing custom flags?

That sounds very scary to me, and I hope no one ever tries to do it >.< One problem would be that, if people did this, different people would do it differently, and you'd have no idea which algorithm could have a problematic interaction with which other algorithm. Another problem would be that implementations would need to very carefully mask this sort of parameter to make sure no user input could ever affect the bits that BLAKE3 relies on for security, and someone would inevitably get this wrong.

The reason I ask is that I'd like to set "leaf" and "interior node" flag bits when hashing up a Merkle tree, in order to prevent collisions without having to prepend a byte to the input.

If you're not using secret keys with your custom tree, you could consider using keyed_hash, with the key bits set differently for the two cases. keyed_hash is zero-overhead, so this wouldn't be any more expensive than the flag-bit approach above (which I hope no one ever tries!).
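
A minimal sketch of that keyed approach, assuming this package's New(size, key) constructor; the two key constants are arbitrary values chosen for illustration, and only need to differ:

package merkle

import "lukechampine.com/blake3"

var (
	leafKey [32]byte            // all zeros
	nodeKey = [32]byte{0: 0x01} // differs from leafKey in the first byte
)

func hashLeaf(data []byte) (out [32]byte) {
	h := blake3.New(32, leafKey[:])
	h.Write(data)
	h.Sum(out[:0])
	return
}

func hashNode(left, right [32]byte) (out [32]byte) {
	h := blake3.New(32, nodeKey[:])
	h.Write(left[:])
	h.Write(right[:])
	h.Sum(out[:0])
	return
}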

If you are using secret keys with your tree, that would be a good use case for derive_key. You could derive one subkey for your leaves, and another subkey for your parent nodes. (It sounds like you're familiar with this stuff, but just in case don't forget that your tree also needs to domain separate the root node from other nodes, to prevent length extensions.)
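
Continuing in the same package as the sketch above, the keyed-tree variant might look like this, again assuming the proposed DeriveKey; the context strings are made-up examples:

// treeKeys derives per-role subkeys from a single master key. Real
// context strings should be hardcoded, unique, and application-specific.
func treeKeys(masterKey []byte) (leaf, node [32]byte) {
	blake3.DeriveKey(leaf[:], "myapp 2020-01-23 merkle leaf v1", masterKey)
	blake3.DeriveKey(node[:], "myapp 2020-01-23 merkle node v1", masterKey)
	return
}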

On that note, should BLAKE3 implementations provide proof-related functionality?

Have you looked at https://github.com/oconnor663/bao? It sounds like it might have a lot in common with some of the tree code you've written. Specifically the "slices" feature. Might be a surprising amount of overlap here.

but the alternative is that application developers would have to reimplement most of the BLAKE3 guts in order to hash things up properly, which doesn't seem right.

We currently expose a minimal undocumented interface to some of the guts: https://github.com/BLAKE3-team/BLAKE3/blob/master/src/guts.rs. At the moment, this is pretty much just for Bao. But at some point it would be better to stabilize something more fleshed-out, with suitably dire warnings about how most applications should stick to the usual interfaces.

That file is pretty small, because Bao doesn't currently worry too much about performance. (If you're streaming a file over the network, you're almost certainly IO-bound anyway.) One of the challenges of stabilizing it will be deciding what to do with e.g. SIMD-optimized parallel chunk compression.

Anyway, we're not going to break that interface without a version bump, so you could consider using it, if you want to do custom iterative stuff like Bao. But of course, using the guts means you're responsible for upholding invariants like root finalization, so you'll want to be very careful to test your output against BLAKE3 test vectors and the official implementation if you go that route.

lukechampine commented 4 years ago

Actually, there's a small difference: hashes in my "stack" implementation have a fixed position, whereas the BLAKE3 stack pushes and pops them as the tree changes. Visually, after 10 leaves have been inserted:

| -- | h1 | -- | h3 |

vs

| h1 | h3 |

I don't know if either approach is strictly superior. The "sparse stack" has a more direct correspondence to the bit pattern of the number of leaves inserted, but it also requires you to skip over empty slots when iterating over the stack (e.g. when computing the final root).
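
For reference, a minimal sketch of the compact-stack merge rule from the reference implementation's add_chunk_chaining_value (spec section 5.1.2); parentCV here is a stand-in for the real parent compression:

type cvStack struct{ cvs [][32]byte }

func (s *cvStack) push(cv [32]byte) { s.cvs = append(s.cvs, cv) }

func (s *cvStack) pop() [32]byte {
	cv := s.cvs[len(s.cvs)-1]
	s.cvs = s.cvs[:len(s.cvs)-1]
	return cv
}

// addChunkCV pushes a finished chunk's chaining value, first merging any
// completed subtrees. Each trailing zero bit of totalChunks marks a
// completed subtree, so the stack depth always equals popcount(totalChunks).
func (s *cvStack) addChunkCV(cv [32]byte, totalChunks uint64) {
	for totalChunks&1 == 0 {
		cv = parentCV(s.pop(), cv) // parentCV: stand-in for parent compression
		totalChunks >>= 1
	}
	s.push(cv)
}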

I've updated the DeriveKey API as per my suggestion. I think this strikes a reasonable balance between clarity/usability and flexibility. If someone really needs access to the raw Hasher, they can open an issue and we'll figure out a solution then. So I'll close this issue for now.

Keying the hash for interior nodes makes sense, thanks. I suppose that what I really want is to set the chunkSize parameter to an arbitrary value, so that I can easily compute a Merkle root with any leaf size. I saw that Zooko called BLAKE3 a "general-purpose Merkle tree", but this isn't really the case if the chunk size is fixed, right? Of course, I could use a forked BLAKE3 with a different chunk size, but then my hashes will be incompatible with other BLAKE3 implementations. So as it stands, I'll have to build my trees the usual way by calling Sum256 individually on each leaf and each sibling pair. But perhaps this could be sped up by computing multiple hashes in parallel, much like how your SIMD code hashes multiple blocks in parallel?
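
For what it's worth, a sketch of that "usual way", assuming a power-of-two leaf count and using one-byte prefixes for leaf/node domain separation (the keyed variant discussed above would work equally well):

package merkle

import "lukechampine.com/blake3"

// merkleRoot hashes each leaf with Sum256, then sibling pairs, up to the
// root. It assumes len(leaves) is a power of two, for brevity.
func merkleRoot(leaves [][]byte) [32]byte {
	level := make([][32]byte, len(leaves))
	for i, leaf := range leaves {
		level[i] = blake3.Sum256(append([]byte{0x00}, leaf...)) // 0x00 = leaf prefix
	}
	for len(level) > 1 {
		next := make([][32]byte, len(level)/2)
		for i := range next {
			buf := append([]byte{0x01}, level[2*i][:]...) // 0x01 = node prefix
			buf = append(buf, level[2*i+1][:]...)
			next[i] = blake3.Sum256(buf)
		}
		level = next
	}
	return level[0]
}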