Correctness of the proof on the collision-free property of Merkle Tree

hacl-star / merkle-tree

A verified Merkle Tree, built as a standalone project on top of EverCrypt


I think the formalization of the collision-free property of Merkle Tree might have a defect.

The collision-free property of Merkle Hash is proved by the function extract, whose type is defined as follows. https://github.com/project-everest/hacl-star/blob/144c44e1fa6e8062b2c50d4cd7ad0e41e7f0fe29/secure_api/merkle_tree/MerkleTree.Spec.fst#L485-L487 In the proof, a collision instance of the Merkle Hash (mt_collide #_ #f n i) is reduced to a collision instance of the base hash function (hash2_raw_collide).

A collision of the base hash function is formalized as an instance of hash2_raw_collide, defined as follows. https://github.com/project-everest/hacl-star/blob/144c44e1fa6e8062b2c50d4cd7ad0e41e7f0fe29/secure_api/merkle_tree/MerkleTree.Spec.fst#L425-L430 The types hash and hash_fun_t are defined as follows. https://github.com/project-everest/hacl-star/blob/144c44e1fa6e8062b2c50d4cd7ad0e41e7f0fe29/secure_api/merkle_tree/MerkleTree.Spec.fst#L13-L15

However, an instance of hash2_raw_collide can be constructed without a collision instance of the Merkle Hash. Let #f:hash_fun_t #hsz be the hash function used in hash2_raw_collide. The domain of #f is strictly larger than its codomain, and both are finite sets, so by the pigeonhole principle #f must have a collision.
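To make the pigeonhole argument concrete, here is a minimal Python sketch. It is not part of the F* development: `HSZ`, `f`, and `pigeonhole_collision` are hypothetical names, and `f` is a toy stand-in that models hash_fun_t as a function from two fixed-size hashes to one (an assumption about the linked definitions). With a 1-byte hash there are 256 × 256 input pairs but only 256 outputs, so an exhaustive search must hit a collision.

```python
import hashlib
from itertools import product

HSZ = 1  # toy hash size in bytes (hypothetical; chosen tiny so the search is instant)


def f(left: bytes, right: bytes) -> bytes:
    """Toy stand-in for #f: two HSZ-byte hashes in, one HSZ-byte hash out."""
    return hashlib.sha256(left + right).digest()[:HSZ]


def pigeonhole_collision():
    """Enumerate input pairs until two distinct pairs share an output.

    There are 256**(2*HSZ) inputs and only 256**HSZ outputs, so a repeat
    is guaranteed after at most 256**HSZ + 1 evaluations.
    """
    seen = {}  # output -> first input pair that produced it
    values = [bytes(v) for v in product(range(256), repeat=HSZ)]
    for left in values:
        for right in values:
            out = f(left, right)
            if out in seen:
                return seen[out], (left, right)
            seen[out] = (left, right)
    raise AssertionError("unreachable by the pigeonhole principle")


(l1, r1), (l2, r2) = pigeonhole_collision()
```

Note that the search never looks at a Merkle tree, which is exactly the concern: the collision exists independently of any mt_collide instance. The cost of this search grows exponentially in `HSZ`, though, so it says nothing about finding collisions for a realistic hash size.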

Although the pigeonhole principle isn't used in the function extract, and Z3 is currently not powerful enough to prove the pigeonhole principle by itself, this formalization might admit an incorrect proof.

You are right that you can construct a value of type hash2_raw_collide from the pigeonhole principle. Therefore, you can give a definition (a proof) of extract without using its last argument of type mt_collide #_ #f n i.

https://github.com/project-everest/hacl-star/blob/144c44e1fa6e8062b2c50d4cd7ad0e41e7f0fe29/secure_api/merkle_tree/MerkleTree.Spec.fst#L485-L487

However, that wouldn't be a valid proof of collision-freedom of Merkle trees. Reductions to collision resistance of hash functions are special in that collision-finding algorithms always exist, but they are hard to find. A valid proof of collision-freedom of Merkle trees needs to show an explicit algorithm that, given a collision in the tree, can efficiently find a collision of the hash function. So, to check that the proof of extract is valid, you need to inspect its definition and convince yourself that the underlying algorithm is efficient. In this case, the algorithm just traverses the trees looking for the point where the hash collision occurs, so it incurs an overhead proportional to the height of the trees. A proof using the pigeonhole principle doesn't translate into an efficient algorithm.
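As a rough illustration of what such a reduction looks like, here is a Python sketch under stated assumptions. It is not the F* definition of extract: the names `next_level`, `root`, and `extract_collision` are hypothetical, the tree is assumed to have a power-of-two number of leaves, and `toy_f` is a deliberately weak hash used only so that a tree collision is easy to write down. This version recomputes each level from the leaves for simplicity, so it performs more hash evaluations than a traversal over trees with cached internal nodes would.

```python
def next_level(f, level):
    """Hash adjacent pairs to obtain the parent level."""
    return [f(level[i], level[i + 1]) for i in range(0, len(level), 2)]


def root(f, leaves):
    """Merkle root of a list of leaves whose length is a power of two."""
    level = list(leaves)
    while len(level) > 1:
        level = next_level(f, level)
    return level[0]


def extract_collision(f, xs, ys):
    """Turn a tree collision (xs != ys, same root) into a collision of f.

    Walks both trees upward one level at a time. The first level at which
    the parents agree while the children differ contains a pair of distinct
    inputs to f with the same output.
    """
    xs, ys = list(xs), list(ys)
    assert len(xs) == len(ys) and xs != ys and root(f, xs) == root(f, ys)
    while True:
        parents_x, parents_y = next_level(f, xs), next_level(f, ys)
        if parents_x == parents_y:
            for i in range(0, len(xs), 2):
                if (xs[i], xs[i + 1]) != (ys[i], ys[i + 1]):
                    return (xs[i], xs[i + 1]), (ys[i], ys[i + 1])
        # Parents still differ, so the collision sits higher up the tree.
        xs, ys = parents_x, parents_y


def toy_f(left: bytes, right: bytes) -> bytes:
    """Deliberately weak 1-byte hash (hypothetical), so collisions are obvious."""
    return bytes([(left[0] + right[0]) % 256])


tree_a = [b"\x01", b"\x02", b"\x03", b"\x04"]
tree_b = [b"\x02", b"\x02", b"\x02", b"\x04"]
pair_a, pair_b = extract_collision(toy_f, tree_a, tree_b)
```

The loop always terminates: the two levels stay different while the roots agree, so the parents must coincide at the latest when the levels have two elements each. The key point is that the function consumes the tree collision as input and never searches the input space of `f`.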

This paper by Phil Rogaway explains this issue much better than I could do here: https://www.cs.ucdavis.edu/~rogaway/papers/ignorance.pdf.

I suspect your concern is more that Z3 could validate an incorrect collision-finding algorithm by exploiting the pigeonhole principle without the user noticing. That's unlikely, as you point out, but possible. The algorithm in the proof of extract is simple and convincing enough, so you may consider the proof a sanity check of its correctness. To give a complete proof, one would need to define a notion of efficiency and quantify the overhead of the algorithm.


Correctness of the proof on the collision-free property of Merkle Tree #1