Optimize merkle hashing performance in `hash_tree_root`

ralexstokes commented 1 year ago

A big bottleneck of consumers of this crate is the performance of computing large Merkle trees. There are sophisticated caching techniques we can use to mitigate this; see #17. Separately, we can optimize the hashing itself, as computing the parent nodes of a set of child nodes in one layer of the Merkle tree can be done in parallel. This issue aims to address this latter performance technique.

relevant repo for investigating hashing techniques we may be able to use:

https://github.com/OffchainLabs/sszpp

ralexstokes commented 1 year ago

from convo's with @potuz

the entry point for your spare tree hash function https://github.com/ralexstokes/ssz-rs/blob/main/ssz-rs/src/merkleization/mod.rs#L123 And this is the key point that you use this signature for hashing: https://github.com/ralexstokes/ssz-rs/blob/main/ssz-rs/src/merkleization/mod.rs#L210

So your hasher function takes as input 2 chunks which are then combined into a 64 byte block and then hashed. If you could adapt this routine to use a hasher that takes a slice of bytes, multiple of 64 as input, and another preallocated buffer, multiple of 32 as output, so that the hasher hashes continously all 64 bytes chunks and spits out the consecutive 32 byte hashes in the output slice, then you can trivially switch to the vectorized assembly without any cost. The Go version of the algorithm is here: https://hackmd.io/80mJ75A5QeeRcrNmqcuU-g#Solving-1 And the faster version I'm using in ssz++ is here https://github.com/OffchainLabs/sszpp/blob/main/lib/merkleize.hpp#L149

ralexstokes commented 7 months ago

latest:

optimized hashing routines here: https://github.com/prysmaticlabs/hashtree

demo: https://github.com/potuz/hashtree_rust_demo

ralexstokes commented 4 months ago

closing in lieu of #156

ralexstokes / ssz-rs

Optimize merkle hashing performance in `hash_tree_root` #93