sigp / milhouse

Persistent binary merkle tree
Apache License 2.0
18 stars 7 forks source link

Implement efficient tree builder #1

Closed michaelsproul closed 2 years ago

michaelsproul commented 2 years ago

For SSZ and Serde decoding it would be good to have an efficient method for building a Tree<T> from an iterator of T.

In 0ef2951003175f1cc468b2daff70f9b7b2bcf4a1 I deleted a Tree::create method that built up a tree from a vec by constructing each layer of the tree. I didn't bother upgrading it to work with packed leaves because it was going to be difficult and still sub-optimal: we don't want to have to allocate the whole list as a vec before creating the tree. The interim strategy of creating an empty tree and then repeatedly pushing to it is also sub-optimal because it allocates a lot of unnecessary intermediate nodes.

I imagine we can write a better implementation that doesn't allocate any vecs or unnecessary nodes, by keeping a stack of tree nodes under construction. When item 0 is pushed we would store it in a Tree::Leaf on the stack. When item 1 is pushed, we would pop the leaf for the 0th item and push a new Tree::Node with item 0's leaf on the left and item 1's leaf on the right. For item 2 we push a leaf, for item 3 we pop the item 2 leaf and the (0, 1) node and push a new Tree::Node with left: (0, 1) and right: (2, 3). We continue this until we run out of elements, at which point we fold up the whole stack into a single node, adding Tree::Zero on the right as necessary.

The builder would need to be aware of the depth of the tree it was constructing, as well as how to handle packed leaves (they could be built up incrementally by pushing to the values SmallVec). The API could either be a separate builder struct, or a method that takes an iterator of T.

A builder API might look like this:

let mut tree_builder = TreeBuilder::new();
for item in items {
    tree_builder.push(item);
}
let tree = tree_builder.build();

An iterator API handling errors might look like this:

itertools::process_results(item_results, |iter| Tree::from_iter(depth, iter))
pawanjay176 commented 2 years ago

I want to give this a shot. Will help me understand the existing code better.

michaelsproul commented 2 years ago

Perfect! I had you in mind when I wrote it!

michaelsproul commented 2 years ago

Sorry @pawanjay176, this became a real bottleneck while testing so I ended up implementing it in 660d3c86f42bdf2a08e0c4fd3c0340a472ee1995

pawanjay176 commented 2 years ago

Hey no worries, I got distracted with other stuff, sorry :sweat_smile: