sigp / milhouse

Persistent binary merkle tree
Apache License 2.0

Implement `pop_front` in terms of a level iterator #38

Closed michaelsproul closed 5 months ago

michaelsproul commented 5 months ago

For MaxEB we need methods to slice an SSZ list, removing n elements from the front.

If n has multiple trailing zeros in its binary representation then we can reuse substantial portions of the tree when slicing, e.g. if n = 4 we can cheaply clone the subtrees for indices 4, 8, 12, 16 and so on, reordering them under new parent nodes.
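To make the alignment condition concrete, here is a small standalone sketch (not milhouse's API; the helper name is made up) of which subtrees become reusable for a given n:

```rust
/// Hypothetical helper (not part of milhouse): the number of leaves in the
/// largest aligned subtree that survives intact after removing `n` elements.
/// It is simply the largest power of two dividing `n`.
fn reusable_subtree_size(n: u64) -> u64 {
    assert!(n > 0);
    1 << n.trailing_zeros()
}

fn main() {
    let (n, len) = (4u64, 24u64);
    let size = reusable_subtree_size(n);
    // Subtrees of `size` leaves starting at n, n + size, ... are unchanged by
    // the slice, so they can be Arc-cloned and re-parented rather than rebuilt.
    let starts: Vec<u64> = (n..len).step_by(size as usize).collect();
    println!("subtree size = {size}, reusable starts = {starts:?}");
    // Prints: subtree size = 4, reusable starts = [4, 8, 12, 16, 20]
}
```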

This PR tries to implement this algorithm efficiently by introducing the concept of a "level iterator", which iterates over the Arc'd internal nodes at a given depth. It combines this with extensions to the tree `Builder` type which enable it to consume these internal nodes and build a tree.
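As a rough illustration (using a toy tree type, not milhouse's actual node, iterator, or builder types), a level iterator just walks the tree to a fixed depth and hands back the `Arc`s it finds there, so they can be re-parented without touching their contents:

```rust
use std::sync::Arc;

// Toy stand-in tree; milhouse's real node and iterator types differ.
enum Node {
    Leaf(u64),
    Internal(Arc<Node>, Arc<Node>),
}

// Sketch of a "level iterator": collect the Arc'd nodes at `depth` below
// `root` (depth 0 = the root itself), left to right.
fn level(root: &Arc<Node>, depth: usize, out: &mut Vec<Arc<Node>>) {
    if depth == 0 {
        out.push(Arc::clone(root));
    } else if let Node::Internal(left, right) = root.as_ref() {
        level(left, depth - 1, out);
        level(right, depth - 1, out);
    }
}

fn main() {
    // Build a depth-2 tree over leaves 0..4.
    let leaf = |v: u64| Arc::new(Node::Leaf(v));
    let join = |l, r| Arc::new(Node::Internal(l, r));
    let root = join(join(leaf(0), leaf(1)), join(leaf(2), leaf(3)));

    let mut nodes = Vec::new();
    level(&root, 1, &mut nodes);
    // The two subtrees at depth 1 can now be handed to a builder and reused
    // as-is, instead of being rebuilt leaf by leaf.
    println!("nodes at depth 1: {}", nodes.len());
}
```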

New proptest tests extensively exercise the functionality, alongside unit tests and benchmarks. The benchmarks show that `pop_front` is substantially faster than `pop_front_slow` when it is able to exploit node reuse, and no slower when it is unable to (i.e. when n is odd and therefore has no trailing zeros).
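The property tests are shaped roughly like the following sketch, which uses Vec stand-ins and made-up function names rather than the crate's actual list type, asserting that the optimised path always agrees with a naive reference:

```rust
use proptest::prelude::*;

// Stand-ins for the two implementations; in the PR these are the tree-based
// `pop_front` and the element-by-element `pop_front_slow`.
fn pop_front_fast(v: &[u64], n: usize) -> Vec<u64> {
    v[n.min(v.len())..].to_vec()
}

fn pop_front_slow(v: &[u64], n: usize) -> Vec<u64> {
    let mut out = v.to_vec();
    for _ in 0..n.min(v.len()) {
        out.remove(0);
    }
    out
}

proptest! {
    #[test]
    fn fast_matches_slow(v in proptest::collection::vec(any::<u64>(), 0..512),
                         n in 0usize..512) {
        prop_assert_eq!(pop_front_fast(&v, n), pop_front_slow(&v, n));
    }
}
```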

michaelsproul commented 5 months ago

Benchmark results for `pop_front`:

```
pop_front_noop/800000   time:   [16.201 ns 16.242 ns 16.287 ns]
pop_front_3/800000      time:   [41.597 ms 41.789 ms 41.990 ms]
pop_front_4/800000      time:   [16.877 ms 16.922 ms 16.980 ms]
pop_front_32/800000     time:   [1.5430 ms 1.5442 ms 1.5455 ms]
pop_front_400k/800000   time:   [194.47 µs 194.52 µs 194.59 µs]
```

Benchmark results for `pop_front_slow`:

```
pop_front_noop/800000   time:   [42.177 ms 42.226 ms 42.278 ms]
pop_front_3/800000      time:   [44.340 ms 44.411 ms 44.490 ms]
pop_front_4/800000      time:   [43.865 ms 43.923 ms 43.989 ms]
pop_front_32/800000     time:   [43.183 ms 43.235 ms 43.293 ms]
pop_front_400k/800000   time:   [18.228 ms 18.291 ms 18.362 ms]
```

As you can see, the time taken by `pop_front_slow` is proportional to the number of elements iterated: it takes ~43ms for all small values of n, and only speeds up when slicing off a substantial portion of the list (because fewer elements remain to iterate). By comparison, the optimised `pop_front` is much faster when it can exploit alignment, completing `pop_front(4)` in 38% of the time required by `pop_front_slow(4)` (a 2.6x speedup). For n = 32 the speedup is even better at 28x, and for n = 400k it is 94x. When there is no alignment to exploit (n = 3), performance is similar.
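For reference, a criterion harness along these lines (with a Vec standing in for the milhouse list, so the absolute numbers would differ) is roughly how results of this shape can be produced; the PR's actual benchmark code may differ:

```rust
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

fn bench_pop_front(c: &mut Criterion) {
    // Hypothetical harness: build the 800k-element input once, then measure
    // removing the first n elements from a fresh copy each iteration.
    let data: Vec<u64> = (0..800_000).collect();
    for n in [3usize, 4, 32, 400_000] {
        c.bench_function(&format!("pop_front_{n}/800000"), |b| {
            b.iter_batched(
                || data.clone(),              // setup: fresh copy per iteration
                |mut v| { v.drain(..n); v },  // measured: stand-in for pop_front(n)
                BatchSize::LargeInput,
            )
        });
    }
}

criterion_group!(benches, bench_pop_front);
criterion_main!(benches);
```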