sigp / milhouse

Persistent binary merkle tree
Apache License 2.0
18 stars 7 forks source link

Fast rebase #23

Closed michaelsproul closed 1 year ago

michaelsproul commented 1 year ago

This PR implements a specialised rebase_on rather than using diffs to rebase. It is between 5-45x faster than the diff approach according to benchmarks.

It tries harder not to recurse unnecessarily, by checking subtree equality using the hash and the length. The length is required because packed leaves of different lengths can have the same hash due to 0 padding (e.g. we can't differentiate between 3x 0u64 and 4x 0xu64).

It's also faster because it avoids intermediate allocations for the diff (a bunch of btree nodes).

michaelsproul commented 1 year ago

Benchmark numbers for fast rebase_on (ignore 5% random variance):

rebase_identical/800000 time:   [4.2303 ms 4.2407 ms 4.2517 ms]
                        change: [-5.2636% -4.8984% -4.5363%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

rebase_push_1_back/800000
                        time:   [4.2203 ms 4.2288 ms 4.2377 ms]
                        change: [-5.5306% -5.2313% -4.8946%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

rebase_mutate0/800000   time:   [4.1952 ms 4.2025 ms 4.2101 ms]
                        change: [-4.9558% -4.7311% -4.4831%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

rebase_completely_different/800000
                        time:   [4.1430 ms 4.1490 ms 4.1554 ms]
                        change: [-6.8671% -6.6405% -6.4236%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Then here's rebase implemented using rebase_via_diff:

rebase_identical/800000 time:   [27.317 ms 27.376 ms 27.437 ms]
                        change: [+543.34% +545.56% +547.70%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

rebase_push_1_back/800000
                        time:   [27.143 ms 27.210 ms 27.277 ms]
                        change: [+541.37% +543.43% +545.44%] (p = 0.00 < 0.05)
                        Performance has regressed.

rebase_mutate0/800000   time:   [27.905 ms 27.966 ms 28.029 ms]
                        change: [+563.55% +565.47% +567.50%] (p = 0.00 < 0.05)
                        Performance has regressed.

Benchmarking rebase_completely_different/800000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.2s, or reduce sample count to 20.
rebase_completely_different/800000
                        time:   [190.97 ms 191.21 ms 191.46 ms]
                        change: [+4499.4% +4508.6% +4518.3%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

There's a 5x jump for the similar cases, from 4ms to 25ms. And for the completely_different case a 45x jump from 4ms to 191ms.