turbofish-org / merk

High-performance Merkle key/value store
Apache License 2.0
226 stars 36 forks source link

Proof benchmark misleading #24

Closed ethanfrey closed 5 years ago

ethanfrey commented 5 years ago

I was surprised to read in https://github.com/nomic-io/merk/tree/develop#benchmarks that getting a proof is faster than a read!

I then looked at the code, in particular read and prove and realized that they create keys differently. Notably, read always queries an existent value, while prove queries a random value (which is not in the store).

I think you should update the benchmark to prove one value guaranteed to be in the store.

I also note the setup uses different batch sizes for writing, although I don't think this will affect the performance. It just seems like something else nice to normalize.

mappum commented 5 years ago

Proving a value not in the store equates to proving its 2 bounding in-order neighbors, as an absence proof. It's not a free noop query.

The reason reads are slower is because they fetch from the backing store and not from the nodes that are being held in memory: #20

I realize now, however, the uncached proof benchmark is misleading because each query loads nodes and holds them in memory, so over time it becomes almost the same as the cached benchmark. Will have to prune the tree after each query for a worst-case benchmark.

mappum commented 5 years ago

Oh and actually, the keys in the prove benchmark are guaranteed to be in the store since both the setup and the "random" keys are deterministically seeded with the batch index (1 through 1000).

mappum commented 5 years ago

I changed get to now fetch from memory when possible instead of always fetching from disk, so that increased the read speeds to be more intuitive for the cached benchmarks (f1548f5).

For the uncached benchmarks, I measured with a prune after each query so that the cached nodes don't accumulate and end up the same as the cached benchmark.

I updated the benchmarks to reflect the new measurements (0be858c).

ethanfrey commented 5 years ago

Fair enough, I just saw random, and didn't consider seeding. Nice trick.

It just showed warning signs to me if a prove is faster than a read (as a prove should do more work). If you optimize it super well, maybe only a little slower, but not faster, unless you are comparing apples to oranges.

Thanks for updating them.

ethanfrey commented 5 years ago

Read speed is through the roof now! Less than 1 micro second. Nice!