jsign / verkle-vs-patricia

Verkle Tree vs Merkle Patricia Tries benchmarks
MIT License
9 stars 2 forks source link

go-ipa: use more performant Inverse #3

Closed jsign closed 1 year ago

jsign commented 1 year ago

This PR shows how the optimized go-ipa Inverse(...) code change impacts benchmarks in this repo.

The raw Inverse(...) benchmark has the same expected speedup improvement as shown in the go-ipa benchmark:

name                  old time/op    new time/op    delta
FpInverse/go-ipa-16     1.66µs ± 5%    1.12µs ± 2%  -32.34%  (p=0.000 n=15+13)
FpInverse/go-blst-16    1.42µs ± 1%    1.42µs ± 2%     ~     (p=0.955 n=13+15)

Regarding our other benchmark to specifically measure the performance of trie key generation:

name                  old time/op    new time/op    delta
HashAddress/pedersen_hash-16    6.71µs ± 1%    6.08µs ± 2%  -9.40%  (p=0.000 n=12+15)
HashStorageSlot/pedersen_hash-16    8.45µs ± 3%    7.88µs ± 3%  -6.75%  (p=0.000 n=15+15)

So, we got around a ~1.09x speedup.

And the quite exciting benchmark improvements for updating a Verkle Trie leaf:

PedersenCommit/non_zero_entries=1-16               39.4µs ± 1%    38.0µs ± 2%   -3.59%  (p=0.000 n=14+15)
PedersenCommit/non_zero_entries=2-16               46.5µs ± 1%    45.2µs ± 1%   -2.73%  (p=0.000 n=15+15)
PedersenCommit/non_zero_entries=4-16               60.9µs ± 2%    59.3µs ± 2%   -2.56%  (p=0.000 n=15+13)
PedersenCommit/non_zero_entries=8-16               89.7µs ± 2%    87.7µs ± 1%   -2.17%  (p=0.000 n=14+12)
PedersenCommit/non_zero_entries=16-16               147µs ± 3%     146µs ± 2%     ~     (p=0.217 n=15+13)
PedersenCommit/non_zero_entries=32-16               265µs ± 3%     267µs ± 2%     ~     (p=0.425 n=15+14)
PedersenCommit/non_zero_entries=64-16               503µs ± 2%     508µs ± 1%     ~     (p=0.134 n=15+14)
PedersenCommit/non_zero_entries=128-16              988µs ± 1%     982µs ± 3%     ~     (p=0.474 n=11+15)
PedersenCommit/non_zero_entries=256-16             1.94ms ± 3%    1.96ms ± 1%   +1.00%  (p=0.009 n=15+14)
PedersenUpdateCommitment/vec_num_entries=1-16      13.9µs ± 2%     8.7µs ± 3%  -37.53%  (p=0.000 n=15+15)
PedersenUpdateCommitment/vec_num_entries=4-16      13.9µs ± 3%     8.7µs ± 3%  -37.60%  (p=0.000 n=15+15)
PedersenUpdateCommitment/vec_num_entries=16-16     14.1µs ± 1%     8.7µs ± 3%  -38.32%  (p=0.000 n=13+15)
PedersenUpdateCommitment/vec_num_entries=64-16     13.9µs ± 0%     8.6µs ± 2%  -37.69%  (p=0.000 n=11+12)
PedersenUpdateCommitment/vec_num_entries=256-16    14.0µs ± 0%     8.8µs ± 2%  -37.57%  (p=0.000 n=12+12)

This means that updating a Verkle Trie leaf got ~1.6x speedup. This makes sense as shown in my previous document where the flamegraph shows that the cost was dominated by Inverse(..). We also got a slight improvement in full Pedersen Commitment, but that operation is dominated by Mul(...).

Note: I'll keep this as a draft PR until we have an idea if https://github.com/crate-crypto/go-ipa/pull/26 will get merged.

gballet commented 1 year ago

results on my arm64 machine:

name                  old time/op    new time/op    delta
FpInverse/go-ipa-12     2.82µs ± 0%    1.93µs ± 0%   ~     (p=1.000 n=1+1)
FpInverse/go-blst-12    2.47µs ± 0%    2.53µs ± 0%   ~     (p=1.000 n=1+1)

So the speedup is similar.