mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0

Investigate batch rangeproof verification #836

Closed antiochp closed 6 years ago

antiochp commented 6 years ago

See #832 for some context.

Rangeproof verification is expensive. During fast sync it takes approx 70 seconds to verify approx 4,000 rangeproofs.

From https://blockstream.com/2018/02/21/bulletproofs-faster-rangeproofs-and-much-more.html

... reduced the speed of 147 multiplications to only 15.5 microseconds each, getting the total verification time of a Bulletproof down to 2.3 ms, vs 5.8 ms for the old proofs.

From the above times, we should be able to verify 4,000 rangeproofs in approx 10 seconds. We should investigate why we are seeing 70 seconds here.
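As a back-of-envelope check of the "approx 10 seconds" figure, using the 2.3 ms per-proof verification time quoted from the Blockstream post (these numbers come from the post, not from measurements on grin itself):

```rust
fn main() {
    let proofs = 4_000u64;
    let ms_per_proof = 2.3f64; // Bulletproof verify time from the blog post
    let total_s = proofs as f64 * ms_per_proof / 1000.0;
    println!("{} proofs at {} ms each = {:.1} s", proofs, ms_per_proof, total_s);
    // ~9.2 s, i.e. roughly the "approx 10 seconds" above, vs the 70 s observed.
    assert!((total_s - 9.2).abs() < 0.05);
}
```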

In #832 we modified txhashset validation to allow rangeproof verification to be skipped. So during check_compact() we can skip this step as we do not need to (re)verify rangeproofs.

We currently verify rangeproofs in 2 places -

In both cases we have multiple rangeproofs to verify. Batching would be useful in both cases.

Another thing we may want to investigate: do we need to verify all rangeproofs every time we validate a transaction or a block? For example, when building a new block we validate all the transactions to be included in that block, but presumably all of these txs already had their rangeproofs verified when they entered the pool?

Could we safely assume a tx previously accepted by the tx pool contains only outputs with valid rangeproofs?

https://github.com/mimblewimble/grin/blob/ed78e1a721d2cf9bec2462c19824b4444a188eec/core/src/core/block.rs#L519-L526

I guess this only happens within a mining node when building a new block so maybe the benefits are minimal here anyway?

ignopeverell commented 6 years ago

Regarding that last question, we can definitely skip range proofs a node has already validated. Bitcoin Core does something similar with signatures, there's a cache of sigs that have already been checked.
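A minimal sketch of such a cache in the spirit of Bitcoin Core's signature cache. All names here are hypothetical, not grin's actual API: `verify_cached` stands in for the expensive libsecp call, and the key would in practice be a hash of (commitment, proof):

```rust
use std::collections::HashSet;

// Cache of rangeproofs already verified as valid. A proof checked at
// tx-pool admission is then not re-verified when the same output
// appears in a block.
struct RangeproofCache {
    valid: HashSet<Vec<u8>>,
}

impl RangeproofCache {
    fn new() -> Self {
        RangeproofCache { valid: HashSet::new() }
    }

    // Consults the cache first; only runs the expensive verifier on a miss.
    fn verify_cached<F>(&mut self, key: Vec<u8>, verify: F) -> bool
    where
        F: FnOnce() -> bool,
    {
        if self.valid.contains(&key) {
            return true;
        }
        if verify() {
            self.valid.insert(key);
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut cache = RangeproofCache::new();
    let mut calls = 0;
    // First check runs the expensive verifier...
    assert!(cache.verify_cached(vec![1, 2, 3], || { calls += 1; true }));
    // ...the second is a cache hit and skips it entirely.
    assert!(cache.verify_cached(vec![1, 2, 3], || { calls += 1; true }));
    assert_eq!(calls, 1);
}
```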

I'm still a little surprised by the slowness of it, the rangeproof paper claims about 4ms verify time for a 64-bit proof. Note that this drops down to 0.5ms for each additional proof in a batch.

yeastplume commented 6 years ago

I'd need to look into this more, but this could be due to the fact that we're initialising and tearing down a scratch space on each verify (done that way to avoid having to pass a scratch space in and out of rust).

yeastplume commented 6 years ago

Will do some experimentation on this. It should be possible to create a static scratch space on the C side, then provide functions to init and tear it down before and after verifying a large batch. I'll test whether this makes vast improvements to our verification times.
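An illustrative sketch of the reuse pattern described here: allocate one scratch buffer up front, run the whole batch against it, then free it once, instead of creating and tearing down scratch space per proof. `Scratch` and `verify_with_scratch` are stand-ins for the real C-side calls, not grin's actual FFI:

```rust
// Stand-in for the C-side scratch space.
struct Scratch {
    _buf: Vec<u8>,
}

impl Scratch {
    fn with_capacity(bytes: usize) -> Self {
        Scratch { _buf: vec![0u8; bytes] }
    }
}

// Hypothetical per-proof verification that borrows the shared scratch.
fn verify_with_scratch(_scratch: &mut Scratch, _proof: &[u8]) -> bool {
    true // placeholder: the real work happens in libsecp
}

fn verify_batch(proofs: &[Vec<u8>]) -> bool {
    // One allocation for the whole batch, not one per proof.
    let mut scratch = Scratch::with_capacity(256 * 1024);
    proofs.iter().all(|p| verify_with_scratch(&mut scratch, p))
} // scratch dropped (freed) exactly once, here

fn main() {
    let proofs = vec![vec![0u8; 32]; 4];
    assert!(verify_batch(&proofs));
}
```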

yeastplume commented 6 years ago

Nope, sorry. It is the implementation:

```
-   99.56%     0.00%  pedersen::tests  secp256k1zkp-650646405640bde9  [.] secp256k1_bulletproof_rangeproof_verify_single_w_scratch
     secp256k1_bulletproof_rangeproof_verify_single_w_scratch
   - secp256k1_bulletproof_rangeproof_verify
      - 99.21% secp256k1_bulletproof_rangeproof_verify_impl
         - 94.16% secp256k1_bulletproof_inner_product_verify_impl
            - 90.93% secp256k1_ecmult_multi_var
               - 90.92% secp256k1_ecmult_pippenger_batch
                  - 78.67% secp256k1_ecmult_pippenger_wnaf
                     + 57.56% secp256k1_gej_add_ge_var
                     + 17.44% secp256k1_gej_add_var
                     + 0.92% secp256k1_ge_neg
                       0.71% secp256k1_gej_double_var
                       0.70% secp256k1_wnaf_fixed
                  + 6.69% secp256k1_bulletproof_innerproduct_vfy_ecmult_callback
                  + 5.22% secp256k1_ecmult_endo_split
            + 2.93% secp256k1_scalar_inverse_var
         + 3.03% secp256k1_scalar_inverse_var
         + 1.34% secp256k1_bulletproof_update_commit
         + 0.65% secp256k1_bulletproof_deserialize_point
```

I have messed around with the actual ecmult implementation, pulling the version we have there from different places for aggsig and for this, and making a few modifications to ensure both can run; a side effect of pulling very early code from everywhere.

yeastplume commented 6 years ago

heh.. think I have it.. can you test the verify speed again, except with a release build?

Just did a test here directly within rust-secp256k1-zkp: a debug build does 4,000 verifications in 50 seconds, with --release it does them in just under 14.

antiochp commented 6 years ago

This was after building with --release -

Mar 21 15:39:35.794 DEBG Verified 5670 Rangeproofs, pmmr size 11355, took 29s

I wonder if some of this on a running node is due to contention on getting the lock on txhashset.write()?
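A sketch of the contention concern: if the rangeproofs are verified while the txhashset lock is held, every other reader and writer stalls for the full ~29 s. One mitigation is to copy the proof data out under the lock and verify with the lock released. The types and names here are illustrative, not grin's actual API:

```rust
use std::sync::RwLock;

// Illustrative stand-in for the shared txhashset state.
struct TxHashSet {
    rangeproofs: Vec<Vec<u8>>,
}

fn verify(_proof: &[u8]) -> bool {
    true // placeholder for the expensive libsecp verification
}

fn validate(txhashset: &RwLock<TxHashSet>) -> bool {
    // Hold the lock only long enough to clone the proof data out...
    let proofs = txhashset.read().unwrap().rangeproofs.clone();
    // ...then do the slow verification with the lock already released.
    proofs.iter().all(|p| verify(p))
}

fn main() {
    let ths = RwLock::new(TxHashSet { rangeproofs: vec![vec![0u8; 32]; 3] });
    assert!(validate(&ths));
}
```

The tradeoff is the memory cost of the clone; the benefit is that verification no longer serializes all access to the txhashset.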

apoelstra commented 6 years ago

FYI the benchmarks in the blog post come from my laptop, an i7-6820HQ CPU @ 2.70 GHz.

But yes, without batch validation your performance will be abysmal.

yeastplume commented 6 years ago

I'm going to have another look at the batch validation function in our libsecp fork. Last we left it, it wasn't working, so I'm going to do some tests to see whether it's an upstream issue.