mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0

VerifierCache should be re-enabled #3620

Closed. snape479 closed this issue 3 years ago.

snape479 commented 3 years ago

Due to the recent CVE, the VerifierCache was disabled as an emergency measure. It should be re-enabled once the appropriate fix is in place. A PR with this fix will be forthcoming.
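
For reference, the core idea of the cache is simple: remember which rangeproofs have already passed full verification so they are not re-verified on every subsequent validation. The sketch below is a minimal, illustrative version in plain Rust; `SimpleVerifierCache`, `filter_unverified`, and `mark_verified` are made-up names for this example and do not reflect grin's actual `VerifierCache` trait or its LRU implementation.

```rust
use std::collections::{HashSet, VecDeque};

/// Minimal sketch of a bounded verifier cache: it remembers the hashes
/// of proofs that have already passed full verification so they can be
/// skipped next time. Illustrative only, not grin's implementation.
struct SimpleVerifierCache {
    seen: HashSet<[u8; 32]>,
    order: VecDeque<[u8; 32]>,
    capacity: usize,
}

impl SimpleVerifierCache {
    fn new(capacity: usize) -> Self {
        Self { seen: HashSet::new(), order: VecDeque::new(), capacity }
    }

    /// Return only the proof hashes that still need full verification.
    fn filter_unverified(&self, hashes: &[[u8; 32]]) -> Vec<[u8; 32]> {
        hashes.iter().filter(|h| !self.seen.contains(*h)).cloned().collect()
    }

    /// Record hashes as verified, evicting the oldest entry when full.
    fn mark_verified(&mut self, hashes: &[[u8; 32]]) {
        for h in hashes {
            if self.seen.insert(*h) {
                self.order.push_back(*h);
                if self.order.len() > self.capacity {
                    if let Some(old) = self.order.pop_front() {
                        self.seen.remove(&old);
                    }
                }
            }
        }
    }
}
```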

antiochp commented 3 years ago

Duplicate of https://github.com/mimblewimble/grin/issues/3606

phyro commented 3 years ago

I suggest we close this one

snape479 commented 3 years ago

Added a perf test to https://github.com/mimblewimble/grin/pull/3621. It shows a significant performance impact from disabling the verifier_cache. Here is the output:

No cache

% ./target/release/perf
[2021-04-02 08:02:54]: Starting perf test
[2021-04-02 08:02:55]: building tx1
[2021-04-02 08:03:42]: building tx2
[2021-04-02 08:04:28]: building tx3
[2021-04-02 08:05:14]: building tx4
[2021-04-02 08:06:01]: building tx5
[2021-04-02 08:06:47]: start validate tx
[2021-04-02 08:06:51]: tx val time elapsed = 3599ms
[2021-04-02 08:06:51]: start process block
[2021-04-02 08:06:52]: block val time elapsed = 909ms

With cache

% ./target/release/perf
[2021-04-02 07:58:08]: Starting perf test
[2021-04-02 07:58:09]: building tx1
[2021-04-02 07:58:55]: building tx2
[2021-04-02 07:59:42]: building tx3
[2021-04-02 08:00:28]: building tx4
[2021-04-02 08:01:15]: building tx5
[2021-04-02 08:02:01]: start validate tx
[2021-04-02 08:02:02]: tx val time elapsed = 878ms
[2021-04-02 08:02:02]: start process block
[2021-04-02 08:02:02]: block val time elapsed = 231ms
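
For reference, those numbers work out to roughly a 4.1x slowdown for tx validation without the cache (3599 ms vs 878 ms) and roughly a 3.9x slowdown for block validation (909 ms vs 231 ms) in this benchmark.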

Given these results, I think you should reconsider closing this issue and instead merge pull request #3621.

antiochp commented 3 years ago

@snape479 Thanks for digging into this. Let me take a look at the test results today. I'm surprised we are seeing this large a difference.


Ok I think I understand this now.

The perf test creates 5 txs but reuses the same 1000 rangeproofs for each tx. It makes sense that the cache provides a huge benefit in this setup, since everything after the first tx is effectively a cache hit: without the cache, validating those 5 txs amounts to verifying the same 1000 rangeproofs five times over.
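
As a back-of-envelope check: without the cache the 5 txs require 5 × 1000 = 5000 rangeproof verifications, while with the cache only the 1000 unique proofs are verified once, which predicts a speedup approaching 5x and is consistent with the roughly 4x measured above.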

I'm not sure this is an entirely representative test. It does highlight the non-trivial cost of verifying rangeproofs, but in real-world scenarios that cost exists with or without the cache.

The block is then built with one of these txs, which also results in cache hits for those same 1000 rangeproofs.

That second point is the interesting one, I think. In an ideal world we would not need to re-verify rangeproofs in a block if we have already validated the underlying transactions. But this is where we need to be particularly careful: block validation is consensus critical (and tx validation is not). We need to consider the implications carefully before reintroducing a caching layer.
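
To make the safety concern concrete, here is a hypothetical sketch of a cache-aware verification step, reusing the `SimpleVerifierCache` from the earlier sketch (`verify_batch` is a stand-in closure for real batch rangeproof verification, not a grin API). The invariant to preserve is that the cache may only ever skip work, never change an accept/reject outcome:

```rust
/// Hypothetical cache-aware batch verification. Proofs are marked
/// verified only after they actually pass, so a cache bug can at
/// worst cost performance, never soundness.
fn verify_rangeproofs_cached(
    cache: &mut SimpleVerifierCache,
    proof_hashes: &[[u8; 32]],
    verify_batch: impl Fn(&[[u8; 32]]) -> bool, // stand-in for real verification
) -> bool {
    // Skip proofs the cache has already seen pass full verification.
    let unverified = cache.filter_unverified(proof_hashes);
    if unverified.is_empty() {
        return true;
    }
    // Run the expensive batch verification on the remainder only.
    if verify_batch(&unverified) {
        cache.mark_verified(&unverified);
        true
    } else {
        false
    }
}
```

One further design note: the cache key must bind everything the verification outcome depends on (e.g. the commitment as well as the proof itself). Keying on too little is exactly the kind of subtle pitfall that makes caching in a consensus-critical path dangerous.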


I agree that we need to revisit this and implement some kind of caching in the future, but I do think we are fine for now.

snape479 commented 3 years ago

Yes, the test uses a pathological case for tx validation, since only one of the outputs changes between transactions. Keep in mind, though, that an attacker can deliberately exploit this: by crafting transactions that look almost identical but are technically distinct, they hit the pathological case, because nodes propagate those 5 transactions separately through the transaction pools, whereas exactly identical transactions would propagate only once. In terms of the block, though, all the transactions will generally already be cached, so I think that part of the test does show the actual picture.
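
In rough numbers: changing a single output yields 5 distinct-looking transactions whose rangeproofs are almost entirely shared, so a non-caching node ends up doing close to 5 × 1000 = 5000 proof verifications where a caching node would do only about 1000.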

phyro commented 3 years ago

Is the general consensus here that we add some form of caching later on, once the benefits become relevant for our on-chain/mempool volume? If yes, I'd suggest we also consider updating or closing the linked PR https://github.com/mimblewimble/grin/pull/3621