oskarth opened 4 years ago
Semaphore currently uses the Blake2 hash function (for quantum forward secrecy) and the Pedersen hash (collision resistant under more conservative assumptions than MiMC). We also use an EdDSA signature for a similar crypto UX; this could be replaced with knowledge of a preimage as the signature.
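As a rough sketch of the preimage-as-signature idea, assuming a Blake2s-based identity commitment (the function names here are illustrative, not Semaphore's actual API):

```rust
// Sketch: the identity commitment is the hash of a secret; inside the
// circuit one would constrain commitment == Blake2s(secret) with `secret`
// as a private witness, so publishing a valid proof plays the role of
// the EdDSA signature.
use blake2::{Blake2s256, Digest};

fn identity_commitment(secret: &[u8]) -> [u8; 32] {
    Blake2s256::digest(secret).into()
}

fn main() {
    let commitment = identity_commitment(b"my-identity-secret");
    println!("commitment: {:02x?}", commitment);
}
```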
An example minimal circuit that lacks the quantum forward secrecy is https://github.com/peppersec/tornado-mixer/ ; it's possibly a better benchmark candidate.
You would need to add a Shamir secret share of the leaf, but that applies to both.
Regarding Point 4: Precompute proofs
Using Groth16 (and possibly any other ZKP system that doesn't rely on random oracles), I believe this is possible provided that small changes to the input do not drastically change the witness values.
For example, consider two very small Merkle trees Hash(Hash(a), Hash(b)) and Hash(Hash(a), Hash(c)). The witness components required to verify that Hash(a) is computed correctly are the same for both, so the prover's work in verifying this value does not necessarily need to be repeated.
In Groth16, I believe some of the work for computing the first two proof elements A and B can be precomputed for these types of circuits. Regarding the third proof element C, anything related to the output of the FFT will need to be computed from scratch every time (so a guesstimated 1/8 of the total work cannot be aided by precomputation, not including FFT costs).
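To make the shared-witness observation concrete, here is a minimal sketch of the two-leaf example (SHA-256 stands in for the circuit's hash purely for illustration):

```rust
// Hash(Hash(a), Hash(b)) vs Hash(Hash(a), Hash(c)): the inner digest of `a`
// is identical in both trees, so the witness wires for that sub-computation
// could in principle be computed once and cached by the prover.
use sha2::{Digest, Sha256};

fn h(data: &[u8]) -> Vec<u8> {
    Sha256::digest(data).to_vec()
}

fn main() {
    let ha = h(b"a"); // shared sub-computation
    let root_ab = h(&[ha.clone(), h(b"b")].concat());
    let root_ac = h(&[ha, h(b"c")].concat());
    assert_ne!(root_ab, root_ac); // roots differ; the `a` subtree work did not
}
```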
Some of these are avenues that definitely can be explored.
About reducing the number of constraints: we use some methods that are possibly safer, though they come at a cost - we use Pedersen hashes for commitments and Blake2s for nullifier computation. If you believe that algebraic hash functions such as MiMC are strong enough to provide sufficient hiding and post-quantum security, and not reveal anything about the hashes' inputs, you could use those instead and lower the number of constraints considerably.
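For intuition on why algebraic hashes are so much cheaper in constraints, here is a toy MiMC-style round function (a small Mersenne prime stands in for the SNARK scalar field; a real instantiation picks the field and exponent so that cubing is a bijection, and uses many more rounds):

```rust
// Each round is x <- (x + k + c_i)^3: roughly two multiplication
// constraints per round in R1CS, versus thousands of boolean constraints
// for a bit-oriented hash like Blake2s.
const P: u128 = (1 << 61) - 1; // toy modulus, NOT a real SNARK field

fn cube(x: u128) -> u128 {
    let x2 = x * x % P; // x < 2^61, so x * x fits in a u128
    x2 * x % P
}

fn mimc(mut x: u128, k: u128, round_constants: &[u128]) -> u128 {
    for &c in round_constants {
        x = cube((x + k + c) % P);
    }
    (x + k) % P
}

fn main() {
    let constants: Vec<u128> = (1..=91).collect(); // placeholder round constants
    println!("{}", mimc(42, 7, &constants));
}
```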
About WASM vs JS: I'm not sure you'll get to subsecond with the current tooling, either JS or WASM. That said, a native Zcash spend proof in Bellman takes on a good laptop, as far as I remember, about a second. I've also done some benchmarks in the past with single-threaded Bellman: https://github.com/kobigurk/wasm_proof. With websnark, the situation is better since it's multi-threaded and doesn't have the overhead of Rust -> WASM, though I don't have numbers.
As to prover as a service, this isn't trivial. In the StarkWare case it makes sense - there's no privacy involved. The StarkWare prover, roughly, aggregates a bunch of in-the-clear transactions and proves it was done correctly. Anyone could do it, since there's no secret data involved. In this case, the user has private data they have to input to the proof. There could be interesting methods to explore here, though it's not as direct as just offloading.
Regarding proof precomputation, I agree with @mmaller that some of A and B, and even parts of C (e.g. some of the sA + rB part of it), can be precomputed. One part is moderately expensive and doesn't seem to be precomputable: the H coefficients, which is where the FFT happens.
I think your best bet would be to use multicore Bellman on native mobile. I believe this would be by far the most efficient method and would also alleviate the need to download `circuit.json`, as its equivalent can be efficiently compiled into the program binary. The proving key might also be more efficiently represented, though I'd have to look at it again to be sure. `proving_key.bin` is already binary, though maybe points are decompressed or there are zero points saved, etc.
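A minimal sketch of compiling the parameters into the binary, assuming a Bellman-style Groth16 prover (the path and the commented `Parameters::read` call are illustrative):

```rust
// Embed the proving key at compile time instead of shipping circuit.json /
// proving_key.bin as separate downloads. Path is illustrative.
static PROVING_KEY_BYTES: &[u8] = include_bytes!("../params/proving_key.bin");

fn main() {
    // With bellman, deserialization would look roughly like:
    // let params = bellman::groth16::Parameters::<Bls12>::read(
    //     &PROVING_KEY_BYTES[..], /* checked = */ false)?;
    println!("embedded proving key: {} bytes", PROVING_KEY_BYTES.len());
}
```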
It would require rewriting the Semaphore circuit in Bellman. With the gadgets already existing in Bellman and sapling-crypto, it shouldn't be a huge undertaking, though it's not trivial.
> It would require rewriting the Semaphore circuit in Bellman. With the gadgets already existing in Bellman and sapling-crypto, it shouldn't be a huge undertaking, though it's not trivial.
Could we use the circom importer that you wrote for the trusted setup to bypass the need to rewrite?
It's currently taking us 0.672 sec to make a proof; we estimate 2.6 seconds on mobile.
Regarding WASM, looking at the available backends in https://github.com/kobigurk/wasm_proof, I see that it's using the Rust pairing crate: https://github.com/kobigurk/wasm_proof/tree/db43012b/pairing
Unfortunately I expect this backend to be significantly slower than the state of the art. In particular for WASM, Herumi's MCL can be used (https://github.com/herumi/mcl#how-to-build-for-wasmwebassembly) or Barretenberg (warning: GPL): https://github.com/AztecProtocol/barretenberg
MCL is also undergoing a security audit at the moment sponsored by the Ethereum Foundation.
For instance, assuming you use BN254 SNARKs, here are a couple of implementation issues that will slow down the code by over 2x:
- `Fp::mul` at https://github.com/kobigurk/wasm_proof/blob/db43012b/ff/ff_derive/src/lib.rs#L742-L786
- Scalar multiplication
- Pairing
- Addition chains:
- Towering:
I can't comment on the implementation of the zk-SNARKs on top of the pairing primitives as I didn't implement them (yet), but AFAIK parallel multi-scalar point multiplication seemed quite important (in particular Pippenger: https://ethresear.ch/t/simple-guide-to-fast-linear-combinations-aka-multiexponentiations/7238).
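For readers unfamiliar with Pippenger, here is a structural sketch of the bucket method over a toy additive group (wrapping u64 addition stands in for elliptic-curve point addition; a real MSM uses the same skeleton over curve points, typically parallelized across windows):

```rust
// Computes sum_i scalars[i] * points[i] with the bucket method:
// split scalars into windows of `window_bits`, accumulate points into
// buckets by digit, then collapse each window with a running sum.
fn msm_pippenger(scalars: &[u64], points: &[u64], window_bits: usize) -> u64 {
    let windows = (64 + window_bits - 1) / window_bits;
    let mask = (1u64 << window_bits) - 1;
    let mut acc = 0u64;
    for w in (0..windows).rev() {
        // "double" window_bits times (the toy group's multiply-by-2^c map)
        for _ in 0..window_bits {
            acc = acc.wrapping_add(acc);
        }
        let mut buckets = vec![0u64; (1 << window_bits) - 1];
        for (s, p) in scalars.iter().zip(points) {
            let digit = ((*s >> (w * window_bits)) & mask) as usize;
            if digit != 0 {
                buckets[digit - 1] = buckets[digit - 1].wrapping_add(*p);
            }
        }
        // running-sum trick: sum_d d * buckets[d-1] in one backwards pass
        let (mut running, mut window_sum) = (0u64, 0u64);
        for b in buckets.iter().rev() {
            running = running.wrapping_add(*b);
            window_sum = window_sum.wrapping_add(running);
        }
        acc = acc.wrapping_add(window_sum);
    }
    acc
}

fn main() {
    assert_eq!(msm_pippenger(&[3, 5], &[10, 100], 4), 3 * 10 + 5 * 100);
}
```

Production implementations additionally use signed digits to halve the bucket count and run the windows in parallel.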
Do we have to use WASM for this? Can we not use the native Bellman prover, since it does not need to run in a browser?
That is what we use in our estimates above, so I was hoping it would just be these times directly.
We want to be able to run in the browser as well, at least eventually. For mobile and desktop, I believe Bellman native can be used as-is (with a wrapper, e.g. in Nim). Unless there are potential compatibility issues with using different backends for mobile vs WASM etc.? I haven't looked into this or thought about it in detail.
2.6s on mobile seems a tad high, especially if this means higher variance for low-end devices. As you said before @barryWhiteHat, more benchmarks for mobile would be useful. Though this can probably be complemented with the precomputed proofs and batching idea?
> We want to be able to run in the browser as well, at least eventually. For mobile and desktop, I believe Bellman native can be used as-is (with a wrapper, e.g. in Nim). Unless there are potential compatibility issues with using different backends for mobile vs WASM etc.? I haven't looked into this or thought about it in detail.
That should not be a problem.
> 2.6s on mobile seems a tad high, especially if this means higher variance for low-end devices. As you said before @barryWhiteHat, more benchmarks for mobile would be useful. Though this can probably be complemented with the precomputed proofs and batching idea?
I think this idea may not work, because the work needed to combine proofs is much bigger than the work we do in the whole circuit. So if we want to batch proofs together it ends up being more expensive.
We have some preliminary benchmarks on mobile.
Our constraint sizes are for Merkle depths 32, 24, and 16; a depth-d tree holds 2^d leaves, so depth 24 ≈ 16.8M users and depth 16 ≈ 65k users. Prover times in seconds:

Device | Merkle Depth | Prover Time (seconds) |
---|---|---|
iPhone SE (2016) | 32 | 1.615 |
iPhone SE (2016) | 24 | 1.381 |
iPhone SE (2016) | 16 | 0.925 |
iPhone 11 | 32 | 0.438 |
iPhone 11 | 24 | 0.365 |
iPhone 11 | 16 | 0.247 |
We are confident we can get this below 1 second for these devices. What is the minimum-power device you want to support? Also, how many users do you want to support right now?
@jacqueswww perhaps you have a good idea on this.
More benchmarks, thanks to @kilic.
Results are generated with browserstack.com
Device | Security Level | Merkle Depth | Prover Time (seconds) |
---|---|---|---|
Pixel 4 | 128 | 16 | 0.541 |
Pixel 4 | 128 | 24 | 0.795 |
Pixel 4 | 128 | 32 | 0.917 |
Pixel 4 | 80 | 16 | 0.464 |
Pixel 4 | 80 | 24 | 0.542 |
Pixel 4 | 80 | 32 | 0.749 |
Pixel 2 | 128 | 16 | 0.62 |
Pixel 2 | 128 | 24 | 0.986 |
Pixel 2 | 128 | 32 | 1.061 |
Pixel 2 | 80 | 16 | 0.492 |
Pixel 2 | 80 | 24 | 0.624 |
Pixel 2 | 80 | 32 | 0.842 |
Galaxy S8 | 128 | 16 | 0.538 |
Galaxy S8 | 128 | 24 | 0.775 |
Galaxy S8 | 128 | 32 | 0.808 |
Galaxy S8 | 80 | 16 | 0.412 |
Galaxy S8 | 80 | 24 | 0.551 |
Galaxy S8 | 80 | 32 | 0.677 |
iPhone 11 | 128 | 16 | 0.247 |
iPhone 11 | 128 | 24 | 0.364 |
iPhone 11 | 128 | 32 | 0.447 |
iPhone 11 | 80 | 16 | 0.200 |
iPhone 11 | 80 | 24 | 0.249 |
iPhone 11 | 80 | 32 | 0.354 |
iPhone 8 | 128 | 16 | 0.304 |
iPhone 8 | 128 | 24 | 0.458 |
iPhone 8 | 128 | 32 | 0.553 |
iPhone 8 | 80 | 16 | 0.252 |
iPhone 8 | 80 | 24 | 0.336 |
iPhone 8 | 80 | 32 | 0.465 |
iPhone SE | 128 | 16 | 0.921 |
iPhone SE | 128 | 24 | 1.391 |
iPhone SE | 128 | 32 | 1.584 |
iPhone SE | 80 | 16 | 0.758 |
iPhone SE | 80 | 24 | 0.948 |
iPhone SE | 80 | 32 | 1.322 |
I reckon Android 7/8 with 1GB of RAM, with the iPhone 6 as the rough equivalent (iPhones are only widespread in some countries, but they stay in users' hands much longer), is what you should set as your minimal requirement for maximum reach worldwide.
From the above benchmarks, the speed looks manageable on these devices? Which is great! As long as it works on lower-end (1GB RAM) devices.
For an interesting view, check this device composition for an app listed on the Google Play store here, for the Southern African region (the app is targeted at the low- to mid-income market segment).
Awesome benchmarks @barryWhiteHat and @kilic!
And thanks for jumping in @jacqueswww :)
There are some specific architectures and versions the Core app wants to support, but I can't find the resource for this right now. I recall that a standard model used for testing is/was Galaxy S6 and iPhone 6s, but this might have changed. In general, Android is likely to be more of a problem than iOS.
@cammellos speaking for Core:
Question on the Merkle tree size: can this easily be upgraded, e.g. start with 16 and then upgrade to 32 with backwards compatibility?
> Question on the Merkle tree size: can this easily be upgraded, e.g. start with 16 and then upgrade to 32 with backwards compatibility?
Yes, this can be done easily; we just need to upgrade the user app and replace the SNARK proving key. The plan is to optimize the prover more as the tree grows.
Thanks for moving this forward and for the benchmarks, really insightful stuff.
What is the current minimum set of device(s) the app aims to support?
The lowest-end device we support is the Galaxy Note 4, though the app is barely usable on it as I understand; you can transact and have a limited number of chats open.
For an anti-spam solution such as this, is subsecond/1s to generate a message an acceptable target from a UX POV? This is before potential precomputed proofs, etc.
1 second is fairly high in terms of responsiveness, as we need to add network delay on top, but it's not a deal breaker for some solutions; it all depends on the UX imposition on the users and how the solution is used in the app.
> 1 second is fairly high in terms of responsiveness, as we need to add network delay on top, but it's not a deal breaker for some solutions; it all depends on the UX imposition on the users and how the solution is used in the app.
This happens per message that is sent, so it ends up being a little bit of lag in the chat. But it was my understanding that the overall network lag was enough that you did not really notice this. What would be a more acceptable number?
Also, this varies with device; with newer devices we are down to 2/10ths of a second. And we plan to optimize this further with an ARM assembly implementation of the prover. Sub-second is more of a place to start out.
> This happens per message that is sent, so it ends up being a little bit of lag in the chat. But it was my understanding that the overall network lag was enough that you did not really notice this. What would be a more acceptable number?
Hard to say. Currently it takes roughly under a second (more or less) for a message to show on the recipient side, so it feels pretty much instant. Ideally building a message takes 0.25-0.5s, so overall we are around 1s with network latency. But if it's a CPU-expensive operation, battery usage is likely going to be the driving factor on slow devices (though that's just my guess).
Regardless, I would not worry too much about latency at this point; 1 second might be OK and is definitely not a show stopper. It matters more how we are going to use this and what we can build around it at the product level (from Core's point of view, that is).
The way I see it, having a 1-second delay for messages is more acceptable UX-wise than, for example, asking users to stake/deposit some funds to send messages. So it might be that, when thinking about a comprehensive spam solution, the UX degradation has to be addressed and made palatable somewhere else to give users a buy-in, at which point the delay in sending messages will be absolutely fine.
We can always have a free-for-all, user-run, permissionless network (as it is now), and have other parts of the network that are spam-free but require deposits/staking etc., at the cost of higher latency, but that give access to something valuable to the user (curated content, anonymous voting, etc.).
Thanks again for the input!
Note that 1 second is with the current state of optimization. From @kilic's code at https://github.com/kilic/bls12-381 here are the main optimizations done/available:
So if we can reach 1s now this is likely a reasonable worst-case baseline.
Furthermore, if we reach the limits of compute, there might be other avenues to explore: can proofs be aggregated? Or could a market even be created with SNT so that lighter nodes delegate to heavier nodes for a fee?
The short-term plan is what we do until we hit the limits of compute.
Medium term, I would like to move to PLONK, which lets us use custom constraints to express the same circuit with far fewer constraints, which I hope would reduce the proving time a lot.
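For intuition, a standard PLONK arithmetic gate enforces one equation of the fixed shape below; custom gates extend the selector set so that an operation which needs several R1CS constraints can fit in a single gate (the quintic S-box is the usual example):

```latex
% Standard PLONK gate over witness wires a, b, c with circuit-defining
% selector constants q_*:
q_L \cdot a + q_R \cdot b + q_O \cdot c + q_M \cdot a b + q_C = 0
% A custom gate adds a selector for a wider operation, e.g. enforcing
% c = a^5 directly instead of via three multiplication constraints
% (t_1 = a \cdot a,\; t_2 = t_1 \cdot t_1,\; c = t_2 \cdot a).
```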
Long term, we can even invest in some custom crypto that does what RLN does, but it seems this might be hard to do in a way that slashing can happen quickly. I know @kobigurk has some ideas on this; @mmaller would also be able to advise.
@jacqueswww, @barryWhiteHat I have got results from the iPhone 6 for various cases.
Device | Security Level | Merkle Depth | Prover Time (seconds) |
---|---|---|---|
iPhone 6 | 128 | 16 | 1.143 |
iPhone 6 | 128 | 24 | 1.885 |
iPhone 6 | 128 | 32 | 2.554 |
iPhone 6 | 80 | 16 | 0.924 |
iPhone 6 | 80 | 24 | 1.326 |
iPhone 6 | 80 | 32 | 1.627 |
Our write-up about incentives for RLN is now up: https://ethresear.ch/t/rln-incentives-for-p2p-networks/8085
At the moment we are trying to decide between reducing proving time on mobile and moving forward with building incentive contracts.
On your side, is the iPhone 6's 1.3-second proving time a problem? I feel that as a lower bound this is really great, and perhaps you all or the broader community can build a faster prover that we can use here and in a bunch of other projects.
Looking into Rust implementations, @jon-chuang has done great optimizations to Zexe those past months, for example this one https://github.com/scipr-lab/zexe/pull/296.
I expect Zexe is currently the fastest Rust SNARK library.
Also given the flurry of activity I see on highly-optimized Snarks (see also https://github.com/ConsenSys/gnark), there might be enough people and research/implementation directions to make a common Discord channel?
I can confirm this is very fast. On BW6 with batch inversion, GLV and optimized assembly you get 3-4x.
Do note that most of the optimisations only apply to scalar muls, not MSMs. Currently, our MSMs for BN254 are roughly on par with Barretenberg, lagging by around 5%.
Problem
Proving time for Semaphore (https://github.com/kobigurk/semaphore) zkSNARKs using circom, Groth16 and snarkjs is currently way too long: it takes on the order of ~10m to generate a proof. With websnark, it is likely to take ~30s, which might still be too long. We should experiment with native code on mobile here.
Acceptance criteria
While we can potentially use precomputed proofs to some extent, this needs to be on the order of a second to be feasible for e.g. spam-protection signalling on normal messages. This means a performance improvement of a bit more than two orders of magnitude is necessary. Alternatively, performance can be a bit lower if we get precomputed proofs and batching to work smoothly.
Details
See https://github.com/vacp2p/research/blob/master/zksnarks/semaphore/src/hello.js#L118 for proof generation. This uses snarkjs and circom running through Node, which is likely one of the slower environments. Additionally, the number of constraints for the SNARK is around ~60k.
Reducing the problem to a simpler circuit (Pedersen commitment, 5 rounds; see https://ethresear.ch/t/benchmark-circom-vs-bellman-wasm-in-chrome-on-mobile/5261), we note the previous benchmarks of Circom and Bellman in environments such as Node/WASM/native:
I.e., WASM is ~50x faster and native ~500x faster than Node. There appears to be WIP and room for improvement on the WASM benchmarks.
The Circom and Bellman circuits both have around ~3.5k constraints, i.e. ~5% of our original SNARK.
For our initial problem, this would mean roughly:
3.5k -> 60k; 20x constraints
=> node ~15m => wasm ~30s => native ~2s
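The extrapolation is linear scaling in constraint count combined with the environment speedups quoted above; a quick sanity check of the arithmetic (numbers from this thread, so indicative only):

```rust
// ~15 minutes in Node at ~60k constraints; WASM ~50x and native ~500x
// faster than Node per the earlier benchmarks.
fn main() {
    let node_secs = 15.0 * 60.0;
    println!("wasm   ~{:.0} s", node_secs / 50.0); // ~18 s, same order as ~30 s
    println!("native ~{:.1} s", node_secs / 500.0); // ~1.8 s, matches ~2 s
}
```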
Possible solutions
1. Use a better environment (WASM or native).
No-brainer. Either use websnark with the existing Circom SNARKs, or compile Bellman to WASM.
WASM might not be enough though, especially to get to subsecond, and requires more optimization.
Native would be useful, especially for resource-restricted devices and for spam protection. Additional benchmarks on mobile would be good to have. A big issue is deployment: currently native would mean something like Bellman, which right now doesn't have a library like Circom for writing SNARKs. This seems to be an ongoing effort:
2. Decrease number of constraints.
60k could possibly be reduced quite a lot, but this requires additional research (e.g. Merkle tree structure, number of rounds for various hash functions, etc). If it could be halved, then websnark might be enough.
3. Use a different type of ZKP
Other types of ZKP come with lower proving time, but with other trade-offs (such as proof size). The main reasons to explore this would, from my current POV, be: proving time; security assumptions (cryptographic assumptions and trusted setup); and proof size.
4. Precompute proofs
It's possible we can precompute proofs for e.g. spam protection. This means you can instantly signal, and then 'save up' messages. How this would work in practice requires further research.
5. Prover as a service?
I don't know what this would look like, but perhaps proof generation could be outsourced. This appears to be StarkWare's business model, if I understand it correctly: https://starkware.co/the-road-ahead-in-2019/
6. Batching of signals
A bit similar to the 'precompute' case, but it might be doable to take many messages in a 30s period and batch them together. In the human chat case, this would impact UX. Worth exploring though.
Note
The benchmarks in the problem description and on ethresear.ch might be incorrect, so it's worth checking them again.