taikoxyz / raiko

Multi-proofs for Taiko. SNARKS, STARKS and Trusted Execution Enclave. Our previous ZK-EVM circuits are deprecated.
Apache License 2.0

Feat: Distributed proof generation #105

Open mratsim opened 7 months ago

mratsim commented 7 months ago

At the moment, zkVMs are too slow for real-time proving on a single computer. This is especially the case for Risc0, which requires dozens of GPUs.

Unfortunately, on most cloud providers it is significantly easier to scale by adding more machines than by adding GPUs to one machine. Sometimes it is the only way to scale: on AWS, for example, we cannot get more than 8 GPUs per instance.

Hence we need distributed computing support on Raiko.

zkVMs are all in the process of adding a way to split a proof into independent pieces: continuations (Risc0, Powdr), sharding (SP1), or chunking (Halo2). Hence an easy way to distribute compute would be to have a master prover with a fleet of workers. The master interacts with the end-user and delegates work to the workers.
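To make the master/worker idea concrete, here is a minimal in-process sketch, with threads standing in for remote machines and channels standing in for the network. Everything in it is hypothetical: `Chunk`, `PartialProof`, `prove_chunk` and `prove_distributed` are illustration names, not Raiko or zkVM APIs, and the "proof" is just the payload with a tag in front.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical unit of work: one continuation segment / shard / chunk.
#[derive(Debug, Clone)]
pub struct Chunk {
    pub index: usize,
    pub payload: Vec<u8>,
}

/// Hypothetical partial proof sent back by a worker.
#[derive(Debug, Clone, PartialEq)]
pub struct PartialProof {
    pub index: usize,
    pub bytes: Vec<u8>,
}

/// Stand-in for the real per-chunk prover call; it only tags the payload.
fn prove_chunk(chunk: &Chunk) -> PartialProof {
    let mut bytes = b"proof:".to_vec();
    bytes.extend_from_slice(&chunk.payload);
    PartialProof { index: chunk.index, bytes }
}

/// Master: hands chunks to `num_workers` workers round-robin and
/// returns the partial proofs sorted by chunk index.
pub fn prove_distributed(chunks: Vec<Chunk>, num_workers: usize) -> Vec<PartialProof> {
    assert!(num_workers > 0, "need at least one worker");
    let (result_tx, result_rx) = mpsc::channel::<PartialProof>();

    let mut work_txs = Vec::with_capacity(num_workers);
    let mut handles = Vec::with_capacity(num_workers);
    for _ in 0..num_workers {
        let (work_tx, work_rx) = mpsc::channel::<Chunk>();
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || {
            // Worker loop: ends when the master drops its work sender.
            for chunk in work_rx {
                result_tx.send(prove_chunk(&chunk)).expect("master hung up");
            }
        }));
        work_txs.push(work_tx);
    }
    // Drop the master's own result sender so `result_rx` terminates
    // once every worker has finished.
    drop(result_tx);

    let total = chunks.len();
    for (i, chunk) in chunks.into_iter().enumerate() {
        work_txs[i % num_workers].send(chunk).expect("worker hung up");
    }
    drop(work_txs); // closing the work channels lets the workers exit

    let mut proofs: Vec<PartialProof> = result_rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    assert_eq!(proofs.len(), total);
    proofs.sort_by_key(|p| p.index);
    proofs
}
```

In a real deployment the channels would be replaced by an RPC layer and the master would also need retries and timeouts for lost workers, which this sketch leaves out.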

GPU Cloud pricing analysis

We need 16GB or 24GB of VRAM. The P40 uses a 2016 architecture (Pascal) and the V100 a 2017 architecture (Volta).

EC2 P3 instances have V100s, but with only 16GB and at a higher price, so they are a non-starter.

P4d instances with A100 (2020 architecture) and 80GB of VRAM are a non-starter due to price.

EC2 G5 instances with A10G fit our needs but are limited to 8 GPUs per instance.
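The 8-GPU cap translates directly into a machine count. As a small sketch of the arithmetic (the function names are hypothetical, and the hourly price is a parameter because no specific price is assumed here):

```rust
/// Number of instances needed to host `gpus_needed` GPUs when each
/// instance offers at most `gpus_per_instance` (ceiling division).
pub fn instances_needed(gpus_needed: u32, gpus_per_instance: u32) -> u32 {
    assert!(gpus_per_instance > 0, "an instance must offer at least one GPU");
    (gpus_needed + gpus_per_instance - 1) / gpus_per_instance
}

/// Hourly cost of the fleet, given a caller-supplied price per instance-hour.
pub fn fleet_cost_per_hour(instances: u32, price_per_instance_hour: f64) -> f64 {
    f64::from(instances) * price_per_instance_hour
}
```

For example, if a proof needed 36 GPUs (an arbitrary figure standing in for "dozens"), `instances_needed(36, 8)` gives 5 instances, so the work has to span several machines regardless of instance type.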

Risc0

Distributing Risc0 requires sending work to a remote worker, calling prove_segment there, then sending the result back.

Note: Risc0 only supports 1 GPU per machine (GPU 0).
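One possible workaround, untested with Risc0 and stated here only as an assumption, is to run one worker process per GPU and restrict each process to a single device with the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`, so that every process sees its own card as GPU 0. The sketch below only builds the commands; the `raiko-worker` binary name and the `--worker-id` flag are hypothetical.

```rust
use std::process::Command;

/// Builds (without spawning) one worker command per GPU. Each command
/// is pinned to a single device via `CUDA_VISIBLE_DEVICES`, so inside
/// that process the visible device is enumerated as GPU 0.
pub fn worker_commands(worker_bin: &str, num_gpus: usize) -> Vec<Command> {
    (0..num_gpus)
        .map(|gpu| {
            let mut cmd = Command::new(worker_bin);
            // Standard CUDA runtime variable: restricts visible devices.
            cmd.env("CUDA_VISIBLE_DEVICES", gpu.to_string());
            // Hypothetical flag so the master can tell workers apart.
            cmd.arg("--worker-id").arg(gpu.to_string());
            cmd
        })
        .collect()
}
```

Calling `worker_commands("raiko-worker", 8)` and then `spawn()` on each command would start eight workers on an 8-GPU instance, each of which the master can treat like a separate single-GPU machine.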

Once the STARK proof is generated, it needs to be wrapped in Groth16 (this is done automatically when going through Bonsai). We should be able to use their compact_proof / stark2snark for this.

SP1

Distributing SP1 requires sending work to a remote worker, calling prove_shard there, then sending the result back.

Note: SP1 supports delegating to a "Succinct Network". The protocol is defined at https://github.com/succinctlabs/sp1/blob/5db203c/sdk/src/proto/network.rs and the RPC at https://github.com/succinctlabs/sp1/blob/5db203c55647c30618431822d33f419614f9fab6/sdk/src/lib.rs#L104-L157. However, this seems to only delegate a whole proving job, not split the work across machines.

Wrapping the STARK proof in a SNARK is WIP.

Maintenance considerations

We will likely need:

Maintenance to sync with upstream should be minimized.
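One way to keep the upstream-facing surface small, sketched here as an assumption and not as an agreed design, is a thin adapter trait per zkVM so that the distribution logic in Raiko never touches upstream types directly. The trait name `ChunkedProver`, its methods and the `ToyProver` backend are all hypothetical; a real adapter would wrap calls such as prove_segment or prove_shard.

```rust
/// Hypothetical adapter: the only code that would need to track upstream.
pub trait ChunkedProver {
    type Chunk;
    type ChunkProof;
    type Proof;

    /// Split the input into independently provable pieces.
    fn split(&self, input: &[u8]) -> Vec<Self::Chunk>;
    /// Prove one piece (this is what a worker would run).
    fn prove_chunk(&self, chunk: &Self::Chunk) -> Self::ChunkProof;
    /// Combine the per-piece proofs into the final proof.
    fn aggregate(&self, parts: Vec<Self::ChunkProof>) -> Self::Proof;
}

/// Backend-agnostic driver. It runs sequentially here; a distributed
/// version would ship each chunk to a worker instead.
pub fn prove_all<P: ChunkedProver>(prover: &P, input: &[u8]) -> P::Proof {
    let parts: Vec<P::ChunkProof> = prover
        .split(input)
        .iter()
        .map(|chunk| prover.prove_chunk(chunk))
        .collect();
    prover.aggregate(parts)
}

/// Toy backend for illustration only: a chunk is 4 bytes and a
/// "proof" is the sum of the bytes.
pub struct ToyProver;

impl ChunkedProver for ToyProver {
    type Chunk = Vec<u8>;
    type ChunkProof = u64;
    type Proof = u64;

    fn split(&self, input: &[u8]) -> Vec<Vec<u8>> {
        input.chunks(4).map(|c| c.to_vec()).collect()
    }

    fn prove_chunk(&self, chunk: &Vec<u8>) -> u64 {
        chunk.iter().map(|&b| u64::from(b)).sum()
    }

    fn aggregate(&self, parts: Vec<u64>) -> u64 {
        parts.into_iter().sum()
    }
}
```

With this shape, an upstream API change would only affect the one `impl` block for that zkVM, while the master/worker plumbing stays untouched.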

cc @Champii @petarvujovic98 @CeciliaZ030