At the moment, zkVMs are somewhat too slow for real-time proving on a single computer.
This is especially the case for Risc0, which requires dozens of GPUs.
Unfortunately, on most cloud providers it is significantly easier to scale by adding more machines, and sometimes that is the only way to scale: on AWS, for example, we cannot get more than 8 GPUs per instance.
Hence we need distributed computing support on Raiko.
zkVMs are all in the process of adding continuations (Risc0, Powdr) / sharding (SP1) / chunking (Halo2), so an easy way to distribute compute would be a master prover with a fleet of workers: the master interacts with the end user and delegates work to the workers.
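As a rough sketch of that split (the names and types below are hypothetical, not existing Raiko code), the master/worker protocol can stay agnostic of the zkVM by shipping opaque bytes:

```rust
use serde::{Deserialize, Serialize};

/// One unit of work shipped from the master to a worker. The payload is the
/// serialized segment (Risc0) or shard (SP1) produced by the zkVM's
/// continuation/sharding step, kept as opaque bytes so the wire format does
/// not depend on either SDK's internal types.
#[derive(Serialize, Deserialize)]
struct WorkRequest {
    task_id: u64,
    payload: Vec<u8>,
}

/// The worker's answer: the serialized proof for that segment/shard, which
/// the master aggregates into the final proof before answering the end user.
#[derive(Serialize, Deserialize)]
struct WorkResponse {
    task_id: u64,
    proof: Vec<u8>,
}
```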
GPU Cloud pricing analysis
We need 16 GB or 24 GB of VRAM. The P40 is a 2016 architecture and the V100 a 2019 architecture.
- EC2 P3 instances have V100s, but with only 16 GB and at a higher price, so they are a non-starter.
- P4d instances with A100s (2021) and 80 GB of VRAM are a non-starter due to price.
- EC2 G5 instances with A10Gs fit our needs, but are limited to 8 GPUs per instance.
Risc0
Distributing Risc0 requires sending work to a remote worker, calling prove_segment there, and sending the result back. Note: Risc0 only supports 1 GPU per machine (GPU 0).
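A minimal sketch of the master-side fan-out, assuming the segments have already been produced by the executor; prove_on_worker stands in for the RPC that invokes prove_segment on a worker, and all names are illustrative:

```rust
use std::thread;

/// Stand-in for the RPC that ships one segment to a worker, has the worker
/// run Risc0's prove_segment on its single GPU, and returns the serialized
/// segment receipt.
type ProveOnWorker = fn(worker_idx: usize, segment: Vec<u8>) -> Vec<u8>;

/// Hypothetical master-side fan-out: each segment is shipped to a worker in
/// parallel and the segment receipts are collected in order. Joining the
/// receipts into a single STARK proof is not shown here.
fn prove_segments_on_workers(
    segments: Vec<Vec<u8>>,
    prove_on_worker: ProveOnWorker,
) -> Vec<Vec<u8>> {
    let handles: Vec<_> = segments
        .into_iter()
        .enumerate()
        // One thread per in-flight segment; a real implementation would use a
        // pool bounded by the number of available workers.
        .map(|(worker_idx, segment)| {
            thread::spawn(move || prove_on_worker(worker_idx, segment))
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```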
Once the STARK proof is generated, it needs to be wrapped in a Groth16 proof (done automatically when going through Bonsai). We should be able to use their compact_proof / stark2snark tooling for this.
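A rough sketch of where that wrapping step would sit in a distributed setup; stark_to_snark below is a placeholder for whatever entry point the compact_proof tooling exposes, not a confirmed API:

```rust
use anyhow::Result;

/// Placeholder for Risc0's compact_proof / stark2snark step; the real entry
/// point and its types depend on the pinned Risc0 version, so the receipt and
/// the resulting Groth16 proof are treated as opaque bytes here.
trait SnarkWrapper {
    fn stark_to_snark(&self, stark_receipt: &[u8]) -> Result<Vec<u8>>;
}

/// Final step on the master: once all segment receipts coming back from the
/// workers have been joined into one STARK receipt, wrap it into the Groth16
/// proof that gets posted on-chain.
fn finalize(wrapper: &dyn SnarkWrapper, joined_stark_receipt: &[u8]) -> Result<Vec<u8>> {
    wrapper.stark_to_snark(joined_stark_receipt)
}
```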
SP1
Distributing SP1 requires sending work to a remote worker, calling prove_shard there, and sending the result back.
Note: SP1 supports delegating to a "Succinct Network", with the protocol defined here: https://github.com/succinctlabs/sp1/blob/5db203c/sdk/src/proto/network.rs and the RPC here: https://github.com/succinctlabs/sp1/blob/5db203c55647c30618431822d33f419614f9fab6/sdk/src/lib.rs#L104-L157, but this seems to only delegate work, not split it.
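To make that distinction concrete, a hypothetical dispatch policy could look like the following (illustrative names only, none of this is SP1 API):

```rust
/// Hypothetical dispatch options for SP1 proving in Raiko.
enum Sp1Dispatch {
    /// Hand the entire proof request to the Succinct Network RPC
    /// (delegation only: the network proves the whole program for us).
    DelegateToNetwork { rpc_url: String },
    /// Split the execution into shards ourselves and fan prove_shard calls
    /// out across our own worker fleet.
    SplitAcrossWorkers { worker_urls: Vec<String> },
}
```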
Wrapping the STARK proof in a SNARK is still WIP.
Maintenance considerations
We will likely need:
- prove_shard and prove_segment
- prove_shard_on_worker and prove_segment_on_worker
The maintenance needed to stay in sync with upstream should be minimized.
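One possible way to keep that maintenance surface small (a sketch only; ChunkProver and the impls below are hypothetical, not existing Raiko code) is to put every upstream call behind a single trait, so that a new Risc0 or SP1 release only requires touching one local implementation and one worker RPC implementation:

```rust
use anyhow::Result;

/// Single seam between Raiko and the upstream zkVMs: all calls into Risc0's
/// prove_segment or SP1's prove_shard live behind this trait.
trait ChunkProver {
    /// Prove one segment (Risc0) or shard (SP1), passed as opaque bytes.
    fn prove_chunk(&self, chunk: Vec<u8>) -> Result<Vec<u8>>;
}

/// Runs the upstream prover directly on this machine.
struct LocalProver;

/// Ships the chunk to a remote worker (the prove_segment_on_worker /
/// prove_shard_on_worker path) and returns the proof bytes it sends back.
struct RemoteWorker {
    url: String,
}

impl ChunkProver for LocalProver {
    fn prove_chunk(&self, _chunk: Vec<u8>) -> Result<Vec<u8>> {
        todo!("call the upstream prove_segment / prove_shard here")
    }
}

impl ChunkProver for RemoteWorker {
    fn prove_chunk(&self, _chunk: Vec<u8>) -> Result<Vec<u8>> {
        todo!("ship the chunk to the worker at {} and return the proof bytes", self.url)
    }
}
```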
cc @Champii @petarvujovic98 @CeciliaZ030