This issue is a stub and will be filled out with more details/links over time. It is also more of an umbrella issue than a specific task.
Problem
Modern mobile phones come with a powerful GPU. Modern proving systems rely on MSMs, which are amenable to GPU optimization. We want to use the GPU on a mobile phone to get significant proving speedups.
There are quite a few steps involved in getting from that observation to actually using a GPU, with a GPU-friendly proving system, to prove on a mobile phone.
Rough roadmap
We can split this up into a few different steps. Some can be done in parallel.
1) Baseline and PoC working (done)
Before you can make something fast, you have to make it work at all in the right way. Make it work, make it right, make it fast.
As of this week, we are basically here, or very close to it. This means we can run a non-trivial proof on a real device and see how long it takes. This is useful as a baseline.
We also have the library set up so that it is easier to make incremental improvements and optimizations. This means people can focus on a specific task ("optimizing MSMs") as opposed to figuring out everything related to running stuff on mobile from scratch. It also allows us to run things in a realistic environment (actual mobile phones) and not just on paper.
2) Tooling to enable micro benchmarks for GPU/MSMs
This step can be done in parallel with step 4.
With 1 done, we can start to add the minimal tooling necessary to begin to understand GPU performance improvements. Because the scaffolding is already in place, this is not a lot of work.
What we really want is to use a GPU to make proofs on mobile, but there are a few steps to get there (see steps below). This step is about enabling us to start experimenting with different approaches to MSMs.
Why MSMs? It is a primitive operation that (i) is used in a lot of modern proof systems and (ii) is amenable to GPU optimization.
See Implement micro benchmarking tooling for doing MSMs/using GPU on Mobile for details on this step.
This step is also useful for individuals who want to focus on specific micro optimizations and want to understand their impact in realistic scenarios on mobile. Related to this is this issue to run CI on iOS (or Android, once supported). That means a developer doesn't need to set up a mobile environment but can see benchmark results in CI. This will require additional work, especially to run on a real device.
3) Improve MSM performance using GPU
This step can be done in parallel with step 4.
This part is a stub and can be filled in with more detail later.
Once we have the tooling for micro benchmarks above, we can start to optimize MSM performance. This includes comparing with ark-msm on mobile phones and integrating other ZPrize work. Suggested rough path:
a) Naively include ark-msm and compare benchmarks on a real device with the baseline
b) Reproduce using a laptop/server GPU (similar to 4b below)
c) Experiment and understand how to use the GPU on iOS at all (Metal tutorials)
d) Combine the above, get the GPU to fire on iOS, and get benchmarks for it
e) Any other kind of optimization comes last (see 5 below)
4) Integrate a GPU-friendly proving system
This step can be done in parallel with steps 2 and 3.
Right now we have support for Circom via Arkworks. There are plans to support additional proving systems, and a modular framework is in place to make that easy. Relevant for our purposes are proving systems that are amenable to GPU optimizations and use a lot of MSMs, such as Nova, various other folding schemes, and Spartan (?). Assuming there's a Rust implementation, it is generally not too difficult to add a new proving system, but it obviously requires a bit of work and there are some details involved (such as specific mobile restrictions).
4a) As a first step here, we want to just integrate the proving system and get a baseline, similar to what is mentioned in 1 above.
4b) The next step after that is to get it to run on a GPU at all (with visible improvements), including on a laptop. Depending on the code base and its setup, this may or may not be straightforward. For example, even though Nova has support for CUDA and OpenCL, OP and others didn't manage, within a timeboxed effort, to get the GPU to fire on either an M1 or a powerful GPU server.
Why laptop/server first? While not strictly necessary, it is an easier problem that is on the right path. It is a lot easier to experiment, debug, and get insight into what's going on on a laptop/server than on an iPhone. There has also historically been more attention on it, so prior work is more likely. If we can't get the GPU to fire and see performance benefits on a desktop, it is unlikely we'll get it to work and see any on a mobile phone. It is also a useful baseline.
4c) Get the iOS/mobile GPU to fire and show its performance benefits on a real device. This includes understanding things like Apple's Metal GPU framework. This might be easier to do as part of steps 2/3, as it is likely that primitive operations like MSMs (or similar) have already been made to run on iPhones as part of the gaming/AI ecosystem.
5) Other stuff
There are a lot of other things that can be done in terms of theoretical improvements, optimizations, mobile-specific work, etc. Some of these can be researched in parallel. But I believe these things are not on the critical path until the steps above have been done.