zkmopro / mopro

Making client-side proving on mobile simple.
Apache License 2.0
102 stars 28 forks source link

Feat/bucket-wise metal msm on GPU #188

Closed moven0831 closed 8 hours ago

moven0831 commented 3 days ago

Summary

This PR experiments with implementing MSM in Metal Shader Language and host code in Rust to leverage computation power of GPU. Our MSM implementation on metal has performed 10x faster than lambdaworks/metal-pippenger and is the fastest open-source MSM implementation with Metal to the best of our knowledge. However, the result is still not ideal that it's much slower than Arkworks MSM on CPU.

This PR

To extract as much parallelism as possible, we modified the accumulation phase from window-wise fashion to bucket-wise fashion. To achieve this, we implement extra preparation steps before the accumulation.

In this way, the number of threads equals the number of buckets ( $\text{numWindow} \times (2^{wsize} - 1)$ ), which is significantly more than the number of threads in the window-wise fashion ( only $\text{numWindow}$ ). Thus, we extract more parallelism that is suitable for GPU computation.

benchmark result (on MacBook with M3 chip)

msm time (ms) / instance size 2^10 2^16 2^18 2^20 2^22 2^24
Arkworks msm 2.68 67.96 245.38 912.32 3641.81 14254.36
Window-wise Metal msm
(on GPU)
99.31 3845.10 x x x x
Bucket-wise Metal msm
(on GPU)
43.10 1730.26 10277.70 41019.22 555877.91 x

Future works

We've noticed that there's an approved EF grant proposal that is working on migrating ICICLE MSM from CUDA to Metal with C++ host code. Once completed, we can adapt it to Rust host code and optimize the parameters along with the BN254 curve.

cloudflare-pages[bot] commented 3 days ago

Deploying mopro with  Cloudflare Pages  Cloudflare Pages

Latest commit: bdd7797
Status: ✅  Deploy successful!
Preview URL: https://6f24f27c.mopro.pages.dev
Branch Preview URL: https://feat-metal-msm-bucket-wise.mopro.pages.dev

View logs