This PR experiments with accelerating Arkworks MSM on the CPU using precomputation techniques. The result shows that while window-wise MSM with precomputation provides a slight speedup, the increased I/O and serialize/deserialize overhead are not worth it.
Since the goal is to speed up MSM computation, we tried both GPU and CPU ways to accelerate it. The GPU way is recorded in this issue https://github.com/zkmopro/mopro/issues/153.
This PR
Implement the generation for precomputed point and related serde methods. Adapted the Arkworks MSM algorithm to use the precomputed points.
benchmark result (on MacBook with M3 chip)
similar to Arkworks MSM, sometimes are slightly faster but not significant
msm time (ms) / instance size
2^10
2^16
2^18
2^20
2^22
2^24
Arkworks msm
2.68
67.96
245.38
912.32
3641.81
14254.36
Window-wise MSM with precomputation (on CPU)
2.13
65.83
234.32
915.91
x
x
The relationship among acceleration speed, size of precomputation points and time for precomputation is as follow:
2^10: precompute_factor=20 (i.e. precomputed point size increase 20 times)
2^16: precompute_factor=20
2^18: precompute_factor=19
2^20: precompute_factor=17 (Size of precomputed points is around 570 MB)
Summary
This PR experiments with accelerating Arkworks MSM on the CPU using precomputation techniques. The result shows that while window-wise MSM with precomputation provides a slight speedup, the increased I/O and serialize/deserialize overhead are not worth it.
Check on this report for more detail: https://hackmd.io/v8WPAG8-RsCANrbv_seSXw
Status
This PR
Implement the generation for precomputed point and related serde methods. Adapted the Arkworks MSM algorithm to use the precomputed points.
benchmark result (on MacBook with M3 chip)
(on CPU)
The relationship among acceleration speed, size of precomputation points and time for precomputation is as follow: