"Some Improvements for the PIOP for ZeroCheck" paper follow up

hero78119 commented 3 weeks ago

Quick hacky try on Some Improvements for the PIOP for ZeroCheck with 2 tricks

3.1 Sending less data => skip send univariate u(1) as it can be derived from verifier
3.2 Slight modification of the Protocol => exploit "eq" for its special form. In general, it can be derived as "common factor" into the protocol, so the overall degree will be minus one

for above 2 tricks can be easily verified as "low-hanging fruit"

This is a workaround commit to evaluate potiential gain on Ceno zkVM end-to-end test https://github.com/scroll-tech/ceno/commit/02983a3d57b5e7e85acdae6e18ba7feba02117d5, idea of these changes are to simulate

exclude one poly from sumcheck to simulate eq exploited

eq still need to perform fix_variable alike operation to derive "common factor"
send 1 less point u(1)

Preliminary Benchmark result

2^20 add instances ~= -7.07% throughput improvement
fabonacci assemply toy example < 10% throughput improvement
fabonacci e2e => TBD

(Preliminary) Conclusion

Overall improvement percentage depends on how much proportion sumcheck protocol involve into critical path. We can expect it benefit on tower sumcheck more, as it can reduce logup GKR sumcheck degree from 3 ((eq f1 g2)) -> 2 (f1 * g2). Tricks 3.1, 3.2 looks like probably improve around 5~10%, which is much less than my expectation (like > 30% ) It will be worth to dive in more, and probably further breakdown by flamegraph and see what could be the bottleneck.

hero78119 commented 3 weeks ago

Hey @huwenqing0606 Wenqing, I remember you have a analysing note on theoretical gain (with link https://www.overleaf.com/project/66cf8ab8ea23adb139a41e56 but I dont have permission). Do you expect 3.1, .3.2 sufficient to bring significant gain? or do you think "Sec 4. Algebraic Improvements" is the most critical improvement so we might seek to further involving?

hero78119 commented 3 weeks ago

Just an updated: try more extreme case https://github.com/scroll-tech/ceno/commit/01d31e5c8a9bfb1eb205ee96c00070aee53d4704

skip "ALL" univariate computation
still keep fix variable

Then the overall throughput improvement is just -15.568% for 2^20 add instances

So it indicate sumcheck degree complexity might not play large proportion on critical path latency, and we can identify others, probably more implementation improvement to have more huge gain. And come back to this problem once it become bottleneck

scroll-tech / ceno

"Some Improvements for the PIOP for ZeroCheck" paper follow up #537

Preliminary Benchmark result

(Preliminary) Conclusion