scroll-tech / ceno

Accelerate Zero-knowledge Virtual Machine by Non-uniform Prover Based on GKR Protocol
Apache License 2.0
21 stars 2 forks source link

fix: disable unintended parallel processing in sumcheck stage 1 #530

Closed hero78119 closed 1 day ago

hero78119 commented 3 days ago

Fixed #511

Root cause

We have 2 stage sumcheck

So in stage 1, a rayon job shouldn't invoke parallel rayon internally, otherwise it will invoke deadlock, since there is 0 rayon idle thread in pool, so the parallel job will hang and never generate 1st round of univariate evaluation, so the main worker thread never get enough message to processing.

This only happened when extrapolate was invoked on the situation when virtual polynomial got different degree monomial terms. In other word, it tend to happened on opcodes with different degree of monomial terms in a constrain.

The fix is we need to have stage 1 strictly serialized and do not invoke rayon parallel internally.

Another minor optimisation

Previously main thread also be one of worker thread, thus it also exchange data via channel which might incur a unnecessary cost. This PR also remove this cost and exchange data locally.

Testing

matthiasgoergens commented 1 day ago

Thanks for the detailed PR description!

hero78119 commented 1 day ago

Hey @kunxian-xia originally I planned to merge this to master, but I believe you want to narrow down integration test debug scope in this period, so you might avoid having regular sync with master to your branch, also cherry pick might also troublesome.

how about we merge this PR to integration test feature branch instead? I'm fine with this plan :)

kunxian-xia commented 1 day ago

how about we merge this PR to integration test feature branch instead? I'm fine with this plan :)

Sounds good to me!

hero78119 commented 1 day ago

how about we merge this PR to integration test feature branch instead? I'm fine with this plan :)

Sounds good to me!

Ok done 👍

naure commented 11 hours ago

Could we have this in master as well?