Closed hero78119 closed 1 day ago
Thanks for the detailed PR description!
Hey @kunxian-xia, I originally planned to merge this into master, but I believe you want to narrow the debugging scope of the integration test for now, so you may prefer not to sync master into your branch regularly, and cherry-picking could be troublesome too.
How about we merge this PR into the integration test feature branch instead? I'm fine with this plan :)
Sounds good to me!
Ok done 👍
Could we have this in master as well?
Fixed #511
Root cause
The sumcheck prover runs in 2 stages.
In stage 1, a rayon job must not start nested rayon parallel work. Otherwise it can deadlock: with no idle thread left in the rayon pool, the nested parallel job hangs and never produces the first-round univariate evaluations, so the main worker thread never receives enough messages to proceed.
This only happened when extrapolate was invoked, which is the case when a virtual polynomial has monomial terms of different degrees. In other words, it tends to happen on opcodes whose constraints contain monomial terms of different degrees.
The fix is to keep stage 1 strictly serial and never invoke rayon parallelism inside it.
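To illustrate the shape of the fix, here is a minimal sketch, not the code in this PR. It uses std::thread and a channel in place of rayon, a toy prime modulus, and a single multilinear table stored as adjacent (x0 = 0, x0 = 1) pairs. The names P, round_evals_serial and stage1_first_round are all hypothetical. What matters is that each stage-1 worker runs a plain serial loop and never submits further parallel work, so it cannot end up waiting on a pool that has no idle thread.

```rust
use std::sync::mpsc;
use std::thread;

// Toy prime modulus (2^31 - 1), chosen only so that u64 additions cannot overflow.
const P: u64 = 2_147_483_647;

/// Stage-1 work for one chunk, done strictly serially.
/// Returns this chunk's contribution to (g(0), g(1)) of the first-round
/// univariate polynomial, assuming adjacent entries are the x0 = 0 / x0 = 1 pair.
fn round_evals_serial(chunk: &[u64]) -> (u64, u64) {
    let mut at0 = 0u64;
    let mut at1 = 0u64;
    for pair in chunk.chunks_exact(2) {
        at0 = (at0 + pair[0] % P) % P;
        at1 = (at1 + pair[1] % P) % P;
    }
    (at0, at1)
}

/// Spawns one worker per chunk; workers send their partial result to the
/// main thread, which adds the contributions together.
fn stage1_first_round(evals: &[u64], num_workers: usize) -> (u64, u64) {
    assert!(num_workers > 0 && !evals.is_empty());
    assert!(evals.len() % (2 * num_workers) == 0, "chunks must hold whole pairs");
    let chunk_len = evals.len() / num_workers;
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for chunk in evals.chunks(chunk_len) {
            let tx = tx.clone();
            // No nested parallelism inside the worker: just the serial loop.
            s.spawn(move || {
                tx.send(round_evals_serial(chunk)).expect("receiver is alive");
            });
        }
        // Drop the original sender so the receive loop ends once workers finish.
        drop(tx);
        rx.iter()
            .fold((0, 0), |(a0, a1), (b0, b1)| ((a0 + b0) % P, (a1 + b1) % P))
    })
}
```

For example, calling stage1_first_round on the table [1, 2, 3, 4, 5, 6, 7, 8] gives the same pair of sums whether it is split across 1, 2 or 4 workers, because each worker's serial result is simply added up on the main thread.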
Another minor optimisation
Previously the main thread also acted as one of the worker threads, so it too exchanged data through a channel, which incurred an unnecessary cost. This PR removes that cost by having the main thread exchange its data locally.
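A rough sketch of this idea, again using std::thread rather than the PR's actual code, with the hypothetical names partial_sum and sum_with_local_main and a wrapping u64 sum standing in for the real per-round work: the main thread keeps the first chunk for itself and folds its result in directly, and only the spawned workers go through the channel.

```rust
use std::sync::mpsc;
use std::thread;

/// Serial work for one chunk (a wrapping sum stands in for the real computation).
fn partial_sum(chunk: &[u64]) -> u64 {
    chunk.iter().fold(0u64, |acc, &x| acc.wrapping_add(x))
}

/// The main thread processes the first chunk itself and combines that result
/// locally; only the other workers send their results over the channel.
fn sum_with_local_main(data: &[u64], num_workers: usize) -> u64 {
    assert!(num_workers > 0);
    // Ceiling division, with a minimum of 1 so `chunks` never receives 0.
    let chunk_len = ((data.len() + num_workers - 1) / num_workers).max(1);
    let mut chunks = data.chunks(chunk_len);
    let main_chunk = chunks.next().unwrap_or(&[]);
    let (tx, rx) = mpsc::channel::<u64>();
    thread::scope(|s| {
        for chunk in chunks {
            let tx = tx.clone();
            s.spawn(move || tx.send(partial_sum(chunk)).expect("receiver is alive"));
        }
        // Drop the original sender so the receive loop ends once workers finish.
        drop(tx);
        // The main thread's share never touches the channel.
        let local = partial_sum(main_chunk);
        rx.iter().fold(local, |acc, x| acc.wrapping_add(x))
    })
}
```

With num_workers set to 1 no thread is spawned and nothing is sent at all, which is the extreme case of the saving described above.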
Testing
On ceno-server, the hang previously reproduced 100% of the time with the command
RAYON_NUM_THREADS=64 RUST_LOG=debug cargo run --release --example fibonacci_elf -- --nocapture