Closed: weikengchen closed this issue 8 months ago
Here is the GPU history from Activity Monitor.
The eval check is the stretch in the middle where the graph barely moves.
Here is my breakdown in Metal:
segment 1 (of size 1048576)
segment 2 (smaller, of size likely 262144)
This can be compared with CPU:
segment 1
segment 2
I am actually going to close this issue, since I can no longer reproduce it after closing my Chrome browser, which had 30+ tabs open. Now the GPU history for a proof generation looks like this:
I strongly suspect that memory was simply used up and the eval check had to fall back on virtual memory.
I recently landed a change to some of the generated circuit files, which invalidates the Metal and other compilation caches. After a change of this nature, the first run on Metal/CUDA is expected to sometimes take a long time to JIT. This is likely why you can't reproduce the slowdown. If you catch it again, you can confirm this on a Mac by looking for an MTLCompilerService process running in the background.
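A minimal sketch of that check, assuming `ps -A -o comm=` is available (it is on macOS and most Linux systems). The helper names `listing_contains_process` and `metal_compiler_running` are hypothetical and are not part of the RISC Zero codebase:

```rust
use std::process::Command;

/// Returns true if any line of a `ps -A -o comm=` style listing names `name`.
/// The listing may contain bare names or full executable paths, so only the
/// final path component of each line is compared.
fn listing_contains_process(listing: &str, name: &str) -> bool {
    listing
        .lines()
        .any(|line| line.trim().rsplit('/').next() == Some(name))
}

/// Lists all processes with `ps` and looks for the Metal shader compiler.
/// Assumption: the process is named exactly `MTLCompilerService`, as in the
/// comment above.
fn metal_compiler_running() -> std::io::Result<bool> {
    let out = Command::new("ps").args(["-A", "-o", "comm="]).output()?;
    let listing = String::from_utf8_lossy(&out.stdout);
    Ok(listing_contains_process(&listing, "MTLCompilerService"))
}
```

Calling `metal_compiler_running()` while the prover appears stalled would indicate whether a shader compile is in flight; it says nothing about how long the compile will take.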
Can I ask a question about the estimated performance breakdown from the RISC Zero team?
I am getting 5 : 5 : 2 on my M2 Ultra.
Is making witness generation faster an open problem? Could a two-pass solution (a sequential pass followed by a parallel pass that leverages multiple cores) work well here?
Yes, we are planning to parallelize witness generation in the near future by adding a preflight step that records all the 'back' information needed to make parallel execution possible.
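The two-pass idea can be illustrated with a toy sketch. This is not the RISC Zero design: `step`, `witness_row`, and the chunking scheme are hypothetical stand-ins, and the 'back' information is assumed here to be just the state each chunk needs from the rows before it.

```rust
use std::thread;

/// Toy state transition: cheap, but each state depends on the previous one.
fn step(state: u64, input: u64) -> u64 {
    state.wrapping_mul(6364136223846793005).wrapping_add(input)
}

/// Stand-in for the expensive per-row witness computation.
fn witness_row(state_before: u64, input: u64) -> u64 {
    let mut acc = state_before ^ input;
    for _ in 0..64 {
        acc = acc.rotate_left(7).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    }
    acc
}

/// Reference implementation: one sequential pass over all rows.
fn witness_sequential(inputs: &[u64]) -> Vec<u64> {
    let mut state = 0u64;
    let mut out = Vec::with_capacity(inputs.len());
    for &x in inputs {
        out.push(witness_row(state, x));
        state = step(state, x);
    }
    out
}

/// Pass 1 (sequential preflight): run only the cheap transition and record
/// the state at the start of every chunk.
fn preflight(inputs: &[u64], chunk: usize) -> Vec<u64> {
    let mut state = 0u64;
    let mut starts = Vec::new();
    for (i, &x) in inputs.iter().enumerate() {
        if i % chunk == 0 {
            starts.push(state);
        }
        state = step(state, x);
    }
    starts
}

/// Pass 2 (parallel): each chunk starts from its recorded state and fills
/// its own slice of the output. One thread per chunk keeps the sketch short;
/// a real implementation would more likely use a thread pool.
fn witness_two_pass(inputs: &[u64], chunk: usize) -> Vec<u64> {
    assert!(chunk > 0, "chunk size must be positive");
    let starts = preflight(inputs, chunk);
    let mut out = vec![0u64; inputs.len()];
    thread::scope(|s| {
        let chunks = inputs.chunks(chunk).zip(out.chunks_mut(chunk));
        for ((ins, outs), &start) in chunks.zip(&starts) {
            s.spawn(move || {
                let mut state = start;
                for (o, &x) in outs.iter_mut().zip(ins) {
                    *o = witness_row(state, x);
                    state = step(state, x);
                }
            });
        }
    });
    out
}
```

For any input and positive chunk size, `witness_two_pass` should produce the same rows as `witness_sequential`. The split only pays off when the recorded transition is much cheaper than the per-row work.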
Got it. I can see that the recursion circuit already has preflight.
Bug Report
I used Apple M2 Ultra and ran the proof generation for examples/ecdsa.
Proof generation on the CPU works without issues. On Metal, however, although I do get a significant performance improvement for the Merkle tree commitments (even though it is Poseidon), the evaluation check took 307s. This step takes my CPU about 20s to finish.
The evaluation check is this step in zkp/src/prove/prover.rs
where Metal would be computing the quotient polynomials.
Steps to Reproduce
cargo run --features=metal
on examples/ecdsa
Expected behavior
Since the CPU took about 20s for the eval check, Metal shouldn't take this long.
Your Environment