risc0 / risc0

RISC Zero is a zero-knowledge verifiable general computing platform based on zk-STARKs and the RISC-V microarchitecture.
https://risczero.com
Apache License 2.0

[BUG] Metal proof generation seems to get stuck for quite a while in `eval_check` #1310

Closed weikengchen closed 8 months ago

weikengchen commented 8 months ago

Bug Report

I ran proof generation for examples/ecdsa on an Apple M2 Ultra.

Proof generation on the CPU works without issues. With Metal, the Merkle tree commitments are significantly faster (even though the hash is Poseidon), but the evaluation check took 307s. The same step takes about 20s on my CPU.

The evaluation check is this step in zkp/src/prove/prover.rs:

        let groups: Vec<&_> = self
            .groups
            .iter()
            .map(|pg| &pg.as_ref().unwrap().evaluated)
            .collect();
        circuit_hal.eval_check(
            &check_poly,
            groups.as_slice(),
            globals,
            poly_mix,
            self.po2,
            self.cycles,
        );

This is where Metal would be computing the quotient polynomials.
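For anyone who wants to reproduce the 307s vs. 20s comparison, here is a minimal sketch of timing a single step with `std::time::Instant`. The helper name `timed` is hypothetical and not part of risc0, and the numbers are only meaningful on the GPU path if the timed call blocks until the GPU work has finished, which is an assumption here.

```rust
use std::time::{Duration, Instant};

/// Runs `step` exactly once and returns its result together with the
/// wall-clock time the call took. (`timed` is a hypothetical helper.)
fn timed<T>(step: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = step();
    (out, start.elapsed())
}
```

The call from the snippet above could then be wrapped as `let (_, took) = timed(|| circuit_hal.eval_check(/* same arguments */));` and `took` printed once per segment.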

Steps to Reproduce

Run `cargo run --features=metal` in examples/ecdsa.

Expected behavior

Since the CPU takes about 20s for the eval check, Metal shouldn't take this long.

Your Environment

weikengchen commented 8 months ago

Here is the GPU history from Activity Monitor.

[Screenshot: 2024-01-10 at 16:47:01]

The eval check is the part in the middle where the graph is more or less not moving.

Here is my breakdown in Metal:

segment 1 (of size 1048576)

segment 2 (smaller, of size likely 262144)

This can be compared with CPU:

segment 1

segment 2
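The segment sizes quoted above are powers of two (1048576 = 2^20 and 262144 = 2^18). A small sketch, assuming the `po2` value passed to `eval_check` is the base-2 logarithm of the segment size; `po2_of` is a hypothetical helper, not a risc0 function:

```rust
/// Returns log2(size) when `size` is an exact power of two, otherwise `None`.
/// (`po2_of` is a hypothetical helper used only for illustration.)
fn po2_of(size: u64) -> Option<u32> {
    // For a power of two, the number of trailing zero bits equals log2.
    size.is_power_of_two().then(|| size.trailing_zeros())
}
```

With these sizes, segment 1 corresponds to a `po2` of 20 and segment 2 to 18, so segment 2 has a quarter as many cycles as segment 1.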

weikengchen commented 8 months ago

I am going to close this issue, since I can no longer reproduce it after closing my Chrome browser, which had 30+ tabs open. Now the GPU history for a proof generation looks like this:

[Screenshot: 2024-01-10 at 17:06:36]

I strongly suspect that memory was simply used up and the eval check had to fall back on virtual memory.

flaub commented 8 months ago

I recently landed a change to some of the circuit-generated files that invalidates the Metal and other compilation caches. After a change of this nature, the first run on Metal/CUDA is expected to sometimes take a long time to JIT. This is likely why you can't reproduce the slowdown. If you catch it again, you can confirm on a Mac by looking for an MTLCompilerService process running in the background.

weikengchen commented 8 months ago

Can I ask a question about the estimated performance breakdown from the RISC Zero team?

I am getting 5 : 5 : 2 on my M2 Ultra.

Is making witness generation faster an open problem? Could a two-pass solution, a sequential pass followed by a parallel pass that leverages multiple cores, work well here?

flaub commented 8 months ago

Yes, we are planning to parallelize witness generation in the near future by adding a preflight step that records all the 'back' information needed to make parallel execution possible.
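To illustrate the two-pass idea, here is a toy sketch. It is not RISC Zero's design. It assumes that the 'back' information is whatever state a cycle needs from earlier cycles, modeled here as a single accumulator. The names `Op`, `Row`, `preflight`, and `witness_parallel` are all hypothetical.

```rust
use std::thread;

/// Instruction set of the toy machine.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Add(u64),
    Mul(u64),
}

/// One row of the toy witness: the accumulator before and after a cycle.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Row {
    before: u64,
    after: u64,
}

fn apply(op: Op, acc: u64) -> u64 {
    match op {
        Op::Add(n) => acc.wrapping_add(n),
        Op::Mul(n) => acc.wrapping_mul(n),
    }
}

/// Pass 1 (sequential): execute the program once and record the state
/// each cycle starts from.
fn preflight(program: &[Op], init: u64) -> Vec<u64> {
    let mut acc = init;
    program
        .iter()
        .map(|&op| {
            let before = acc;
            acc = apply(op, acc);
            before
        })
        .collect()
}

/// Pass 2 (parallel): each row depends only on its own op and its recorded
/// start state, so chunks of cycles are filled in on separate threads.
fn witness_parallel(program: &[Op], starts: &[u64], workers: usize) -> Vec<Row> {
    assert_eq!(program.len(), starts.len());
    let workers = workers.max(1);
    // Ceiling division; at least 1 so the chunking calls never see size 0.
    let chunk = ((program.len() + workers - 1) / workers).max(1);
    let mut rows = vec![Row { before: 0, after: 0 }; program.len()];
    thread::scope(|s| {
        for ((out, ops), st) in rows
            .chunks_mut(chunk)
            .zip(program.chunks(chunk))
            .zip(starts.chunks(chunk))
        {
            s.spawn(move || {
                for ((row, &op), &before) in out.iter_mut().zip(ops).zip(st) {
                    *row = Row { before, after: apply(op, before) };
                }
            });
        }
    });
    rows
}
```

In this toy the sequential pass does as much work as the parallel one, so nothing is gained. The approach pays off only when the per-cycle witness work is much more expensive than the bookkeeping done in preflight.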

weikengchen commented 8 months ago

Got it. I can see that the recursion circuit already has preflight.