taikoxyz / raiko

Multi-proofs for Taiko. SNARKS, STARKS and Trusted Execution Enclave. Our previous ZK-EVM circuits are deprecated.
Apache License 2.0
96 stars 75 forks source link

Proof of Equivalence in ZkVM with Fiat inputs #292

Open CeciliaZ030 opened 1 week ago

CeciliaZ030 commented 1 week ago

Background

We have to prove that Rollup's data availability is from a given blob with KZG commitment

Screen Shot 2024-06-17 at 12 31 10 AM

This means that the ZkVM guest has to go through the computation that generates the BlobHash:

// Example in Sp1
let kzg_setting = sp1_zkvm::io::read::<KZGSettings>();
let blob = sp1_zkvm::io::read::<Blob>();
blob_to_kzg_commitment_rust(&blob, &kzg_setting).unwrap();

However this function blob_to_kzg_commitment_rust which

  1. convert the blob bytes to field elements and use them as polynomial coefficients;
  2. perform MSM with coefficients against group element G1;

The group operation is too costly, and even compiling kzg libraries into RiscV is challenging due to C bindings.

Data

It's impractical to run any of the KZG library with ZkVM, even when it compiles it takes millions of cycles. c-kzg in Risc0 spends 200-300M, while rust-kzg benchmarks as following with zkcrypto backend in Sp1:

2024-06-12T20:25:45.908283Z DEBUG execute: ┌╴load kzg_setting    
2024-06-12T20:25:46.496903Z  INFO execute: └╴7,783,656 cycles    
2024-06-12T20:25:46.496969Z DEBUG execute: ┌╴read blob    
2024-06-12T20:25:46.557434Z  INFO execute: └╴815,690 cycles    
2024-06-12T20:25:46.557497Z DEBUG execute: ┌╴blob_to_kzg_commitment_rust      
2024-06-12T20:34:17.902761Z  INFO execute: └╴7,692,653,487 cycles 
2024-06-12T20:34:17.903111Z  INFO execute: finished execution clk = 7701259597 pc = 0x0

Solution: Simpilified Proof of Equivalence with Fiat Input

Aligned Version

https://notes.ethereum.org/@dankrad/kzg_commitments_in_proofs

Screen Shot 2024-06-17 at 12 46 38 AM

Non-aligned Version with Fiat Input

While our situation falls into the case of non-aligned fields, because there's no way to prove the same y = f(x) in ZkVM's runtime trace and underlying commitment scheme. Instead we can use a simplified variant of the "Fiat Input Method" proposed by @dankrad. In short, his method assumes a ZkVM field >> Bls381 filed, and he encodes the blob elements in the ZkVM commitment polynomial g(x) with g(x0) = b0, g(x1) = b1, ..., g(x4096) = b4096 (note that there are 4096 field elements per blob). Then use the ZkVM proof to show that g(x) is valid with [b1, ..., b4096] and the commitment E as public input. With some tweaks, The scheme let the verifier compute that g(x) evaluates to 0 everywhere else than the first 4096 roots of unity.

Total cost: Inside the circuit, we now only have to evaluate the polynomial, i.e. 8192 non-aligned field operations. No hashes are necessary (except for a single one for computing the random point x ). This method has a big advantage if hashes are expensive inside the circuit. This will mostly be the case if for security reasons, no arithmetic hash functions should be used; if arithmetic hash functions can be used, the hashing cost is probably similar to the evaluation cost and the advantage of this technique not worth the complication

Non-aligned Simplification with ZkVM

His method predates the actual presence of ZkVM, so it's operating directly with Plonk's public input commitment. However in ZkVM we don't have access to the public input commitment but we we can achieve the same purpose with simplification:

Screen Shot 2024-06-17 at 3 34 48 AM

We argue that it's a variation of the Fiat Input method because it has similar cost evaluation: 1) non-aligned field operations 2) one single hash. The benchmark is the following:

2024-06-14T05:10:41.913288Z DEBUG execute: ┌╴load kzg_setting    
2024-06-14T05:10:42.544001Z  INFO execute: └╴8,362,568 cycles    
2024-06-14T05:10:42.544066Z DEBUG execute: ┌╴read blob    
2024-06-14T05:10:42.584997Z  INFO execute: └╴590,548 cycles    
2024-06-14T05:10:42.585058Z DEBUG execute: ┌╴deserialize_blob_rust    
2024-06-14T05:10:43.087086Z  INFO execute: └╴7,710,650 cycles     
2024-06-14T05:10:43.141175Z DEBUG execute: ┌╴PolyData::from_coeffs    
2024-06-14T05:10:43.153636Z  INFO execute: └╴180,679 cycles    
2024-06-14T05:10:43.153678Z DEBUG execute: ┌╴tiny-keccak    
2024-06-14T05:10:43.227450Z  INFO execute: └╴1,070,848 cycles    
2024-06-14T05:10:43.227493Z DEBUG execute: ┌╴hash_to_bls_field    
2024-06-14T05:10:43.227633Z  INFO execute: └╴1,906 cycles    
2024-06-14T05:10:43.227668Z DEBUG execute: ┌╴evaluate_polynomial_in_evaluation_form     
2024-06-14T05:10:45.319802Z  INFO execute: └╴31,755,132 cycles  
2024-06-14T05:10:45.320158Z  INFO execute: finished execution clk = 50496742 pc = 0x0

Implimentation

Spam policy

Brechtpd commented 1 week ago

I still think x=hash(C1,C2), which is used in the formula's of both dankrad and vitalik (and what we also did for the PSE circuits: https://github.com/taikoxyz/zkevm-circuits/issues/37#issuecomment-1335969679 and checked by Aleksei/Mario for correctness).

His method predates the actual presence of ZkVM, so it's operating directly with Plonk's public input commitment. However in ZkVM we don't have access to the public input commitment but we we can achieve the same purpose with simplification:

Looking at evaluate_polynomial_in_evaluation_form: https://github.com/grandinetech/rust-kzg/blob/1c9f4435a32f5ad7ffb73e299a33b212340fe86b/mcl/kzg/src/eip_4844.rs#L243. It does seem to use the barycentric formula, so I think this is the most efficient we can get it and also the one we did for the PSE circuits, so should be good!

(note that the point evaluation precompile will still be disable in L2 due to high cost)

Do you know the number of cycles of the point precompile for c-kzg vs rust-kzg? We only have to do point precompile calculations on the revm side.

We may also want to use sha256 instead of keccak for the hashing of the data, if it saves some cycles (sha256 should be a bit more efficient, but we'll have to see about that I guess).

CeciliaZ030 commented 1 week ago

I still think x=hash(C1,C2), which is used in the formula's of both dankrad and vitalik (and what we also did for the PSE circuits: taikoxyz/zkevm-circuits#37 (comment) and checked by Aleksei/Mario for correctness).

It's hard to obtain C1 tho, in the ZkVM case C1 is the implicit commit that the proof system is using to commit ALL memory. It's not like in Halo2 u can init a column and force it to be zero in all unused rows. The ZkVM guest code literally init blob: Vec<u8>, and this memory allocation is written somewhere of the STARK circuit along with other memory traces, how do u get the C1 which commit all these?

Do you know the number of cycles of the point precompile for c-kzg vs rust-kzg? We only have to do point precompile calculations on the revm side.

Well @smtmfft says 200-300M for c-kzg in Risc0 but i can't get it compiled in Sp1 so idk.

We may also want to use sha256 instead of keccak for the hashing of the data, if it saves some cycles (sha256 should be a bit more efficient, but we'll have to see about that I guess).

We should use which ever that's the fastest, sp1 has both sha256 and keccak so we have see cycles and the actually time of those syscall.

Brechtpd commented 1 week ago

I still think x=hash(C1,C2), which is used in the formula's of both dankrad and vitalik (and what we also did for the PSE circuits: taikoxyz/zkevm-circuits#37 (comment) and checked by Aleksei/Mario for correctness).

It's hard to obtain C1 tho, in the ZkVM case C1 is the implicit commit that the proof system is using to commit ALL memory. It's not like in Halo2 u can init a column and force it to be zero in all unused rows. The ZkVM guest code literally init blob: Vec<u8>, and this memory allocation is written somewhere of the STARK circuit along with other memory traces, how do u get the C1 which commit all these?

C1 = keccak/sha256 hash of the blobdata C2 = kzg commitment of the blobdata (free from ethereum)

No need to use any built in commitment of the blobdata inside the zkVM circuit itself, the hash is also a commitment we can just calculate inside the VM.

Do you know the number of cycles of the point precompile for c-kzg vs rust-kzg? We only have to do point precompile calculations on the revm side.

Well @smtmfft says 200-300M for c-kzg in Risc0 but i can't get it compiled in Sp1 so idk.

That's for the calculation of the kzg commitment as far as I know, not the point precompile.

We may also want to use sha256 instead of keccak for the hashing of the data, if it saves some cycles (sha256 should be a bit more efficient, but we'll have to see about that I guess).

We should use which ever that's the fastest, sp1 has both sha256 and keccak so we have see cycles and the actually time of those syscall.

sha256 you would expect to be faster, but do indeed need to measure it to make sure.

CeciliaZ030 commented 1 week ago

👌

Screen Shot 2024-06-17 at 11 31 04 PM
dankrad commented 1 week ago

C1 = keccak/sha256 hash of the blobdata

Yes, you can do this if it's cheap enough to evaluate keccak/sha256 inside your proof system.

It's not safe to leave out C1: it's possible to find another polynomial f'(x) that evaluates to f'(x)=f(x)=y and that would pass the check.

CeciliaZ030 commented 1 week ago

Yes, you can do this if it's cheap enough to evaluate keccak/sha256 inside your proof system.

It's not safe to leave out C1: it's possible to find another polynomial f'(x) that evaluates to f'(x)=f(x)=y and that would pass the check.

Thanks for replying @dankrad ! In this case, i guess we're replacing a poly commitment of blob (i.e. f(x)) with a hash commitment, so the security assumption comes from the hash function, then the malleability of f(x) is out of picture