ed255 commented 3 months ago

Get benchmark results (time & memory) to compare the cost of proving an average block VS a block that maxes out keccaks.

For example, given a target gas of 10k (or another sensible number) do:

a. A block that consumes N gas with ERC20 transactions. The circuit capacity is configured to only cover this case.
b. A block that consumes N gas with keccaks (the highest number of keccacks you can do in 10k gas). The circuit capacity is increased from case a to accommodate all the keccacks.

For case b there are two options to max out the keccaks (See https://github.com/privacy-scaling-explorations/zkevm-specs/issues/71#issuecomment-1125612886).

Via EXTCODESIZE
Via KECCAK256

It would be great to get results for both cases.

NOTE: To get meaningful results, we need to make sure the number of keccak rows for each case is close to a power of two (so that the circuit is not underused). If we can't reach this, we could change the problem to do: given 2^k rows, do as many ops as possible to use all rows in each setting and figure out the gas spent.

The results from this will help reviewing https://github.com/ethereum/EIPs/pull/8367

ed255 commented 3 months ago

I plan to run two benchmarks. For now I discarded the case where we abuse keccaks via EXTCODESIZE because the EIP-7667 is related to "gas cost of hash functions.

Average block case

For this case we use a deployed ERC20 contract (based on openzeppelin implementation). On top of ERC20 we have added a method that does many transfers in one call, so that we can do several ERC20 transfers in a single tx, to easily fill the block with operations.

We do 4 tx * 28 tranfsers (2 tx run successful ERC20 transfers, 2 txs run failed ERC20 transfers). Gas cost is 303940

The circuit parameters required for this are:

FixedCParams {
        total_chunks: 1,
        max_rws: 70291,
        max_txs: 4,
        max_withdrawals: 0,
        max_calldata: 400,
        max_copy_rows: 25348,
        max_exp_steps: 0,
        max_bytecode: 4937,
        max_evm_rows: 200052, // max(200052, 68724)
        max_keccak_rows: 37200,
        max_vertical_circuit_rows: 0,
}

The biggest circuit ends up being the TxCircuit which needs 104471 rows per tx signature, so in total we need ~417884 rows. K = 19 log(417884, 2) = 18.67

Maxed keccak case

For this case we try to do as many KECCAK256 opcode executions as possible with the minimum amount of gas. We do this with this contract method:

    function checkKeccak256(Len calldata l) external returns (uint32 r) {
        assembly {
            let len := calldataload(4)
            let b := 77
            for {
                let i := 0
            } lt(i, len) {
                i := add(i, 1)
            } {
                mstore(0, keccak256(0, 65536)) // 32768
            }
            r := mload(0)
        }
    }

By doing 22 iterations we achieve a a similar gas as the reference case (average block) Gas cost is 308544

The circuit parameters required for this are:

FixedCParams {
        total_chunks: 1,
        max_rws: 1443749,
        max_txs: 1,
        max_withdrawals: 0,
        max_calldata: 36,
        max_copy_rows: 2883588,
        max_exp_steps: 0,
        max_bytecode: 2443,
        max_evm_rows: 200052,
        max_keccak_rows: 3210600,
        max_vertical_circuit_rows: 0,
}

The biggest circuit ends up being (as expected) the KeccakCircuit which needs 3210600 rows to prove the 22 keccaks we did of 64 KiB input each. K = 22 log(3210600, 2) = 21.61

Summary

For the entire process of generating a block proof, the witness generation time is negligible for both cases (the dominating part is proof calculation), which is only affected by k in our setting. So we will run two benchmarks for the SuperCircuit with:

K = 19 (average block case)
K = 22 (max keccak case)

The benchmarks will use the same block as input to simplify the process, but that will be irrelevant to the results.

Extra notes

Reference of worst cases analysis https://github.com/privacy-scaling-explorations/zkevm-specs/issues/71#issuecomment-1125612886

The two cases are targeting ~300k gas in order to get benchmark execution in a manageable time. Nevertheless the worst case exploit for keccack has a constant overhead (64 KiB of memory initialization), so increasing the gas would also increase the gap between average VS worst case keccack.

I expect the Maxed keccak case to take ~8x more (due to moving from circuit size 2^19 to 2^22)

ed255 commented 3 months ago

Server with 128 cores, 4TB of ram

Results k=19

https://zkevm-chain-testing.s3.eu-central-1.amazonaws.com/Super-19-Benchmark-PR1800--1712332167.tar.gz https://grafana.zkevm-testnet.org/d/vofy8DAVz/circuit-benchmarks?orgId=1&var-test_id=Super-19-Benchmark-PR1800--1712332167

Proof generation time: 4606.187s (1h 16m 46s)
Peak memory: 231.4 GiB

Result k=22

https://zkevm-chain-testing.s3.eu-central-1.amazonaws.com/Super-22-Benchmark-PR1800--1712731633.tar.gz https://grafana.zkevm-testnet.org/d/vofy8DAVz/circuit-benchmarks?orgId=1&var-test_id=Super-22-Benchmark-PR1800--1712731633

Proof generation time: 44372.936s (12h 19m 33s)
Peak memory: 1.57T TiB

privacy-scaling-explorations / zkevm-circuits

Get bench results for average block VS keccak maxed out #1798

Average block case

Maxed keccak case

Summary

Extra notes

Results k=19

Result k=22