zcash / halo2

The Halo2 zero-knowledge proving system
https://zcash.github.io/halo2/
Other
739 stars 496 forks source link

Potential usages if `halo2` has `meta.challenge()` #527

Open han0110 opened 2 years ago

han0110 commented 2 years ago

It would be nice to have an API like meta.challenge() to allocate random challenge for custom gates, which guarantees the allocated challenge is squeezed after all previous allocated advice columns are committed and written into transcript.

Some potential usages in our case (appliedzkp/zkevm-circuits) are:

  1. To enable shuffle argument when we want the witness trace to be verified in a different order. One case is to shuffle the access record of RAM to another friendly order for consistency check. With such API, the custom gate would be implemented like:

    let [q_shuffle, q_first, q_last] = [(); 3].map(|_| meta.selector());
    let [a0, a1, b0, b1] = [(); 4].map(|_| meta.advice_column());
    let [theta, beta] = [(); 2].map(|_| meta.challenge());
    let z = meta.advice_column();
    
    meta.create_gate("Shuffle (a0, a1) into (b0, b1)", |meta| {
        let [q_shuffle, q_first, q_last]  = [q_shuffle, q_first, q_last].map(|q| meta.query_selector(q));
        let [a0, a1, b0, b1] = [a0, a1, b0, b1].map(|a| meta.query_advice(a, Rotation::cur()));
        let [theta, beta] = [theta, beta].map(|challenge| meta.query_challenge(challenge));
        let [z, zw] = [Rotation::cur(), Rotation::next()].map(|rotation| meta.query_advice(z, rotation));
        let one = Expression::Constant(F::one());
    
        // compress
        let a = a0 * theta + a1;
        let b = b0 * theta + b1;
    
        vec![
            q_shuffle * (z * (a + beta) - zw * (b + beta)),
            q_first * (one - z),
            q_last * (one - z),
        ]
    });

    Although I guess this could also be achieved by modularizing the shuffle argument used in subset argument, and provide API like meta.shuffle(), then the code could be much simpler like:

    let q_shuffle = meta.selector();
    let [a0, a1, b0, b1] = [(); 4].map(|_| meta.advice_column());
    
    meta.shuffle(|meta| {
        let q_shuffle  = meta.query_selector(q_shuffle);
        let [a0, a1, b0, b1] = [a0, a1, b0, b1].map(|a| meta.query_advice(a, Rotation::cur()));
    
        [(a0, b0), (a1, b1)].map(|(lhs, rhs)| (q_shuffle * lhs, q_shuffle * rhs))
    });
  2. To implement lookup argument in a different approach. Currently the native lookup argument provided by halo2 requires a shuffled table for each lookup argument to check the subset property. However, in the case of multiple input lookup to the same table, it seems plookup is more efficient, because in plookup the table only needs to appear in the mixed poly once, no matter how many input there are (although it's still bound by circuit degree, but the grand product splitting trick can help).

  3. Another usage is to copy dynamic-size vector of witness to different regions, to avoid sacrificing capacity due to rare but worst case. One of case is to do keccak256 on dynamic size bytes, although the keccak256 is designed to be repeating the keccak-f to accept dynamic size input, but passing input becomes an issue because we can't use permutation argument for that. One approach to do is make the caller to reorder the witness to align with the one keccak256 expects, but it makes the gate complicated (need to be aware of which round it is, and maintain a buffer of bytes keccak256 expects).

    So instead passing byte by byte, we'd like to try to use random linear combination Σi(bi * θi) to accumulate dynamic size bytes into a single witness, and use lookup (or shuffle) to make sure caller and keccak256 accumulate to same value at specific row, with its corresponding output. Then caller doesn't need to be aware of which keccak-f round it is, and only need to maintain a single witness as all bytes, and no buffer is required anymore.

But there are also some issues come to my mind:

  1. Because some witness value would depend on challenge, synthesize might need to be called multiple times, which seems confusing to most users that don't need extra challenge.
  2. In above pseudo code, I assume an extra round is implicitly added when meta.advice_column is called after any meta.challenge, but such implicit behavior might also be confusing. Another explicit API approach is to require user to specify which round of each advice column and challenge is in, but it seems too annoying for most user.
  3. From Expression it seems hard to tell if the challenge is dependency of other column's witness, so users need to be aware themselves. But it seems fine because we would find some value couldn't be calculated because it depends on future challenge, then we know we have done a wrong configuration.

Also even without this API, we can still slightly modify the create_proof and verify_proof in a fork of halo2, to call synthesize multiple times and mock the challenge with a "constant instance column", thanks for the extremely flexible API that already exists! But still like to explore the possibility to allocate challenge while configuration.

str4d commented 2 years ago
In a meeting today, @ebfull noted a specific soundness requirement for this: the prover must only be able to assign to columns involved in a challenge prior to that challenge being drawn. Once the challenge is selected, any columns that were assigned to beforehand must not be assigned to again. This places specific requirements on the shapes of gates - you couldn't assign something like: a0 a1
x y
z

where $z = x + \theta . y$ for challenge $\theta$, because the value of $z$ depends on $\theta$, which in turn depends on $y$ (via a commitment to column a1).

I suggested the following method to implement this in the type system (sadly not solely the type system, as we don't have const generic expressions yet). Currently we already allow Circuit::synthesize to be called multiple times (in particular, floor planners require this for laying-out purposes), but we guarantee that all value closures passed to any Region::assign_* method will be called at most once (and e.g. never in the case of assign_fixed during proving, since it was already called during keygen). My proposal is that we'd add another type parameter to Column, that marks what "challenge phase" it corresponds to (before any challenges, after first challenge, after second, etc.). Then if Circuit::configure requests challenges, Circuit::synthesize will be called repeatedly, passing in a layouter that internally tracks which challenge phase this is. The layouter would behave in two ways:

han0110 commented 2 years ago

In a meeting today, @ ebfull noted a specific soundness requirement for this: the prover must only be able to assign to columns involved in a challenge prior to that challenge being drawn. Once the challenge is selected, any columns that were assigned to beforehand must not be assigned to again.

Make sense! But since in Expression we can't specify which items are input and outputs, so we can't tell if this relation is sound or not. For example, if we move the equation to be $x\ =\ z\ -\ \theta \cdot y$, then I guess it should be fine after we have committed column a1 to generate challenge $\theta$.

So using type system to specify the phase of column sounds much safer, thanks for suggestion!

I will try to add a new trait ColumnPhase with 3 built-in implementation {First,Second,Third}Phase, and another type parameter to Column with default implementation as Column<C, P = FirstPhase>, to make less changes to current code base, finally allow developers to specify which phase they want on ConstraintSystem::advice_column. Not sure if this makes sense, but will try to see.

Also I think 3 phases are enough to build most randomized relation (shuffle, lookup, copy constraint, etc...), we can also allow more phases by cargo feature if anyone needs.

Although it's hard to tell if an Expression is sound or not, but if we forbid any incorrect round assignment by type system, most mistakes seem not have a chance to happen.

str4d commented 2 years ago

In the office hours today, @han0110 commented that adding a type parameter to Column required widescale changes to the codebase. To minimise this, I suggested an alternate approach, where we instead store the phase as an enum inside the Column struct (specifically inside the Advice and Any::Advice structs, as @han0110 noted that both instance and fixed columns must all be assigned before any challenges, so are implicitly "first phase"). The check inside Region::assign_advice would be very similar for type parameter vs inner field.

Making this a runtime check would add a runtime cost of at least 1 byte per Column struct. Currently a Column is a usize followed by a zero-sized struct; making this struct non-zero sized could mean that after taking alignment into account, Column might increase from 8 to 12 or 16 bytes. The effect of this on memory usage during proving should be checked.