Open han0110 opened 2 years ago
In a meeting today, @ebfull noted a specific soundness requirement for this: the prover must only be able to assign to columns involved in a challenge prior to that challenge being drawn. Once the challenge is selected, any columns that were assigned to beforehand must not be assigned to again. This places specific requirements on the shapes of gates - you couldn't assign something like: | a0 | a1 |
---|---|---|
x | y | |
z |
where $z = x + \theta . y$ for challenge $\theta$, because the value of $z$ depends on $\theta$, which in turn depends on $y$ (via a commitment to column a1
).
I suggested the following method to implement this in the type system (sadly not solely the type system, as we don't have const generic expressions yet). Currently we already allow Circuit::synthesize
to be called multiple times (in particular, floor planners require this for laying-out purposes), but we guarantee that all value closures passed to any Region::assign_*
method will be called at most once (and e.g. never in the case of assign_fixed
during proving, since it was already called during keygen). My proposal is that we'd add another type parameter to Column
, that marks what "challenge phase" it corresponds to (before any challenges, after first challenge, after second, etc.). Then if Circuit::configure
requests challenges, Circuit::synthesize
will be called repeatedly, passing in a layouter
that internally tracks which challenge phase this is. The layouter would behave in two ways:
synthesize
to be run multiple times.Circuit::synthesize
business logic could request the challenge values from a new API Layouter::get_challenge(phase / index) -> Option<Challenge>
. In challenge phase 0, all calls to this API would return None
; in phase j
calls up to Layouter::get_challenge(j - 1)
would return Some(challenge)
, while Layouter::get_challenge(j)
onwards would return None
.
In a meeting today, @ ebfull noted a specific soundness requirement for this: the prover must only be able to assign to columns involved in a challenge prior to that challenge being drawn. Once the challenge is selected, any columns that were assigned to beforehand must not be assigned to again.
Make sense! But since in Expression
we can't specify which items are input and outputs, so we can't tell if this relation is sound or not. For example, if we move the equation to be $x\ =\ z\ -\ \theta \cdot y$, then I guess it should be fine after we have committed column a1
to generate challenge $\theta$.
So using type system to specify the phase of column sounds much safer, thanks for suggestion!
I will try to add a new trait ColumnPhase
with 3 built-in implementation {First,Second,Third}Phase
, and another type parameter to Column
with default implementation as Column<C, P = FirstPhase>
, to make less changes to current code base, finally allow developers to specify which phase they want on ConstraintSystem::advice_column
. Not sure if this makes sense, but will try to see.
Also I think 3 phases are enough to build most randomized relation (shuffle, lookup, copy constraint, etc...), we can also allow more phases by cargo feature if anyone needs.
Although it's hard to tell if an Expression
is sound or not, but if we forbid any incorrect round assignment by type system, most mistakes seem not have a chance to happen.
In the office hours today, @han0110 commented that adding a type parameter to Column
required widescale changes to the codebase. To minimise this, I suggested an alternate approach, where we instead store the phase as an enum inside the Column
struct (specifically inside the Advice
and Any::Advice
structs, as @han0110 noted that both instance and fixed columns must all be assigned before any challenges, so are implicitly "first phase"). The check inside Region::assign_advice
would be very similar for type parameter vs inner field.
Making this a runtime check would add a runtime cost of at least 1 byte per Column
struct. Currently a Column
is a usize
followed by a zero-sized struct; making this struct non-zero sized could mean that after taking alignment into account, Column
might increase from 8 to 12 or 16 bytes. The effect of this on memory usage during proving should be checked.
It would be nice to have an API like
meta.challenge()
to allocate random challenge for custom gates, which guarantees the allocated challenge is squeezed after all previous allocated advice columns are committed and written into transcript.Some potential usages in our case (appliedzkp/zkevm-circuits) are:
To enable shuffle argument when we want the witness trace to be verified in a different order. One case is to shuffle the access record of RAM to another friendly order for consistency check. With such API, the custom gate would be implemented like:
Although I guess this could also be achieved by modularizing the shuffle argument used in subset argument, and provide API like
meta.shuffle()
, then the code could be much simpler like:To implement lookup argument in a different approach. Currently the native lookup argument provided by halo2 requires a shuffled table for each lookup argument to check the subset property. However, in the case of multiple input lookup to the same table, it seems plookup is more efficient, because in plookup the table only needs to appear in the mixed poly once, no matter how many input there are (although it's still bound by circuit degree, but the grand product splitting trick can help).
Another usage is to copy dynamic-size vector of witness to different regions, to avoid sacrificing capacity due to rare but worst case. One of case is to do keccak256 on dynamic size bytes, although the keccak256 is designed to be repeating the keccak-f to accept dynamic size input, but passing input becomes an issue because we can't use permutation argument for that. One approach to do is make the caller to reorder the witness to align with the one keccak256 expects, but it makes the gate complicated (need to be aware of which round it is, and maintain a buffer of bytes keccak256 expects).
So instead passing byte by byte, we'd like to try to use random linear combination
Σi(bi * θi)
to accumulate dynamic size bytes into a single witness, and use lookup (or shuffle) to make sure caller and keccak256 accumulate to same value at specific row, with its corresponding output. Then caller doesn't need to be aware of which keccak-f round it is, and only need to maintain a single witness as all bytes, and no buffer is required anymore.But there are also some issues come to my mind:
synthesize
might need to be called multiple times, which seems confusing to most users that don't need extra challenge.meta.advice_column
is called after anymeta.challenge
, but such implicit behavior might also be confusing. Another explicit API approach is to require user to specify which round of each advice column and challenge is in, but it seems too annoying for most user.Expression
it seems hard to tell if the challenge is dependency of other column's witness, so users need to be aware themselves. But it seems fine because we would find some value couldn't be calculated because it depends on future challenge, then we know we have done a wrong configuration.Also even without this API, we can still slightly modify the
create_proof
andverify_proof
in a fork ofhalo2
, to callsynthesize
multiple times and mock the challenge with a "constant instance column", thanks for the extremely flexible API that already exists! But still like to explore the possibility to allocate challenge while configuration.