zksecurity / noname

Noname: a programming language to write zkapps
https://zksecurity.github.io/noname/

optimize constraints #42

Open mimoo opened 6 months ago

mimoo commented 6 months ago

We currently do not optimize constraints well: we eagerly add plonk (generic) gates when we could wait. The same is going to be true for R1CS constraints.

The snarky approach was to have a mini-AST (Add, Scale, Mul) and to wait to be forced to constrain it.

Circom has a much more powerful approach, I believe, where you can pass an -O flag with the number of optimization passes you're willing to perform on the program. I'm not sure exactly how it works, or if it is something that would be good to implement.

Maybe relevant: https://eprint.iacr.org/2020/1586.pdf

mimoo commented 6 months ago

This note seeks to explain what constraint optimization is, what noname does to optimize constraints, and what it could do.

[!NOTE] The best way to understand if we're missing optimizations, and the kind of optimizations we can achieve, is to write noname code and manually check if the compiler creates constraints that could be further optimized.

What noname does at the moment

We handle Vars in noname, which reflect the variables that noname code creates (and which can represent a Field, a Bool, but also custom user structs):

pub struct Var<F> where F: Field {
    /// The constants or cells that make up this variable.
    pub cvars: Vec<ConstOrCell<F>>,

    /// The span that created the variable.
    pub span: Span,
}

Internally, a Var is simply a vector of CellVars, which are the building blocks of (and are produced by) a constraint system. The circuit writer does need to differentiate between constants and CellVars, though, so we have one additional layer in between:

pub enum ConstOrCell<F> where F: Field {
    /// A constant value.
    #[serde(skip)]
    Const(F),

    /// A cell in the execution trace.
    Cell(CellVar),
}
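For intuition, under these definitions a Bool would be a Var with a single entry in cvars, while a struct with two field elements would get two entries. Illustrated with a hypothetical Point struct (not from the codebase):

// hypothetical layout illustration, not actual noname API usage:
// Bool                          -> cvars = vec![ConstOrCell::Cell(b)]
// Point { x: Field, y: Field }  -> cvars = vec![ConstOrCell::Cell(x), ConstOrCell::Cell(y)]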

CellVars have a common interface across constraint system backends at the moment, and are issued through some of the backend's functions (such as new_internal_var, which you can see used below).

[!NOTE] We could provide an opaque CellVar type through the backend (i.e. Backend::CellVar), but it doesn't seem that useful at the moment.

Functions that operate on Vars (like add and mul) will prompt the creation of new variables and constraints.

Take some time to look at the addition of variables in noname:

pub fn add<B: Backend>(
    compiler: &mut CircuitWriter<B>,
    lhs: &ConstOrCell<B::Field>,
    rhs: &ConstOrCell<B::Field>,
    span: Span,
) -> Var<B::Field> {
    let zero = B::Field::zero();
    let one = B::Field::one();

    match (lhs, rhs) {
        // 2 constants
        (ConstOrCell::Const(lhs), ConstOrCell::Const(rhs)) => Var::new_constant(*lhs + *rhs, span),

        // const and a var
        (ConstOrCell::Const(cst), ConstOrCell::Cell(cvar))
        | (ConstOrCell::Cell(cvar), ConstOrCell::Const(cst)) => {
            // if the constant is zero, we can ignore this gate
            if cst.is_zero() {
                // TODO: that span is incorrect, it should come from lhs or rhs...
                return Var::new_var(*cvar, span);
            }

            // create a new variable to store the result
            let res = compiler.backend.new_internal_var(
                Value::LinearCombination(vec![(one, *cvar)], *cst),
                span,
            );

            // create a gate to store the result
            // TODO: we should use an add_generic function that takes advantage of the double generic gate
            compiler.backend.add_generic_gate(
                "add a constant with a variable",
                vec![Some(*cvar), None, Some(res)],
                vec![one, zero, one.neg(), zero, *cst],
                span,
            );

            Var::new_var(res, span)
        }
        (ConstOrCell::Cell(lhs), ConstOrCell::Cell(rhs)) => {
            // create a new variable to store the result
            let res = compiler.backend.new_internal_var(
                Value::LinearCombination(vec![(one, *lhs), (one, *rhs)], zero),
                span,
            );

            // create a gate to store the result
            compiler.backend.add_generic_gate(
                "add two variables together",
                vec![Some(*lhs), Some(*rhs), Some(res)],
                vec![one, one, one.neg()],
                span,
            );

            Var::new_var(res, span)
        }
    }
}

The previous code could be simplified; for example, as the TODO in the code points out, an add_generic function could take advantage of kimchi's double generic gate to pack two operations per row.

The same kind of logic can be observed in the multiplication function.

[!WARNING] At this point, we actually do NOT want to create constraints! This is too soon! We will explain why we want to delay the creation of constraints when such operations occur later in this document. (In particular, we'll want to delay them until we hit assertion or builtin functions.)

Finally, assertion functions like assert and assert_eq will always create constraints on their variables as well.

[!NOTE] Gadgets provided by the backend, and essentially any backend function, might create CellVars as well as constraints without letting you know. This is for example the case for builtins like poseidon() in plonk backends, which need to create a number of intermediary CellVars and constraints.

Open question about modular constraint system backends

The current code of noname uses an add_generic_gate function, which is plonk-specific. One question is how we can generalize this to more constraint system backends.

One start of an answer is to have each backend provide, at the very least, three functions; a sketch of what these could be follows the note below.

My theory is that a backend that provides these three functions (in addition to builtins) should be enough.

[!NOTE] Let's see how far this gets us. We might want to add convenience functions like a linear_combination() function, but let's not prioritize that for now.
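To make that concrete, here is a sketch of what such a minimal interface could look like. Every name below is hypothetical (this is NOT noname's actual Backend trait); it just reuses the CellVar, Value, and Span types discussed above:

trait Backend {
    type Field;

    /// 1. create a new internal variable, with a rule (a Value) describing
    /// how to compute it at witness-generation time
    fn new_internal_var(&mut self, value: Value<Self::Field>, span: Span) -> CellVar;

    /// 2. create (or fetch from a cache) a variable constrained to equal a constant
    fn add_constant(&mut self, value: Self::Field, span: Span) -> CellVar;

    /// 3. enforce equality between two variables (e.g. a generic gate or the
    /// permutation in plonk, an A * B = C row in R1CS)
    fn assert_eq(&mut self, lhs: CellVar, rhs: CellVar, span: Span);
}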

List of optimizations for Plonk

To understand why the logic currently implemented in Noname is inefficient, we first explain how R1CS and Plonk constraints can be optimized.

Packing a generic gate

Imagine the following noname code:

let a = x + 5;
let b = 7 * a + 8;

There are two additions and one multiplication, and with the current logic each of these operations will prompt the creation of a constraint. So we will have something like the following Python pseudocode:

a = new_internal_var(lambda : x + 5)
# 1 * x + 0 * _ + -1 * a + 0 * x * _ + 5
add_gate(vars=[x, None, a], coeffs=[1, 0, -1, 0, 5])

temp = new_internal_var(lambda : 7 * a)
# 7 * a + 0 * _ + -1 * temp + 0 * a * _ + 0
add_gate(vars=[a, None, temp], coeffs=[7, 0, -1, 0, 0])

b = new_internal_var(lambda : temp + 8)
add_gate(vars=[temp, None, b], coeffs=[1, 0, -1, 0, 8])

But in reality, since only additions and scalings happen here, and since we do not care about the intermediary variables (we're not constraining a or temp to be anything), we could have created a single variable to basically represent the following code:

let b = 7 * x + (7 * 5 + 8)

which would have been:

b = new_internal_var(lambda : 7 * x + 43)
add_gate(vars=[x, None, b], coeffs=[7, 0, -1, 0, 43])

and we could have continued this sort of optimization until we reached an equation that needed more than one gate to express/encode. Usually, this is because we either have a degree higher than 2 (e.g. x * y * z) or because too many variables are involved.

[!WARNING] We should answer this question more rigorously here: when can we not express a constraint as a single generic gate (in plonk and in R1CS)?

In practice, we NEVER really have to add any constraint until an assert_eq is used. So while a bunch of multiplications will eventually force us to add many constraints/gates, we theoretically don't have to until something like an assertion is used.

[!NOTE] In addition, one could argue that even the use of an assert could be "optimized out" by the compiler, if it can be proven that no public input value, or constant, can impact the validity of that constraint.

This is because in most applications you don't care that something is "possible"; you care that it is possible in relation to something in the public input OR in relation to something that was hardcoded in the circuit.

As an example, if this is confusing: an assertion that doesn't involve either of these cases would be something like assert_eq(x, y) or x = poseidon(y), where x and y can both be set to any value by the prover.

Avoiding reducing functions multiple times

Let's look at this example:

let a = x + 5;
let c = 7 * a + 8;
assert_eq(c, 4);
let b = 7 * a + 8;
assert_eq(b, 4);

The values c and b will basically be the same, but the current compiler (and even compilers like snarky) can't remember that.

[!WARNING] I'm actually wondering how we can solve this efficiently (one idea is sketched below). We should discuss this problem later, depending on the solution we choose.
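One classic technique that could apply here (an idea, not something noname does today) is common-subexpression elimination via hash-consing: keep a cache from symbolic expressions to the variable already computed for them, and reuse the variable on a cache hit. A minimal self-contained sketch, with a toy u64 standing in for a field element and plain usize ids standing in for CellVars:

use std::collections::HashMap;

// the symbolic expression we hash; in noname this would be whatever
// representation of pending operations we keep (e.g. the mini-AST
// discussed at the end of this note)
#[derive(Clone, PartialEq, Eq, Hash)]
enum Expr {
    Var(usize), // a cell in the execution trace
    Const(u64), // toy stand-in for a field element
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

#[derive(Default)]
struct Cse {
    // expression -> the variable (here just an id) already computed for it
    cache: HashMap<Expr, usize>,
    next_var: usize,
}

impl Cse {
    /// Returns the variable holding expr, reusing a previous one if the
    /// exact same expression was already reduced.
    fn reduce(&mut self, expr: &Expr) -> usize {
        if let Some(&var) = self.cache.get(expr) {
            return var; // seen before: no new variable, no new constraint
        }
        // here a real implementation would create the internal variable
        // and the constraints computing it
        let var = self.next_var;
        self.next_var += 1;
        self.cache.insert(expr.clone(), var);
        var
    }
}

Note that this only catches syntactically identical expressions: it would deduplicate the two 7 * a + 8 computations above, but not a rewritten form like 8 + a * 7.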

Avoiding constraining the same constants multiple times

Sometimes we need to force a cell of the execution trace to have a specific constant in it.

One example: when using the poseidon() custom gate, we expect the first three wires/columns/registers of the first constraint added by the gadget to be set to the poseidon input: a user input (a free variable) and two zeros.

What this means is that we usually create a constraint to enforce the constant 0:

zero = new_internal_var(lambda : 0)
add_gate(vars=[zero], coeffs=[1])

and then, we wire the two zero cells in the first poseidon row to that zero constant we created.

If we use poseidon several times, we'll create several constraints encoding the same constant. To avoid that, noname simply caches the creation of such constants:

pub(crate) cached_constants: HashMap<VestaField, CellVar>,

and when we add a constant we check that cache first:

fn add_constant(
    &mut self,
    label: Option<&'static str>,
    value: VestaField,
    span: Span,
) -> CellVar {
    if let Some(cvar) = self.cached_constants.get(&value) {
        return *cvar;
    }

    // otherwise, create the constant's variable and constraint,
    // and cache it for next time
    // (rest of the function truncated)
}

Using the permutation to assert equality (instead of creating a constraint)

From snarky's documentation:

Sometimes, we know that two variables will have equivalent values due to an assert_equal() being called to link them. Since we link two variables together, they need to be part of the same cycle, and as such we need to be able to detect that to construct correct cycles.

To do this, we use a union-find data structure, which allows us to easily find the unions of equivalent variables.

When an assert_equal() is called, we link the two variables together using the union_finds data structure.

During finalization, when we create the cycles, we use the union_finds data structure to find the equivalent variables. We then create a new equivalence classes hashmap to merge the keys (variables) that are in the same set. This is done before using the equivalence classes hashmap to construct the cycles.
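For illustration, here is a minimal union-find with path halving, of the kind described above; this is a generic sketch (plain usize ids standing in for variables), not snarky's or noname's actual code:

struct UnionFind {
    parent: Vec<usize>, // parent[i] == i means i is a root
}

impl UnionFind {
    fn new(num_vars: usize) -> Self {
        UnionFind { parent: (0..num_vars).collect() }
    }

    /// Find the representative of x's equivalence class.
    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            self.parent[x] = self.parent[self.parent[x]]; // path halving
            x = self.parent[x];
        }
        x
    }

    /// What assert_equal(a, b) would call: merge the two classes so that,
    /// at finalization, both variables end up in the same wiring cycle.
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}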

Optimizing constraints in R1CS

You might have heard that "additions are free in R1CS", or that R1CS tops out at quadratic constraints. These statements should make sense once we review some examples here.

Bundling additions

The easiest example to understand is bit-decomposition, which can happen in a single constraint no matter the number of bits (not counting the boolean constraints on each bit). For example, you can see the Circom implementation and the constraints it leads to:

(image: the Circom bit-decomposition circuit and the constraints it produces)
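To make the "additions are free" claim concrete, here is the shape of the constraints for a 3-bit decomposition of x (my rendering of the general idea, not Circom's exact output). Each bit needs its own booleanity constraint, but the decomposition itself fits in a single constraint, because the whole sum is one linear combination:

$$b_i \cdot (b_i - 1) = 0 \quad \text{for } i \in \{0, 1, 2\}$$

$$(b_0 + 2 b_1 + 4 b_2) \cdot 1 = x$$

In a plonk generic gate, by contrast, a row only touches a bounded number of wires, so the same sum would be spread over several gates.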

Gauss-Jordan elimination

Circom uses Gauss-Jordan elimination.

[!WARNING] TODO: WRITE THIS PART
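Until this part is written, here's the rough idea as I understand it (to be verified): treat the linear constraints as rows of a matrix and run Gauss-Jordan elimination on it, so that signals fully determined by linear constraints can be substituted into the remaining constraints and removed. A toy example, eliminating c from the second constraint:

$$c = a + b, \qquad e = c + d \qquad \Longrightarrow \qquad e = a + b + d$$

One signal (c) and one constraint are gone, without changing what the system enforces.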

What should we do?

[!WARNING] I'm currently writing this section, so these thoughts are still up in the air and not super clear :)

I'm wondering what we should do:

  1. should we attempt to handle these optimizations as early as possible?
  2. should we do these optimizations later:
     a. on the constraint system we obtain, or
     b. on an IR, before we get to the constraint system?

So, 2.a seems difficult because at this point the constraint system is tightly tied to a "witness generation program" (a list of instructions on how to fill the actual witness vector/table with values at runtime/proving time). Changing one means changing the other to ensure that they remain consistent, which is hard to do after the fact.

It seems like the point of having an IR (option 2.b) is that it simplifies things and avoids code duplication. From my experience, these kinds of concepts should naturally emerge from the pain of writing something without them. So my proposal is that we do not use an IR until we naturally feel the need for one to simplify a bunch of code we have written. There isn't enough evidence for an IR at this point, based on our own experience here.

So, looking at the last remaining solution (option 1): we have to handle these optimizations as early as possible. This means that when we operate on CellVars we should avoid creating constraints.

[!WARNING] TODO: One question we should answer is if this is enough, or if we still need to use Gauss-Jordan elimination like in Circom.

How should we implement this change?

  1. We can modify CellVar to carry a mini-AST of all the additions and multiplications that have happened so far. Only when we reach an assert or a gadget do we "reduce" the mini-AST by adding as many constraints and internal CellVars as necessary.
  2. We can keep creating CellVars without adding any constraint, and keep track internally, in a field of the backend, of what these cellvars are.

The first solution would look like this:

enum CellVarKind<F> {
    /// A constant (this means we would track constants here as well).
    Constant(F),
    /// A cell in the execution trace.
    Var(usize),
    Add(Box<CellVar<F>>, Box<CellVar<F>>),
    Scale(F, Box<CellVar<F>>),
    Mul(Box<CellVar<F>>, Box<CellVar<F>>),
}

struct CellVar<F> {
    pub kind: CellVarKind<F>,
    pub span: Span,
}

whereas the second solution would look like this:

enum CellVarState<F> {
    Reduced,              // means it was already part of a constraint
    Free(CellVarKind<F>), // means it hasn't been part of a constraint so far
}

struct KimchiBackend {
    // TRUNCATED...
    vars_to_value: Vec<CellVarState<VestaField>>,
}

These solutions are basically equivalent, and we've taken both kinds of approaches elsewhere in the compiler (either the data lives in the object itself, or in a field of the compiler).
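To illustrate the "reduce" step of the first solution, here is a self-contained toy sketch (all names are hypothetical; u64 stands in for a field element and usize ids for trace cells). The linear parts of the AST fold into a list of (coefficient, variable) terms plus a constant, and a gate only has to be emitted when we hit a Mul (degree > 1) or an assertion:

#[derive(Clone)]
enum Ast {
    Var(usize), // a cell in the execution trace
    Const(u64), // toy stand-in for a field element
    Add(Box<Ast>, Box<Ast>),
    Scale(u64, Box<Ast>),
    Mul(Box<Ast>, Box<Ast>),
}

/// A folded linear combination: a sum of coeff * var terms, plus a constant.
struct LinComb {
    terms: Vec<(u64, usize)>,
    constant: u64,
}

/// Fold the linear parts of the AST without emitting any constraint.
/// A real implementation would emit a gate (and a fresh internal variable)
/// when encountering Mul, instead of giving up.
fn fold_linear(ast: &Ast) -> LinComb {
    match ast {
        Ast::Var(v) => LinComb { terms: vec![(1, *v)], constant: 0 },
        Ast::Const(c) => LinComb { terms: vec![], constant: *c },
        Ast::Add(lhs, rhs) => {
            let (mut lhs, rhs) = (fold_linear(lhs), fold_linear(rhs));
            lhs.terms.extend(rhs.terms);
            lhs.constant += rhs.constant;
            lhs
        }
        Ast::Scale(k, inner) => {
            let mut lc = fold_linear(inner);
            for (coeff, _) in lc.terms.iter_mut() {
                *coeff *= *k;
            }
            lc.constant *= *k;
            lc
        }
        Ast::Mul(_, _) => unimplemented!("degree > 1: emit a gate here"),
    }
}

On the running example let b = 7 * (x + 5) + 8 from earlier, this folds down to the single linear combination 7*x + 43 without creating any intermediate constraint, which is exactly the packing we wanted.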