zcash / halo2

The Halo2 zero-knowledge proving system
https://zcash.github.io/halo2/
Other
728 stars 495 forks source link

Maintain common subexpressions in the expression graph #686

Open daira opened 1 year ago

daira commented 1 year ago

The Ast is currently a tree, but since we use Arcs to refer to nodes, nothing prevents it from representing a directed acyclic graph. We can then use graph evaluation to avoid recomputing common subexpressions.

naure commented 1 year ago

If there isn’t too many common subexpressions, deduplication or memoizing should not be too effective. Instead, the gate expressions need to be rearranged to reveal common or constant subexpressions.

It is true that individual gates are given in an efficient factorized form, such as gate(a, b, c, d…) = (a+1)(b+c), where a, b, c, d… are the available queries. But that is not true of the linear combination of gates:

image

all_gates(a, b, c, d…) = y^0.(a+1)(b+c) + y^1.(a+1)(b+c) + y^2.(a+1)(b+c) + y^3.(a+1)(b+c) + … (hundreds)

which looks almost like a monomial form in the queries ab, ac, …. It has a subset of the terms allowed by the max degree, and many repeated terms (e.g, y^0.ab + y^1.ab.

So it turns out to be a case of Horner’s rule, with multiple variables. The strategy is then to find a Horner factorization of all_gates at compile time (example paper). Here is an example with one level of factorization, schematically:

all_gates(a, b, c, d…) = L + a.R

    L = (y^0 + y^1 + …).(b+c)
    R = (y^0 + y^1 + …).(b+c)

The powers of y are kept as symbols that appear in subexpressions, to evaluate only once by memoizing. The evaluation at all rotations exploits a fully factored form rather than a long sum of partial forms.