Add input validation for arithmetizable functions

vlasin commented 6 months ago

Currently, the inputs of arithmetizable computations are not validated. The simplest example is the bool function. It can be arithmetized in three Plonk constraints: bool x y (Bool b) == b * (y - x) + x. To validate b, we would need an extra constraint b * (one - b) == zero.

If we validate the inputs of every intermediate operation, we will be adding a lot of duplicate constraints to the circuit. So, I suggest adding input validation constraints in the Arithmetizable instance for functions, because it only runs for the top-level functions, and the arguments of these functions are the true inputs to the resulting circuits.

TurtlePU commented 2 months ago

With

definition of Symbolic from #195
definition of SymbolicData from here
and with a version of @echatav's structured inputs from #177,

we can do something along these lines:

class (SymbolicData c d, Input c d ~ (), Representable (Layout c d)) => SymbolicInput c d where
  inputConstraint :: MonadCircuit i (BaseField c) m => Layout c d i -> m ()
  -- we could have `pure ()` as a default impl, but this would not mix well with deriving

class RepCat c where
  rid :: Representable i => c i i
  -- can also have rcomp :: c j k -> c i j -> c i k, but this is not needed here

-- instance RepCat (AC a) where output rid = tabulate InVar

input :: (RepCat c, Symbolic (c i), SymbolicInput (c i) d, Layout (c i) d ~ i) => d
input = restore $ const $ fromCircuitF rid (\is -> inputConstraint is >> return is)

TurtlePU commented 2 months ago

A case study for inputConstraint proposed above: Maybe symbolic type.

The desired input constraints look like: "either discriminant is false (and value is Nothing) or discriminant is true and the value inside satisfies the input constraints". This cannot be expressed using the definition of inputConstraint above in a compositional way. Maybe instead it should have the following type:

inputConstraint :: Layout c d i -> System (BaseField c) i

Where System a i is a CNF of explicit (polynomial? range?) constraints which is a BoolType and can be composed into an arbitrarily complex formula which is then desugared into the system of constraints during compilation.

However, this way we cannot create new intermediate variables while constructing the system. Is it desired? Is there a type which needs additional variables to encode its variants?

P.S.: for intermediate variables, we can provide an type Auxiliary c d :: Type -> Type associated representable functor and provide variables in advance, like this:

inputConstraint :: Layout c d i -> Auxiliary c d i -> System (BaseField c) i

But once again, given an expressive enough constraint system, we might not need user-defined auxiliary variables (but of course they might show up during the desugaring of input constraints).

P.P.S.: or, for auxiliary variables, we can separate MonadCircuit interface further: imagine MonadCircuit0 without constraint function. Then we can ask for the following:

inputConstraint :: (BaseField c ~ a, MonadCircuit0 i a m) => Layout c d i -> m (System a i)

vlasin commented 2 months ago

Could you expand a bit on why the input constraint for Maybe cannot be expressed with the definition above?

TurtlePU commented 2 months ago

Could you expand a bit on why the input constraint for Maybe cannot be expressed with the definition above?

Imagine that data stored inside Maybe is constrained with a single polynomial $P(x_1,\ldots,x_n) = 0$. Now Maybe with a discriminant $y$ is constrained with a formula $y = 0 \lor (y = 1 \land P(x_1,\ldots,x_n) = 0)$ with an equivalent system being $y (y - 1) = 0 \land y \cdot P(x_1,\ldots,x_n) = 0$.

So basically this means that monadic action provided by inputConstraint @c @x cannot be reused inside inputConstraint @c @(Maybe c x) naively (but maybe we can work this out with some shenanigans on top of MonadCircuit, but I don't know if this is worth it).

P.S. One possible "shenanigan" is to add a function to prepend all constraints added inside a MonadCircuit with a multiplier (and a procedure to desugar such constraints into Plonk ones). But how should this work with constraints for new variables in newConstrained?

vlasin commented 2 months ago

I don't see a problem for types that have at least one known value satisfying validation constraints. Instead of $y (y - 1) = 0 \wedge y \cdot P(x_1, \dots, x_n) = 0$, we can have just $y (y - 1) = 0 \wedge P(x_1, \dots, x_n) = 0$. Basically, we can supply a correct value of type t for Maybe t, even in the case of Nothing.

So, in my opinion, the problem lies in the current definitions for symbolic Maybe. In particular, in the definitions of nothing and find.

TurtlePU commented 2 months ago

Or maybe I'm overcomplicating things and this is actually as simple as

isValid :: d -> Bool c

:D And then we can rely on optimizations to do their thing and produce proper constraints

vlasin commented 2 months ago

Yes, I agree. This simpler solution is potentially as efficient as the first one.

vlasin commented 2 months ago

And we can properly define isValid for Maybe t using isValid for t, as you described here:

Imagine that data stored inside Maybe is constrained with a single polynomial P(x1,…,xn)=0. Now Maybe with a discriminant y is constrained with a formula y=0∨(y=1∧P(x1,…,xn)=0) with an equivalent system being y(y−1)=0∧y⋅P(x1,…,xn)=0.

TurtlePU commented 2 months ago

Exactly!

TurtlePU commented 2 months ago

Solution with isValid however needs thought on how to resolve range constraints efficiently If one were to operate with constraints directly, they could specify range constraint and be done with it With isValid, we might want to somehow guarantee that, after we optimize isValid input == true, we get range constraints from the, for example, comparison of a value with a constant that returns a Bool c.

vlasin commented 2 months ago

I think we want to produce range constraints whenever possible. If isValid uses one of the "user-facing" functions that produce range constraints, then we will have them as a result of isValid. For example, operations like <= will result in range constraints. We can also introduce a high-level function that produces a range constraint (or checks that it is satisfied:

isInRange :: FieldElement c -> FieldElement c -> FieldElement c -> Bool c
isInRange x a b =
-- checks that `a <= x && x <= b`

zkFold / zkfold-base

Add input validation for arithmetizable functions #60