sampsyo / cs6120

advanced compilers
https://www.cs.cornell.edu/courses/cs6120/2023fa/
MIT License
709 stars 157 forks source link

Project Proposal: External Code Blocks in CirC #399

Open collinzrj opened 9 months ago

collinzrj commented 9 months ago

What will you do? CirC is a compiler that compiles high level languages to (state-free, non-uniform, existentially quantified) circuits for SMT solver, zero-knowledge proof, multiparty computations, and more. In the case of zero-knowledge proof, sometimes it is easier to verify a problem than actually solving the problem. For example, consider the following linear system:

ax + by = c
dx + ey = f

We write a program to solve the system

def solve_linear_system(a, b, c, d, e, f):
    x = (ce - bf)/(ae - bd)
    y = (af - cd)/(ae - bd)
    return x, y

It takes 8 multiplications, 2 divisions, and 4 minus operation to compute x and y, But given x and y, it only takes four multiplication and 2 addition to perform the verification. Although these operations are cheap for CPU, they are very expensive when expressed as circuits, and the programmer wants to minimize the number of circuits.

In that case, it's better to write the program in this form

def solve_linear_system(a, b, c, d, e, f, x, y):
    ax + by = c
    dx + ey = f
    return x, y

However, then the programmer has to precompute x and y somewhere else, which is inconvenient, especially in the following case. The programmer will have to implement a function that infers the value of x2 y2 based on x and y as well.

def main(a, b, c, d, e, f):
    x, y = solve_linear_system(a, b, c, d, e, f)
    x2, y2 = solve_linear_system(a, b, x, d, e, y)

As a result, we want to implement a feature that allows the programmer to express part of computation as "external", such that will not be compiled into circuits.

With the feature, the programmer can write program like this

def solve_linear_system(a, b, c, d, e, f):
    external {
        x = (ce - bf)/(ae - bd)
        y = (af - cd)/(ae - bd) 
    }
    ax + by = c
    dx + ey = f
    return x, y

The programmer still doesn't have to provide x and y as arguments to the function, which make the programmer's life much easier.

How will you do it? The external block feature has been implemented in another zero-knowledge compiler xjsnark, so we can learn how they implemented the feature, and figure out how to incorporate that into CirC.

How will you empirically measure success? It's straightforward to measure success since we are implementing a new feature, we will just provide an example that CirC compiles a program with an "external" code block successfully.

Team members: @MelindaFang-code @collinzrj

sampsyo commented 9 months ago

OK, sounds interesting!

Before you get started on this (and hopefully within ~1 week), can you please expand the plan here for the evaluation? I hope there is room here to do something a little more systematic than just showing successful compilation for one handcrafted example.

For instance, is there a corpus of pre-existing CirC programs that could plausibly benefit from this new feature? If you had, say, 20 of them, then you could try to modify all of them to take advantage of the new feature and report on (a) your success in doing the modification, (b) any reduction in code size or other metric, and (c) correctness, perhaps by comparing the code to the original. Of course, that might be impossible because existing programs are not a good fit for modifying to use the new feature and you're trying to target new programs that cannot otherwise be written… in that case, could you consider porting programs from the other compiler you mentioned (xjsnark) to evaluate success?

While there are many possibilities here, I hope we can come up with a systematic technique that somehow involves realistic, pre-existing code instead of handwritten test cases.