powdr-labs / powdr

A modular stack for zkVMs, with a focus on productivity, security and performance.

Memory machine based on Offline Memory Checking (Blum et al.) #1300

Open · georgwiese opened this issue 2 months ago

georgwiese commented 2 months ago

Some zk-VM projects (e.g. SP1, Jolt) use a memory argument different from the "double-sorted" memory machine that we implement. I summarized it here. This issue discusses how it could be implemented in Powdr and what the trade-offs are.

High-level idea

The idea is to construct two sets, a read_set and a write_set that are constrained to be equal (equivalently: one is a permutation of the other).

The program runs in 3 phases (sketched in the code below):

  1. Initialization: every memory cell that will be accessed is written with value 0 at time stamp 0 (these tuples go into the write_set).
  2. Main execution: every memory access reads the current tuple of its address (into the read_set) and writes an updated tuple with an incremented time stamp (into the write_set).
  3. Finalization: the final tuple of every accessed cell is read (into the read_set).
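
To make the three phases concrete, here is a minimal simulation in plain Rust (not powdr code; all names are made up):

```rust
use std::collections::HashMap;

/// (addr, value, time) tuples as they appear in the read and write sets.
type Tuple = (u64, u64, u64);

fn main() {
    // `Some(v)` encodes `mstore v`, `None` encodes `mload`.
    let accesses: [(u64, Option<u64>); 3] = [(42, Some(7)), (42, None), (17, Some(3))];

    let mut mem: HashMap<u64, (u64, u64)> = HashMap::new(); // addr -> (value, time)
    let mut read_set: Vec<Tuple> = Vec::new();
    let mut write_set: Vec<Tuple> = Vec::new();

    // Phase 1: initialization. Every cell that will be accessed is written
    // with value 0 at time stamp 0.
    for addr in [42, 17] {
        mem.insert(addr, (0, 0));
        write_set.push((addr, 0, 0));
    }

    // Phase 2: main execution. Every access reads the current tuple and
    // writes it back (with the new value for a store) at time + 1.
    for (addr, store) in accesses {
        let (value, time) = mem[&addr];
        read_set.push((addr, value, time));
        let new_value = store.unwrap_or(value); // an mload writes back what it read
        mem.insert(addr, (new_value, time + 1));
        write_set.push((addr, new_value, time + 1));
    }

    // Phase 3: finalization. The final tuple of every accessed cell is read.
    for (&addr, &(value, time)) in &mem {
        read_set.push((addr, value, time));
    }

    // An honest memory makes the two sets equal as multisets.
    read_set.sort();
    write_set.sort();
    assert_eq!(read_set, write_set);
}
```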

Implementation

To make things more concrete, here is my current understanding of how this would be implemented (attached diagram: offline_memory_checking).

Let's start by looking at the main execution (the middle section). As mentioned in the "constraints" section, 2 of the 6 illustrated columns ($a_r$ and $t_r$) are actually virtual. $a_w$ and $v_w$ would be assignment registers that exist anyway in a zk-VM. Therefore, the prover only has to commit to 2 extra columns ($t_w$ and $v_r$)! Compare this to at least 5 extra witness columns in the sorted memory machine (addr, step, change, value, is_write).
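
Presumably, the virtual columns would simply be defined in terms of the committed ones, along the lines of

$$a_r = a_w, \qquad t_r = t_w - 1$$

i.e., each access reads and writes the same address, and the time stamp it reads is one below the time stamp it writes back.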

For the initialization + finalization, the prover has to commit to 3 witness columns, of a length equal to the number of unique addresses being accessed:

  1. The unique addresses (this should be $a_w$ in the init section and $a_r$ in the finalization section, so it only needs to be committed once)
  2. The final values of each accessed memory cell ($v_r$ in the finalization section)
  3. The final time stamps of each accessed memory cell ($t_r$ in the finalization section)

The initial values and time stamps are all 0, so they don't need to be committed.

Note: I'm not 100% sure whether it is fine to let the prover choose the accessed memory cells arbitrarily, but I don't see why not. In Lasso, they are not actually prover-committed but hard-coded to a polynomial that enumerates all possible addresses.

Connecting the main execution with initialization + finalization

I see two approaches:

  1. Monolithic proof: If we had a permutation argument on concatenations of columns (possible to implement using the challenge API), we could have a separate machine for initialization & finalization with the 3 columns mentioned above. Since the number of unique addresses can't exceed the trace length, it would fit.
  2. VADCOP: Since with a VADCOP design (#424) we could have permutation arguments across shards, we could split out this extra machine and prove it independently. This is what SP1 does as well.
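
For intuition on approach 1, here is a rough sketch in plain Rust (not the actual challenge API; the field is faked with u128 arithmetic modulo a Mersenne prime) of how a permutation argument on concatenations of columns reduces to comparing grand products of randomly fingerprinted tuples; the concatenated parts simply contribute to the same product:

```rust
// Toy stand-in for the proof field: arithmetic mod a 61-bit Mersenne prime.
const P: u128 = (1 << 61) - 1;

/// Compress an (addr, value, time) tuple into one field element using the
/// (Fiat-Shamir) challenges `gamma` and `beta`.
fn fingerprint((a, v, t): (u128, u128, u128), gamma: u128, beta: u128) -> u128 {
    (a + gamma * v % P + gamma * gamma % P * t % P + beta) % P
}

/// Grand product over a set of tuples. Two multisets are equal exactly when
/// their grand products agree for random challenges (with high probability).
fn grand_product(tuples: &[(u128, u128, u128)], gamma: u128, beta: u128) -> u128 {
    tuples.iter().fold(1, |acc, &tup| acc * fingerprint(tup, gamma, beta) % P)
}

fn main() {
    let (gamma, beta) = (0x1234_5678_9abc, 0xdead_beef); // would come from the verifier

    // "Concatenation" just means both parts contribute to the same product:
    let reads_main = [(42, 0, 0), (42, 7, 1)];
    let reads_final = [(42, 7, 2)];
    let writes_main = [(42, 7, 1), (42, 7, 2)];
    let writes_init = [(42, 0, 0)];

    let lhs = grand_product(&reads_main, gamma, beta) * grand_product(&reads_final, gamma, beta) % P;
    let rhs = grand_product(&writes_main, gamma, beta) * grand_product(&writes_init, gamma, beta) % P;
    assert_eq!(lhs, rhs);
}
```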

Comparison to double-sorted memory machine

As mentioned above, the double-sorted memory machine needs at least 5 extra witness columns. But this approach also needs 2 extra witness columns in the main machine + 3 extra witness columns to prove the initialization & finalization phase. So, in terms of witness columns, it's a tie.

It is not obvious to me which approach is faster. But I can see how this approach has advantages in the VADCOP setting (which could also be used to implement memory continuations with the double-sorted memory machine!):

  1. If the number of unique addresses is much less than the number of memory accesses, the initialization + finalization proof would be smaller than the proof of the double-sorted memory machine.
  2. For witness generation, the prover only needs to keep track of the current memory state, not all memory accesses (which in the double-sorted memory machine need to be sorted). See the sketch below.
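
A hypothetical witness-generation sketch in plain Rust (all names made up): per access, the prover looks up and updates a single map from address to the current (value, time stamp) pair, and the finalization columns fall out of the final state:

```rust
use std::collections::HashMap;

fn main() {
    // `Some(v)` encodes `mstore v`, `None` encodes `mload`.
    let accesses = [(10u64, Some(5u64)), (10, None), (20, Some(1))];

    let mut mem: HashMap<u64, (u64, u64)> = HashMap::new();
    let mut prev_value = Vec::new();
    let mut prev_t = Vec::new();

    for (addr, store) in accesses {
        // Only the *current* state of the touched cell is needed.
        let (value, time) = *mem.get(&addr).unwrap_or(&(0, 0));
        prev_value.push(value); // committed column
        prev_t.push(time);      // committed column (t_w is then prev_t + 1)
        mem.insert(addr, (store.unwrap_or(value), time + 1));
    }

    // The finalization columns are just the final state, one row per address.
    for (addr, (final_value, final_t)) in &mem {
        println!("addr_used={addr} final_value={final_value} final_t={final_t}");
    }
}
```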
georgwiese commented 2 months ago

To make things concrete, here is an example:

Suppose this was our assembly program:

mstore 42, 1;
mload 17;
mstore 42, 9;
mload 42;
mstore 17, 3;

This could be represented using this witness:

| instr_mstore | instr_mload | addr | value | prev_value | prev_t | addr_used | final_value | final_t |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 42 | 1 | 0 | 0 | 17 | 3 | 2 |
| 0 | 1 | 17 | 0 | 0 | 0 | 42 | 9 | 3 |
| 1 | 0 | 42 | 9 | 1 | 1 | any | 0 | 0 |
| 0 | 1 | 42 | 9 | 9 | 2 | any | 0 | 0 |
| 1 | 0 | 17 | 3 | 0 | 1 | any | 0 | 0 |

Columns of the main machine:

  * instr_mstore, instr_mload: instruction flags that exist anyway
  * addr, value: assignment registers that exist anyway
  * prev_value, prev_t: the value and time stamp returned by the untrusted memory for each access (the 2 extra committed columns)

Extra columns dedicated to proving correct results of the initialization + finalization phase:

  * addr_used: the unique addresses accessed by the program (padded with arbitrary values)
  * final_value, final_t: the final value and time stamp of each accessed cell (0 in padding rows)

So, in the end, there are 5 extra witness columns to verify memory (prev_value, prev_t, addr_used, final_value, final_t).

The constraints would be:

instr mload addr -> value {
  // The memory verifier writes back the same value as it reads from the untrusted memory
  value - prev_value = 0;
}
instr mstore addr, value {
  // No constraint, other than that this row participates in the permutation check below.
  // In particular, `prev_value` and `prev_t` are not constrained here, because the untrusted memory can return
  // any value here; the permutation check below will force it to be correct.
}

// A permutation argument used to assert that the read and write sets are equal.
// This requires a permutation argument on *concatenations* of columns.
// I'm making up a syntax for this here.

concat(
  // Read set during main machine execution
  (instr_mload + instr_mstore) {addr, prev_value, prev_t},
  // Read set during finalization
  {addr_used, final_value, final_t}
) is concat(
  // Write set during main machine execution
  // Using `prev_t + 1` here ensures that the memory verifier increments the time step
  // by 1, as prescribed.
  (instr_mload + instr_mstore) {addr, value, prev_t + 1},
  // Write set during initialization
  // Re-using `addr_used` here ensures that the memory verifier initializes the same addresses as
  // it finalizes, as prescribed.
  // Using 0 for value and time stamp ensures that the memory verifier initializes all cells as prescribed.
  {addr_used, 0, 0}
)

In summary, the constraints ensure that:

  1. Every read returns exactly what was last written to that address (the read and write sets are equal as multisets).
  2. Every access writes back with the time stamp incremented by 1 (and, for mload, with the same value that was read).
  3. All accessed cells are initialized with value 0 at time stamp 0.
  4. The addresses that are initialized are exactly the ones that are finalized.
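
As a sanity check, here are the read and write sets of the example witness above, compared as multisets in plain Rust (this is exactly the check the permutation argument performs):

```rust
fn main() {
    // Read set: main execution rows (addr, prev_value, prev_t), then finalization rows.
    let mut read_set = vec![
        (42, 0, 0), (17, 0, 0), (42, 1, 1), (42, 9, 2), (17, 0, 1), // main execution
        (17, 3, 2), (42, 9, 3),                                     // finalization
    ];
    // Write set: main execution rows (addr, value, prev_t + 1), then initialization rows.
    let mut write_set = vec![
        (42, 1, 1), (17, 0, 1), (42, 9, 2), (42, 9, 3), (17, 3, 2), // main execution
        (17, 0, 0), (42, 0, 0),                                     // initialization
    ];
    // Multiset equality via sorting.
    read_set.sort();
    write_set.sort();
    assert_eq!(read_set, write_set);
}
```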

georgwiese commented 2 months ago

I just realized that the scheme as described above is not secure. For example, in this assembly program:

mload 0;
mstore 0, 42;
mload 0;

The constraints described above would not prevent the untrusted memory from returning:

  * (0, 42, 2) for the first mload (a value that is only stored later!), written back as (0, 42, 3)
  * (0, 0, 0) for the second mload, written back as (0, 0, 1)

In other words, the tuples returned by the untrusted memory for the two reads can be interchanged!
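
Concretely, the swapped tuples still balance under the prev_t + 1 scheme, as this plain-Rust check shows:

```rust
fn main() {
    let mut read_set = vec![
        (0, 42, 2), // first mload: interchanged!
        (0, 0, 1),  // mstore 0, 42
        (0, 0, 0),  // second mload: interchanged!
        (0, 42, 3), // finalization
    ];
    let mut write_set = vec![
        (0, 42, 3), // first mload writes back at prev_t + 1 = 3
        (0, 42, 2), // mstore writes at prev_t + 1 = 2
        (0, 0, 1),  // second mload writes back at prev_t + 1 = 1
        (0, 0, 0),  // initialization
    ];
    read_set.sort();
    write_set.sort();
    // Passes, even though the first mload returned 42 before it was stored!
    assert_eq!(read_set, write_set);
}
```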

The reason for this bug is that I took the description from Lasso, which uses this argument to implement a read-only memory. To implement a read-write memory, we'd have to implement the original argument by Blum et al. (Section 4.1): the write time stamp is the current global step (STEP) rather than prev_t + 1, and the prover additionally has to show that each read time stamp is strictly smaller than the write time stamp, i.e. prev_t < STEP.

So, in the end, this requires an extra fixed column (which we also need in the double-sorted memory machine) and an extra lookup.

The example above changes as follows:

concat(
  // Read set during main machine execution
  (instr_mload + instr_mstore) {addr, prev_value, prev_t},
  // Read set during finalization
  {addr_used, final_value, final_t}
) is concat(
  // Write set during main machine execution
  (instr_mload + instr_mstore) {addr, value, STEP},
  // Write set during initialization
  {addr_used, 0, 0}
)
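
Note that this snippet only replaces `prev_t + 1` by the global `STEP` column; the additional check prev_t < STEP (the extra lookup mentioned above) lives outside the permutation argument. A toy illustration in plain Rust of why it defeats the interchange attack:

```rust
/// The extra per-access check of the Blum et al. scheme: every read time
/// stamp must be strictly smaller than the (global) step of the access.
/// In the circuit this would be a lookup of `STEP - prev_t - 1` (or similar)
/// into a fixed range column.
fn timestamps_valid(accesses: &[(u64, u64)]) -> bool {
    accesses.iter().all(|&(prev_t, step)| prev_t < step)
}

fn main() {
    // Honest trace of the `mload 0; mstore 0, 42; mload 0` example (STEP = 1, 2, 3):
    assert!(timestamps_valid(&[(0, 1), (1, 2), (2, 3)]));
    // Interchanged reads: the first mload would have to claim prev_t = 2 at
    // STEP = 1, which the range check rejects.
    assert!(!timestamps_valid(&[(2, 1), (1, 2), (0, 3)]));
}
```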