powdr-labs / powdr

A modular stack for zkVMs, with a focus on productivity, security and performance.

Memory machine based on Offline Memory Checking (Blum et al.) #1300

Open · georgwiese opened this issue 2 months ago

georgwiese commented 2 months ago

Some zk-VM projects (e.g. SP1, Jolt) use a memory argument different from the "double-sorted" memory machine that we implement. I summarized it here. This issue discusses how it could be implemented in Powdr and what the trade-offs are.

High-level idea

The idea is to construct two sets, a read_set and a write_set that are constrained to be equal (equivalently: one is a permutation of the other).

The program runs in 3 phases (sketched in the code below):

  1. Initialization: every memory cell that will be accessed is written with value 0 at time stamp 0 (these tuples go into the write_set).
  2. Main execution: every memory access reads the current tuple of its address (into the read_set) and writes an updated tuple with an incremented time stamp (into the write_set).
  3. Finalization: the final tuple of every accessed cell is read (into the read_set).
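
To make the three phases concrete, here is a minimal simulation in plain Rust (not powdr code; all names are made up):

```rust
use std::collections::HashMap;

/// (addr, value, time) tuples as they appear in the read and write sets.
type Tuple = (u64, u64, u64);

fn main() {
    // `Some(v)` encodes `mstore v`, `None` encodes `mload`.
    let accesses: [(u64, Option<u64>); 3] = [(42, Some(7)), (42, None), (17, Some(3))];

    let mut mem: HashMap<u64, (u64, u64)> = HashMap::new(); // addr -> (value, time)
    let mut read_set: Vec<Tuple> = Vec::new();
    let mut write_set: Vec<Tuple> = Vec::new();

    // Phase 1: initialization. Every cell that will be accessed is written
    // with value 0 at time stamp 0.
    for addr in [42, 17] {
        mem.insert(addr, (0, 0));
        write_set.push((addr, 0, 0));
    }

    // Phase 2: main execution. Every access reads the current tuple and
    // writes it back (with the new value for a store) at time + 1.
    for (addr, store) in accesses {
        let (value, time) = mem[&addr];
        read_set.push((addr, value, time));
        let new_value = store.unwrap_or(value); // an mload writes back what it read
        mem.insert(addr, (new_value, time + 1));
        write_set.push((addr, new_value, time + 1));
    }

    // Phase 3: finalization. The final tuple of every accessed cell is read.
    for (&addr, &(value, time)) in &mem {
        read_set.push((addr, value, time));
    }

    // An honest memory makes the two sets equal as multisets.
    read_set.sort();
    write_set.sort();
    assert_eq!(read_set, write_set);
}
```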

Implementation

To make things more concrete, here is my current understanding of how this would be implemented (attached diagram: offline_memory_checking).

Let's start by looking at the main execution (the middle section). As mentioned in the "constraints" section, 2 of the 6 illustrated columns ($a_r$ and $t_r$) are actually virtual. $a_w$ and $v_w$ would be assignment registers that exist anyway in a zk-VM. Therefore, the prover only has to commit to 2 extra columns ($t_w$ and $v_r$)! Compare this to at least 5 extra witness columns in the sorted memory machine (addr, step, change, value, is_write).
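
Presumably, the virtual columns would simply be defined in terms of the committed ones, along the lines of

$$a_r = a_w, \qquad t_r = t_w - 1$$

i.e., each access reads and writes the same address, and the time stamp it reads is one below the time stamp it writes back.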

For the initialization + finalization, the prover has to commit to 3 witness columns, of a length equal to the number of unique addresses being accessed:

  1. The unique addresses (this should be $a_w$ in the init section and $a_r$ in the finalization section, so it only needs to be committed once)
  2. The final values of each accessed memory cell ($v_r$ in the finalization section)
  3. The final time stamps of each accessed memory cell ($t_r$ in the finalization section)

The initial values and time stamps are all 0, so they don't need to be committed.

Note: I'm not 100% sure whether it is fine to let the prover choose the accessed memory cells arbitrarily, but I don't see why not. In Lasso, they are not actually prover-committed but hard-coded to a polynomial that enumerates all possible addresses.

Connecting the main execution with initialization + finalization

I see two approaches:

  1. Monolithic proof: If we had a permutation argument on concatenations of columns (possible to implement using the challenge API), we could have a separate machine for initialization & finalization with the 3 columns mentioned above. Since the number of unique addresses can't exceed the trace length, it would fit.
  2. VADCOP: Since with a VADCOP design (#424) we could have permutation arguments across shards, we could split out this extra machine and prove it independently. This is what SP1 does as well.
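
For intuition on approach 1, here is a rough sketch in plain Rust (not the actual challenge API; the field is faked with u128 arithmetic modulo a Mersenne prime) of how a permutation argument on concatenations of columns reduces to comparing grand products of randomly fingerprinted tuples; the concatenated parts simply contribute to the same product:

```rust
// Toy stand-in for the proof field: arithmetic mod a 61-bit Mersenne prime.
const P: u128 = (1 << 61) - 1;

/// Compress an (addr, value, time) tuple into one field element using the
/// (Fiat-Shamir) challenges `gamma` and `beta`.
fn fingerprint((a, v, t): (u128, u128, u128), gamma: u128, beta: u128) -> u128 {
    (a + gamma * v % P + gamma * gamma % P * t % P + beta) % P
}

/// Grand product over a set of tuples. Two multisets are equal exactly when
/// their grand products agree for random challenges (with high probability).
fn grand_product(tuples: &[(u128, u128, u128)], gamma: u128, beta: u128) -> u128 {
    tuples.iter().fold(1, |acc, &tup| acc * fingerprint(tup, gamma, beta) % P)
}

fn main() {
    let (gamma, beta) = (0x1234_5678_9abc, 0xdead_beef); // would come from the verifier

    // "Concatenation" just means both parts contribute to the same product:
    let reads_main = [(42, 0, 0), (42, 7, 1)];
    let reads_final = [(42, 7, 2)];
    let writes_main = [(42, 7, 1), (42, 7, 2)];
    let writes_init = [(42, 0, 0)];

    let lhs = grand_product(&reads_main, gamma, beta) * grand_product(&reads_final, gamma, beta) % P;
    let rhs = grand_product(&writes_main, gamma, beta) * grand_product(&writes_init, gamma, beta) % P;
    assert_eq!(lhs, rhs);
}
```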

Comparison to double-sorted memory machine

As mentioned above, the double-sorted memory machine needs at least 5 extra witness columns. But this approach also needs 2 extra witness columns in the main machine + 3 extra witness columns to prove the initialization & finalization phase. So, in terms of witness columns, it's a tie.

It is not obvious to me which approach is faster. But I can see how this approach has advantages in the VADCOP setting (which could also be used to implement memory continuations with the double-sorted memory machine!):

  1. If the number of unique addresses is much less than the number of memory accesses, the initialization + finalization proof would be smaller than the proof of the double-sorted memory machine.
  2. For witness generation, the prover only needs to keep track of the current memory state, not all memory accesses (which in the double-sorted memory machine need to be sorted). See the sketch below.
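
A hypothetical witness-generation sketch in plain Rust (all names made up): per access, the prover looks up and updates a single map from address to the current (value, time stamp) pair, and the finalization columns fall out of the final state:

```rust
use std::collections::HashMap;

fn main() {
    // `Some(v)` encodes `mstore v`, `None` encodes `mload`.
    let accesses = [(10u64, Some(5u64)), (10, None), (20, Some(1))];

    let mut mem: HashMap<u64, (u64, u64)> = HashMap::new();
    let mut prev_value = Vec::new();
    let mut prev_t = Vec::new();

    for (addr, store) in accesses {
        // Only the *current* state of the touched cell is needed.
        let (value, time) = *mem.get(&addr).unwrap_or(&(0, 0));
        prev_value.push(value); // committed column
        prev_t.push(time);      // committed column (t_w is then prev_t + 1)
        mem.insert(addr, (store.unwrap_or(value), time + 1));
    }

    // The finalization columns are just the final state, one row per address.
    for (addr, (final_value, final_t)) in &mem {
        println!("addr_used={addr} final_value={final_value} final_t={final_t}");
    }
}
```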
georgwiese commented 2 months ago

To make things concrete, here is an example:

Suppose this was our assembly program:

mstore 42, 1;
mload 17;
mstore 42, 9;
mload 42;
mstore 17, 3;

This could be represented using this witness:

| instr_mstore | instr_mload | addr | value | prev_value | prev_t | addr_used | final_value | final_t |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 42 | 1 | 0 | 0 | 17 | 3 | 2 |
| 0 | 1 | 17 | 0 | 0 | 0 | 42 | 9 | 3 |
| 1 | 0 | 42 | 9 | 1 | 1 | any | 0 | 0 |
| 0 | 1 | 42 | 9 | 9 | 2 | any | 0 | 0 |
| 1 | 0 | 17 | 3 | 0 | 1 | any | 0 | 0 |

Columns of the main machine:

  * instr_mstore, instr_mload: instruction flags that exist anyway
  * addr, value: assignment registers that exist anyway
  * prev_value, prev_t: the value and time stamp returned by the untrusted memory for each access (the 2 extra committed columns)

Extra columns dedicated to proving correct results of the initialization + finalization phase:

  * addr_used: the unique addresses accessed by the program (padded with arbitrary values)
  * final_value, final_t: the final value and time stamp of each accessed cell (0 in padding rows)

So, in the end, there are 5 extra witness columns to verify memory (prev_value, prev_t, addr_used, final_value, final_t).

The constraints would be:

instr mload addr -> value {
  // The memory verifier writes back the same value as it reads from the untrusted memory
  value - prev_value = 0;
}
instr mstore addr, value {
  // No constraint, other than that this row participates in the permutation check below.
  // In particular, `prev_value` and `prev_t` are not constrained here, because the untrusted memory can return
  // any value here; the permutation check below will force it to be correct.
}

// A permutation argument used to assert that the read and write sets are equal.
// This requires a permutation argument on *concatenations* of columns.
// I'm making up a syntax for this here.

concat(
  // Read set during main machine execution
  (instr_mload + instr_mstore) {addr, prev_value, prev_t},
  // Read set during finalization
  {addr_used, final_value, final_t}
) is concat(
  // Write set during main machine execution
  // Using `prev_t + 1` here ensures that the memory verifier increments the time step
  // by 1, as prescribed.
  (instr_mload + instr_mstore) {addr, value, prev_t + 1},
  // Write set during initialization
  // Re-using `addr_used` here ensures that the memory verifier initializes the same addresses as
  // it finalizes, as prescribed.
  // Using 0 for value and time stamp ensures that the memory verifier initializes all cells as prescribed.
  {addr_used, 0, 0}
)

In summary, the constraints ensure that:

  1. Every read returns exactly what was last written to that address (the read and write sets are equal as multisets).
  2. Every access writes back with the time stamp incremented by 1 (and, for mload, with the same value that was read).
  3. All accessed cells are initialized with value 0 at time stamp 0.
  4. The addresses that are initialized are exactly the ones that are finalized.
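
As a sanity check, here are the read and write sets of the example witness above, compared as multisets in plain Rust (this is exactly the check the permutation argument performs):

```rust
fn main() {
    // Read set: main execution rows (addr, prev_value, prev_t), then finalization rows.
    let mut read_set = vec![
        (42, 0, 0), (17, 0, 0), (42, 1, 1), (42, 9, 2), (17, 0, 1), // main execution
        (17, 3, 2), (42, 9, 3),                                     // finalization
    ];
    // Write set: main execution rows (addr, value, prev_t + 1), then initialization rows.
    let mut write_set = vec![
        (42, 1, 1), (17, 0, 1), (42, 9, 2), (42, 9, 3), (17, 3, 2), // main execution
        (17, 0, 0), (42, 0, 0),                                     // initialization
    ];
    // Multiset equality via sorting.
    read_set.sort();
    write_set.sort();
    assert_eq!(read_set, write_set);
}
```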

georgwiese commented 2 months ago

I just realized that the scheme as described above is not secure. For example, in this assembly program:

mload 0;
mstore 0, 42;
mload 0;

The constraints described above would not prevent the untrusted memory from returning:

  * (0, 42, 2) for the first mload (a value that is only stored later!), written back as (0, 42, 3)
  * (0, 0, 0) for the second mload, written back as (0, 0, 1)

In other words, the tuples returned by the untrusted memory for the two reads can be interchanged!
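
Concretely, the swapped tuples still balance under the prev_t + 1 scheme, as this plain-Rust check shows:

```rust
fn main() {
    let mut read_set = vec![
        (0, 42, 2), // first mload: interchanged!
        (0, 0, 1),  // mstore 0, 42
        (0, 0, 0),  // second mload: interchanged!
        (0, 42, 3), // finalization
    ];
    let mut write_set = vec![
        (0, 42, 3), // first mload writes back at prev_t + 1 = 3
        (0, 42, 2), // mstore writes at prev_t + 1 = 2
        (0, 0, 1),  // second mload writes back at prev_t + 1 = 1
        (0, 0, 0),  // initialization
    ];
    read_set.sort();
    write_set.sort();
    // Passes, even though the first mload returned 42 before it was stored!
    assert_eq!(read_set, write_set);
}
```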

The reason for this bug is that I took the description from Lasso, which uses this argument to implement a read-only memory. To implement a read-write memory, we'd have to implement the original argument by Blum et al. (Section 4.1): the write time stamp is the current global step (STEP) rather than prev_t + 1, and the prover additionally has to show that each read time stamp is strictly smaller than the write time stamp, i.e. prev_t < STEP.

So, in the end, this requires an extra fixed column (which we also need in the double-sorted memory machine) and an extra lookup.

The example above changes as follows:

concat(
  // Read set during main machine execution
  (instr_mload + instr_mstore) {addr, prev_value, prev_t},
  // Read set during finalization
  {addr_used, final_value, final_t}
) is concat(
  // Write set during main machine execution
  (instr_mload + instr_mstore) {addr, value, STEP},
  // Write set during initialization
  {addr_used, 0, 0}
)
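
Note that this snippet only replaces `prev_t + 1` by the global `STEP` column; the additional check prev_t < STEP (the extra lookup mentioned above) lives outside the permutation argument. A toy illustration in plain Rust of why it defeats the interchange attack:

```rust
/// The extra per-access check of the Blum et al. scheme: every read time
/// stamp must be strictly smaller than the (global) step of the access.
/// In the circuit this would be a lookup of `STEP - prev_t - 1` (or similar)
/// into a fixed range column.
fn timestamps_valid(accesses: &[(u64, u64)]) -> bool {
    accesses.iter().all(|&(prev_t, step)| prev_t < step)
}

fn main() {
    // Honest trace of the `mload 0; mstore 0, 42; mload 0` example (STEP = 1, 2, 3):
    assert!(timestamps_valid(&[(0, 1), (1, 2), (2, 3)]));
    // Interchanged reads: the first mload would have to claim prev_t = 2 at
    // STEP = 1, which the range check rejects.
    assert!(!timestamps_valid(&[(2, 1), (1, 2), (0, 3)]));
}
```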