dlubarov opened 1 year ago
Draft of notes below:
For the BEGKN92-inspired memory argument, I believe we only need the following 15 columns (sketched as a struct below):

- Read columns: `address_r`, `value_r_0`, `value_r_1`, `value_r_2`, `value_r_3`, `clk_r`
- Write columns: `address_w`, `value_w_0`, `value_w_1`, `value_w_2`, `value_w_3`, `clk_w`
- Other columns: `is_read`, `is_audit`, `clk`
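As a rough sketch (not the actual trace definition), the row layout implied by these columns could be written as follows; `F` is just a placeholder for whatever field element type the proof system uses, and the struct/field names beyond the column names above are illustrative:

```rust
/// One row of the memory trace, mirroring the 15 columns listed above.
/// `F` stands in for the proof system's field element type (an assumption here).
pub struct MemoryRow<F> {
    // Read columns.
    pub address_r: F,
    pub value_r: [F; 4], // value_r_0 ..= value_r_3
    pub clk_r: F,
    // Write columns.
    pub address_w: F,
    pub value_w: [F; 4], // value_w_0 ..= value_w_3
    pub clk_w: F,
    // Other columns.
    pub is_read: F,  // 1 if this row serves a CPU read, 0 for a write
    pub is_audit: F, // 1 if this row belongs to the audit set M
    pub clk: F,      // CPU clock of the access
}
```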
There are two possible state changes in memory (a toy simulation follows the list):

a. Read: read `(a, v, ts)`, then write back `(a, v, ts')` where `ts' = clk + 1`.

b. Write: read `(a, v, ts)`, then write back `(a, v', ts')`.
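In plain Rust (as a simulation, not constraints), the two transitions amount to the following; the `Memory` and `Word` types and the zero-initialization default are my own illustrative assumptions:

```rust
use std::collections::HashMap;

/// A 4-byte memory word, matching the four value limbs per access.
type Word = [u8; 4];

/// Toy memory model: each cell stores a (value, timestamp) pair. This simulates
/// the two state changes above; it is not the constraint system itself.
struct Memory {
    cells: HashMap<u32, (Word, u64)>,
}

impl Memory {
    /// a. Read: read (a, v, ts), then write back (a, v, ts') where ts' = clk + 1.
    fn read(&mut self, a: u32, clk: u64) -> Word {
        // Unused cells are assumed to start at zero (see the audit check below).
        let (v, _ts) = *self.cells.get(&a).unwrap_or(&([0; 4], 0));
        self.cells.insert(a, (v, clk + 1));
        v
    }

    /// b. Write: read (a, v, ts), then write back (a, v', ts').
    fn write(&mut self, a: u32, v_new: Word, clk: u64) {
        // ts' is assumed to be clk + 1 here as well.
        self.cells.insert(a, (v_new, clk + 1));
    }
}
```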
In terms of lookups, there are still only 3 memory channels in the CPU chip (for reading two words and writing one), except a write now uses a separate bus (i.e. the memory bus is split into a read bus and a write bus). The sent column values are unchanged, and still look like `(address_r, {byte_values}, clk, is_read)`. On the receiving end, in the memory chip, we also now have two buses mapping the incoming reads and writes to their respective column fields.
At each row of the memory trace, either a read or write (but not both) is looked up from the bus, so the prover nondeterministically sets the opposing columns. In the case of a read, the memory chip has constraints that ensure the corresponding write timestamp is an increment of the CPU clock, and that the written value remains unchanged. In the case of a write, the prover must set the read columns to valid values in order for the audit check (see below) to pass.
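To make the read-case constraints concrete, here is a sketch with toy signed integers standing in for field elements; the actual AIR code will look different, and the function name and argument layout are just for illustration. Every returned expression is required to be zero:

```rust
/// Read-case constraint sketch. When is_read = 1, the write-back must keep the
/// same address and value, and carry timestamp clk + 1. When is_read = 0, every
/// expression is trivially zero, so the write columns are unconstrained here.
fn read_case_constraints(
    is_read: i128,
    address_r: i128, value_r: [i128; 4],              // read side
    address_w: i128, value_w: [i128; 4], clk_w: i128, // write side
    clk: i128,                                        // CPU clock from the bus
) -> Vec<i128> {
    let mut constraints = vec![
        is_read * (address_w - address_r), // same address is written back
        is_read * (clk_w - (clk + 1)),     // write timestamp is clk + 1
    ];
    for i in 0..4 {
        // The written value must equal the value that was read.
        constraints.push(is_read * (value_w[i] - value_r[i]));
    }
    constraints
}
```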
For the audit check (done following all bus reads/writes), the prover nondeterministically adds rows (the audit set `M`) to fix the initial and final address-value pairs. If we assume that all used memory values should start at zero, then for each row the `value_w` fields should be set to zero, checked by the constraint `is_audit * value_w_i = 0`. The prover sets the read columns for each row to the final memory value states. Note that it's not necessary to check uniqueness of `M`, because read and write entries are always added together in our arithmetization (the sets cannot be imbalanced).
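The audit-row constraint is small enough to sketch directly (again with toy integers in place of field elements, and an illustrative function name):

```rust
/// Audit rows pin the initial memory state to zero: is_audit * value_w_i = 0
/// for each limb i. Every returned expression must evaluate to zero.
fn audit_constraints(is_audit: i128, value_w: [i128; 4]) -> [i128; 4] {
    [
        is_audit * value_w[0],
        is_audit * value_w[1],
        is_audit * value_w[2],
        is_audit * value_w[3],
    ]
}
```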
Valida is using a common memory argument, where the prover commits to a log of operations ordered by `(address, timestamp)`. Once we have this ordered log, memory consistency is easily enforced with constraints.

To simplify this discussion, say we're ordering by `address` only. We verify this ordering by range checking each address delta, asserting that it is within some "small" range like 16 bits.

The circular nature of field elements complicates this; e.g. an address step from `p - 1` to `0` would be seen as a small delta of 1. So if a memory log had a sequence of addresses like `0, 2^16, 2 * 2^16, ...`, wrapping around beyond `p`, it could revisit some address a second time, breaking the soundness of our memory argument.

Our initial thought was to avoid this problem by limiting the address space to ~30 bits, creating a ~30 bit "dead zone". Attempting to cross this "dead zone" would result in a failed range check. However, we currently allow the prover to insert "dummy reads" to break down larger address deltas into smaller ones. It seems like these could also be used to break down a ~30 bit "dead zone" gap into small deltas.
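To make the wraparound concern concrete, here is a toy demonstration with a small prime standing in for the real field modulus and a small step standing in for a delta that passes the range check; every step looks "small", yet the walk wraps past `p` and lands on an address it has already visited:

```rust
use std::collections::HashSet;

/// Toy wraparound demo: each delta passes the "small delta" check, but the
/// address walk wraps around the modulus and revisits an earlier address,
/// which is exactly what would break the ordered-log argument.
fn main() {
    let p: u64 = 101;  // toy modulus (the real field is much larger)
    let step: u64 = 7; // "small" delta that a range check would accept
    let mut addr: u64 = 3;
    let mut seen = HashSet::new();
    while seen.insert(addr) {
        addr = (addr + step) % p; // field addition wraps around silently
    }
    println!("revisited address {addr} after wrapping past p = {p}");
}
```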
Instead of adding dummy rows in the memory table to break down large deltas, we could just support larger range checks. Say we limit the address space to 29 bits; then any legitimate delta will be < 2^29. We could do a 29-bit range check by breaking it into, say, 14- and 15-bit range checks.
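A sketch of that decomposition, using the bit widths from the example above; in the circuit the two limbs would be prover-supplied witness values, each checked against a smaller range-check table, with a constraint tying them back to the delta. The function name and out-of-circuit arithmetic here are illustrative:

```rust
/// Split a 29-bit address delta into a 14-bit low limb and a 15-bit high limb,
/// so that delta = lo + 2^14 * hi. In the circuit, lo and hi would be witness
/// columns, each range checked separately; this is just the helper arithmetic.
fn decompose_29bit(delta: u64) -> Option<(u64, u64)> {
    if delta >= 1 << 29 {
        return None; // legitimate deltas must fit in 29 bits
    }
    let lo = delta & ((1 << 14) - 1); // 14-bit limb
    let hi = delta >> 14;             // 15-bit limb (since delta < 2^29)
    debug_assert_eq!(delta, lo + (hi << 14));
    Some((lo, hi))
}
```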
It might be worthwhile to also evaluate other memory techniques though. Triton has a unique method of enforcing that the log is grouped (not necessarily ordered) by address. BEGKN92 (section 4.1) offers a different approach, which doesn't involve address grouping at all. I'm not up to speed on Lasso yet, but I think it has another method inspired by BEGKN92.