Sound Continuations - Githubissues

We started implementing non-sound continuations in #789. In this Issue, I'd like to collect all steps necessary to make it sound.

Background:

The main idea is outlined here and highly influenced by RISC Zero's approach.

In summary, at the beginning of the program, we run a bootloader, which lets the prover set all registers and initialize the fresh memory with the content needed for this chunk. For this to be sound, (commitments of) the memory and register values have to be exposed as public outputs for the initial values of the chunk and the final values. With these commitments, the verifier can verify that a chunk starts from where the previous left off. The existing constraints of the zk-VM make sure that the transition from the initial to the final state was correct.

Memory content is organized in pages (e.g. of size 1KB, or 256 32-Bit words). The commitment of memory is a Merkle root hash of the list of pages. This way, the prover can sparsely page in the memory content needed for this chunk while still being able to prove that it is part of the committed memory.

Making it sound:

This takes the following steps:

[x] Implement public inputs / outputs
[x] Expose register values at the beginning and the end of the execution as public outputs
- This might not be necessary if register values are loaded from memory, but it also only increases the verifier cost by a constant
[x] Hash pages while writing them to memory
- 813
[x] Merkle-proof the containment of each page in the memory commitment (making the memory commit hash public)
- 819
[ ] Hash pages at the end of the execution
[ ] #1658
[x] Update the Merkle root
[x] Make sure that (1) only paged-in pages are accessed during execution and (2) all accessed pages are also hashed again and used to update the Merkle root

Details:

This section makes some of the later items more concrete.

Merkle-proof the containment of each page in the memory commitment

This can be implemented in assembly as part of the bootloader. Once for each page, it would:

Bit-decompose the prover-provided page number to reconstruct the path in the Merkle tree
For each level (e.g. 22 levels if our memory is $2^{32}$ byte and we have a page size of $2^{10}$ bytes), the prover provides a sibling and hashes them together
The root hash is exposed as public input

Hash pages at the end of the execution

This can be implemented by having a "shutdown routine" assembly program analogous to the bootloader that runs until the end of the execution. The tricky part is that we don't know at compile time when we need to jump there.

One solution would be to:

Change the PC equation so that the prover can at any time jump to a specific PC (the start of the shutdown routine)
We also have to prohibit jumping there if we're already in the shutdown routine
Also constrain the PC to be equal to a specific value in the last row (the end of the shutdown routine), so we can be sure that the routine ran completely

With these constraints, the only thing the prover can do is to jump to the shutdown routine just in the right time.

Update the Merkle root

Because we're currently implementing a 32-Bit RISC-V architecture, we can use a dense Merkle tree of $2^{22}$ leafs to represent the entire memory. In a dense Merkle tree, a single leaf can be updated by doing two Merkle proofs using the same siblings, for the old and new leaf value.

For this, it will be convenient to use the write-once memory introduced in #753, which lets the prover set free values while forcing the prover to provide the same value in different rows of the execution (because the program would read from the same address).

For example, the algorithm (run either in the bootloader or shutown routine) could be this:

Prove membership of the first page in the initial memory Merkle tree (whose root hash is publicly exposed)
Using the same siblings and the page hash at the end of the execution, compute the updated Merkle root (after updating just one page)
Repeat steps 1-2 for all pages, continuously updating the Merkle root from step to step.
Expose the last root hash as public output

To make this works, all page hashes (before and after the chunk execution) and all siblings need to be in write-once memory. This way, we can enforce that siblings are re-used. Also, the bootloader can already run this algorithm and access the page hashes after the execution, and the shutdown routine can just hash the pages and assert that the page hashes in write-once memory are indeed correct.

Make sure only paged-in memory are accessed

The algorithm in the previous section ensures that the same set of pages is written in the bootloader and used to update the memory hash. What's left to do is to prove that no other page is accessed during execution.

This could be implemented by adding a special mstore instruction only used by the bootloader, which sets a special flag to 1 in the memory machine (while the "normal" mstore instruction sets it to 0). Then, we can add a constraint in the memory machine that the first row of each address needs to have the flag set to 1.

Thinking about how to do the Merkle update proofs in the Bootloader. I see two possible approaches.

Setting

We have a Merkle tree of memory pages. The prover can specify the values of an arbitrary subset of those pages (which have to be proved to be in the tree) and hashes of the same pages at the end of the execution (which is validated by the shutdown routine). We want to show that updating the pages leads to a specific new Merkle hash.

Approach 1: Multi-proofs

If we adapt multi-proofs for the initial subset, we can use the same proof to compute the updated Merkle root, by plugging in the updated leaf hashes. This is an implementation of the verification algorithm.

Advantages:

Smaller proofs (although I'm not sure if verification time is better, as the verification algorithm might not be as amenable to things like loop unrolling)

Disadvantages:

The verification algorithm is significantly more complicated than that of vanilla Merkle proofs, and it would have to be implemented in assembly (and without accessing memory!).

Approach 2: Vanilla Merkle Proofs + chain of root hashes

The algorithm would be as follows:

Set current_root_hash to the initial root hash (with all pages at the start of the execution)
For each page:
- Verify that the page is in the Merkle tree summarized by current_root_hash (doing a standard Merkle tree verification)
- Use the same proof and claimed updated leaf hash to compute the updated root hash
- Set current_root_hash to the updated root hash
Assert that current_root_hash is equal to the final root hash (with all pages updated)

This algorithm would be a thin wrapper around standard Merkle tree verification, which is already implemented in assembly. The prover would have to provide Merkle proofs for each page, but using the Merkle tree that has all previous pages updated already.

Conclusion

Approach 2 sounds simpler to me, so I think I'll try implementing that first.

powdr-labs / powdr

Sound Continuations #814