powdr-labs / powdr

A modular stack for zkVMs, with a focus on productivity, security and performance.
Apache License 2.0
354 stars 72 forks source link

Simple Storage State Machine #707

Open georgwiese opened 8 months ago

georgwiese commented 8 months ago

With issue collects some things that we need to be able to access some external storage. For concreteness and simplicity, let's assume we only need to read from an external storage that is summarized as a dense Merkle tree.

See Polygon's Storage Machine for an example of a more complex example that supports writes and uses a sparse Merkle tree.

Machine interface

A read operation would have the following interface:

Rough algorithm

The machine would receive the following non-deterministic inputs:

Then, the machine would:

For a dense Merkle tree of fixed depth, this could actually be implemented as a block machine as follows (modulo of-by-one errors...): Latch Key Bit Key Value Sibling Hash
0 0 0 <value> <sibling 1> <hash 1>
0 1 10 <value> <sibling 2> <hash 2>
... ... ... ... ... ...
0 1 1...10 <value> <sibling n - 1> <hash n -1>
1 1 11...10 <value> <sibling n> <hash n>

where:

For more complex tree structures (e.g. a sparse tree), it might be easier to implement this as a VM (similar to Polygon's Storage machine).

Querying non-deterministic inputs

There would be several ways to implement this. This section explores a few options.

Using Queries & the input API

We already have queries on the PIL level. (Note: This is currently not implemented for block machines, but that could be easily fixed.)

So a simple approach would be something like this:

col witness value(i) query ("input", (i // N) * (N + 1));
col witness sibling(i) query ("input", (i // N) * (N + 1) + 1 + (i % N));

Here, i // N computes the index of the block and N + 1 is the number of non-deterministic inputs needed in each block.

When running the computation, the prover has to compute all non-deterministic inputs in advance and provide them via the --input flag.

Using external witness generation

705 implements a way for the prover to provide some columns. This could be used for the prover to provide the value and sibling columns.

More advanced query mechanisms

The downside of the two previous approaches is that it requires the prover to simulate the entire computation in advance in order to know which non-determinist inputs to provide. Ideally, the prover could provide a callback function that would be called by the our witness generation. This could look like this:

col witness value(i) query match latch(i) { 1 => ("callback.value", key(i)) };
col witness sibling(i) query ("callback.sibling", key(i));

callback.value and callback.sibling are user-provided functions that are evaluated on demand and can have access to some data base (which in this case stores the entire Merkle tree).

Note that in this example, the callbacks don't actually have access to the Merkle root. It would not be trivial to implement, because the Merkle root is only available in the Latch row. To also implement update operations (which change the Merkle root), we'd need the callbacks to be stateful and assume and they are always called with the most recent Merkle root (which should be enforced by other constraints anyway).

leonardoalt commented 3 months ago

@georgwiese you had a PR with a prototype for this, right?

georgwiese commented 3 months ago

Hmm, I'd say the closest to this is the bootloader for continuations. It implements the Merkle tree validation in software, rather than hardware as proposed here. I think this is fine, as it doesn't need too many rows. Also, for more complicated Merkle trees (e.g. the sparse one Polygon uses for their state), a hardware implementation would be tricky because of many data-dependent branches.

For continuation, we also implement updates to the Merkle tree! This works by using a read-only memory with prover-provided content. This allows us to make sure the prover provides the same value at different steps in the computation.

The downside of the two previous approaches is that it requires the prover to simulate the entire computation in advance in order to know which non-determinist inputs to provide.

That is not solved (for continuations, the prover has to simulate the entire execution to know which pages are going to be requested and how they are updated). But I think the query mechanism we have now is powerful enough to let us do that? I think the prover-provided value would still be written to the write-once memory so that we can force the prover to provide the same value multiple times.

leonardoalt commented 3 months ago

I agree that it feels like we can just have this in asm and it's good enough?

georgwiese commented 3 months ago

Yeah, depends on the use-case (like how often we access storage) and on whether if it's OK that our implementation is specific to an instruction set!