georgwiese commented 8 months ago

With issue collects some things that we need to be able to access some external storage. For concreteness and simplicity, let's assume we only need to read from an external storage that is summarized as a dense Merkle tree.

See Polygon's Storage Machine for an example of a more complex example that supports writes and uses a sparse Merkle tree.

Machine interface

A read operation would have the following interface:

Inputs:
- Merkle root: A ~256-Bit hash
- Index: An $n$-bit index, where $n$ is the depth of the Merkle tree
Outputs:
- Value: The value at the provided index

Rough algorithm

The machine would receive the following non-deterministic inputs:

The value at the provided index
The Merkle proof, i.e. the list of sibling nodes along the path from the index to the root

Then, the machine would:

Bit-decompose the index
Re-compute the Merkle root from the provided value and siblings

For a dense Merkle tree of fixed depth, this could actually be implemented as a block machine as follows (modulo of-by-one errors...):	Latch	Key Bit	Key	Value	Sibling
`0`	`0`	`0`	`<value>`	`<sibling 1>`	`<hash 1>`
`0`	`1`	`10`	`<value>`	`<sibling 2>`	`<hash 2>`
...	...	...	...	...	...
`0`	`1`	`1...10`	`<value>`	`<sibling n - 1>`	`<hash n -1>`
`1`	`1`	`11...10`	`<value>`	`<sibling n>`	`<hash n>`

where:

Latch is a constant column
The key bit is a witness column constrained to be bit-valued. Using this global range constraint, our existing witness generation can deduce the values.
They key is computed from the key bits, and constrained to be equal to the input key in the latch row.
The value is non-deterministic input and held constant throughout the entire block.
The siblings are non-deterministic inputs.
The parent hashes are computed as hash' = latch * poseidon(value') + (1 - latch) * (1 - key_bit) * poseidon(hash, sibling) + (1 - latch) * key_bit * poseidon(sibling, hash) This value should be equal to the Merkle root input in the latch row.

For more complex tree structures (e.g. a sparse tree), it might be easier to implement this as a VM (similar to Polygon's Storage machine).

Querying non-deterministic inputs

There would be several ways to implement this. This section explores a few options.

Using Queries & the input API

We already have queries on the PIL level. (Note: This is currently not implemented for block machines, but that could be easily fixed.)

So a simple approach would be something like this:

col witness value(i) query ("input", (i // N) * (N + 1));
col witness sibling(i) query ("input", (i // N) * (N + 1) + 1 + (i % N));

Here, i // N computes the index of the block and N + 1 is the number of non-deterministic inputs needed in each block.

When running the computation, the prover has to compute all non-deterministic inputs in advance and provide them via the --input flag.

Using external witness generation

705 implements a way for the prover to provide some columns. This could be used for the prover to provide the `value` and `sibling` columns.

More advanced query mechanisms

The downside of the two previous approaches is that it requires the prover to simulate the entire computation in advance in order to know which non-determinist inputs to provide. Ideally, the prover could provide a callback function that would be called by the our witness generation. This could look like this:

col witness value(i) query match latch(i) { 1 => ("callback.value", key(i)) };
col witness sibling(i) query ("callback.sibling", key(i));

callback.value and callback.sibling are user-provided functions that are evaluated on demand and can have access to some data base (which in this case stores the entire Merkle tree).

Note that in this example, the callbacks don't actually have access to the Merkle root. It would not be trivial to implement, because the Merkle root is only available in the Latch row. To also implement update operations (which change the Merkle root), we'd need the callbacks to be stateful and assume and they are always called with the most recent Merkle root (which should be enforced by other constraints anyway).

leonardoalt commented 3 months ago

@georgwiese you had a PR with a prototype for this, right?

georgwiese commented 3 months ago

Hmm, I'd say the closest to this is the bootloader for continuations. It implements the Merkle tree validation in software, rather than hardware as proposed here. I think this is fine, as it doesn't need too many rows. Also, for more complicated Merkle trees (e.g. the sparse one Polygon uses for their state), a hardware implementation would be tricky because of many data-dependent branches.

For continuation, we also implement updates to the Merkle tree! This works by using a read-only memory with prover-provided content. This allows us to make sure the prover provides the same value at different steps in the computation.

The downside of the two previous approaches is that it requires the prover to simulate the entire computation in advance in order to know which non-determinist inputs to provide.

That is not solved (for continuations, the prover has to simulate the entire execution to know which pages are going to be requested and how they are updated). But I think the query mechanism we have now is powerful enough to let us do that? I think the prover-provided value would still be written to the write-once memory so that we can force the prover to provide the same value multiple times.

leonardoalt commented 3 months ago

I agree that it feels like we can just have this in asm and it's good enough?

georgwiese commented 3 months ago

Yeah, depends on the use-case (like how often we access storage) and on whether if it's OK that our implementation is specific to an instruction set!

powdr-labs / powdr

Simple Storage State Machine #707

Machine interface

Rough algorithm

Querying non-deterministic inputs

Using Queries & the input API

Using external witness generation

705 implements a way for the prover to provide some columns. This could be used for the prover to provide the `value` and `sibling` columns.

More advanced query mechanisms

powdr-labs / powdr

Simple Storage State Machine #707

Machine interface

Rough algorithm

Querying non-deterministic inputs

Using Queries & the input API

Using external witness generation

705 implements a way for the prover to provide some columns. This could be used for the prover to provide the value and sibling columns.

More advanced query mechanisms

705 implements a way for the prover to provide some columns. This could be used for the prover to provide the `value` and `sibling` columns.