georgwiese opened 8 months ago
@georgwiese you had a PR with a prototype for this, right?
Hmm, I'd say the closest to this is the bootloader for continuations. It implements the Merkle tree validation in software, rather than hardware as proposed here. I think this is fine, as it doesn't need too many rows. Also, for more complicated Merkle trees (e.g. the sparse one Polygon uses for their state), a hardware implementation would be tricky because of many data-dependent branches.
For continuations, we also implement updates to the Merkle tree! This works by using a read-only memory with prover-provided content. This allows us to make sure the prover provides the same value at different steps in the computation.
> The downside of the two previous approaches is that they require the prover to simulate the entire computation in advance in order to know which non-deterministic inputs to provide.
That is not solved (for continuations, the prover has to simulate the entire execution to know which pages are going to be requested and how they are updated). But I think the query mechanism we have now is powerful enough to let us do that? I think the prover-provided value would still be written to the write-once memory so that we can force the prover to provide the same value multiple times.
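As a rough illustration of forcing the prover to provide the same value at different steps, here is a minimal sketch of a write-once memory (names and shape are illustrative, not powdr's actual implementation):

```python
class WriteOnceMemory:
    """Each address can be assigned once; every later access must agree.

    This is how a prover-provided value can be forced to be consistent
    across different steps of the computation.
    """

    def __init__(self):
        self.cells = {}

    def access(self, addr, value):
        if addr not in self.cells:
            # First access: the prover commits to a value.
            self.cells[addr] = value
        elif self.cells[addr] != value:
            raise ValueError(f"inconsistent value at address {addr}")
        return self.cells[addr]
```

In the constraint system, the analogous check is that all reads of a cell are tied to a single committed value, so the "raise" corresponds to an unsatisfiable constraint.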
I agree that it feels like we can just have this in asm and it's good enough?
Yeah, depends on the use-case (like how often we access storage) and on whether it's OK that our implementation is specific to an instruction set!
This issue collects some things that we need to be able to access some external storage. For concreteness and simplicity, let's assume we only need to read from an external storage that is summarized as a dense Merkle tree.
See Polygon's Storage Machine for a more complex example that supports writes and uses a sparse Merkle tree.
Machine interface
A read operation would have the following interface:
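The original interface snippet is not reproduced here; as a hedged sketch, a read operation might take a Merkle root and a key and assert a value (all names are illustrative, not powdr's actual API):

```python
from typing import NamedTuple

class StorageRead(NamedTuple):
    """Hypothetical shape of one read operation (illustrative names only)."""
    root: bytes   # Merkle root the read is verified against
    key: int      # index of the leaf, read as a bit string from leaf to root
    value: bytes  # claimed value; the machine proves it sits at `key` under `root`
```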
Rough algorithm
The machine would receive the following non-deterministic inputs:
Then, the machine would:
| latch | key_bit | key | value | sibling | hash |
| --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | `<value>` | `<sibling 1>` | `<hash 1>` |
| 0 | 1 | 10 | `<value>` | `<sibling 2>` | `<hash 2>` |
| ... | ... | ... | ... | ... | ... |
| 0 | 1 | 1...10 | `<value>` | `<sibling n - 1>` | `<hash n - 1>` |
| 1 | 1 | 11...10 | `<value>` | `<sibling n>` | `<hash n>` |
where:
```
hash' = latch * poseidon(value')
      + (1 - latch) * (1 - key_bit) * poseidon(hash, sibling)
      + (1 - latch) * key_bit * poseidon(sibling, hash)
```
This value should be equal to the Merkle root input in the latch row.

For more complex tree structures (e.g. a sparse tree), it might be easier to implement this as a VM (similar to Polygon's Storage machine).
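The row-by-row hash constraint can be mirrored in software. Here is a minimal sketch that walks a Merkle path from the leaf to the root, using SHA-256 as a stand-in for Poseidon (function and argument names are illustrative):

```python
import hashlib

def poseidon(*parts):
    # Stand-in: SHA-256 instead of Poseidon, just for the sketch.
    return hashlib.sha256(b"".join(parts)).digest()

def verify_read(root, key_bits, value, siblings):
    """Walk from the leaf to the root, mirroring the machine's rows.

    `key_bits` is ordered leaf-to-root; bit 0 means the current hash is
    the left child, bit 1 means it is the right child.
    """
    acc = poseidon(value)  # first row hashes the value itself
    for key_bit, sibling in zip(key_bits, siblings):
        if key_bit == 0:
            acc = poseidon(acc, sibling)   # poseidon(hash, sibling)
        else:
            acc = poseidon(sibling, acc)   # poseidon(sibling, hash)
    return acc == root  # compared against the Merkle root in the latch row
```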
Querying non-deterministic inputs
There would be several ways to implement this. This section explores a few options.
Using Queries & the input API
We already have queries on the PIL level. (Note: This is currently not implemented for block machines, but that could be easily fixed.)
So a simple approach would be something like this:
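A minimal Python sketch of the indexing such a query could use (illustrative only, not actual PIL/powdr syntax):

```python
N = 3  # Merkle tree depth in this toy example; each block needs N + 1 inputs

def flatten_inputs(reads):
    """Prover side: compute every non-deterministic input up front and
    serialize them block after block (the value, then the N siblings).
    This flat list is what would be passed via --input."""
    flat = []
    for value, siblings in reads:
        assert len(siblings) == N
        flat.append(value)
        flat.extend(siblings)
    return flat

def block_inputs(flat, i):
    """Witness-generation side: row i belongs to block i // N, and each
    block owns the next N + 1 entries of the flat input list."""
    block = i // N
    return flat[block * (N + 1) : (block + 1) * (N + 1)]
```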
Here, `i // N` computes the index of the block and `N + 1` is the number of non-deterministic inputs needed in each block. When running the computation, the prover has to compute all non-deterministic inputs in advance and provide them via the `--input` flag.

Using external witness generation
#705 implements a way for the prover to provide some columns. This could be used for the prover to provide the `value` and `sibling` columns.

More advanced query mechanisms
The downside of the two previous approaches is that they require the prover to simulate the entire computation in advance in order to know which non-deterministic inputs to provide. Ideally, the prover could provide a callback function that would be called by our witness generation. This could look like this:
`callback.value` and `callback.sibling` are user-provided functions that are evaluated on demand and can have access to some database (which in this case stores the entire Merkle tree).

Note that in this example, the callbacks don't actually have access to the Merkle root. That would not be trivial to implement, because the Merkle root is only available in the latch row. To also implement update operations (which change the Merkle root), we'd need the callbacks to be stateful and assume that they are always called with the most recent Merkle root (which should be enforced by other constraints anyway).