mirage / irmin

Irmin is a distributed database that follows the same design principles as Git
https://irmin.org
ISC License
1.85k stars 157 forks source link

Support for generating and checking Merkle Proofs #928

Closed Galfour closed 2 years ago

Galfour commented 4 years ago

Preliminary benchmarks on Tezos indicate that most of the time spent in a transaction is spent in hard-disk accesses. Because blockchains are adversarial environments, usual caching strategies are not of use (pricing gas based on the average case would enable DOS attacks by constant triggering the worst cases).

I want to try a solution based on Merkle trees. Where instead of querying the hard-disk, the required information is put in a payload along with its Merkle proof.

I will then need to benchmark it to see if/when it is more efficient than hard-disk access.

samoht commented 4 years ago

Hi, thanks for your interest! A possible design for adding Merkle proofs in Irmin would be:

What operation are you trying to optimise?

Galfour commented 4 years ago

Your prototypical design looks exactly like what I want! A Proof module, facilities to create and check proofs. The interface might be lightly different, as I will almost surely create a helper to not have to pass a hash to check, but that's it.


About optimizations.

Generating the proof can take time, that's not really a problem. However, the generated proofs should be small. as they will be included in blocks (even if they could be dropped by archive nodes). I've been told Irmin's nodes could have up to 256 children. If the first two levels are filled, that's at least 2 256 * (number of bits per hash) per proof. Upside is that this is easily fixed by caching the first levels.

Checking the proof should be fast, The end-goal being that checking proof should be faster than a hard-disk read. If the trees are indeed as large as mentioned, the proofs should be fast. Caching the first two levels might make them even faster.


* I don't have extensive experience with Irmin. Only as a dev on Tezos source code, which is quite abstracted.

samoht commented 4 years ago

Nice to hear that it is what you were looking for :-)

However, the generated proofs should be small. as they will be included in blocks (even if they could be dropped by archive nodes).

I am not sure to understand which proof do you want to store in blocks. Is it for a fixed key in the store? Or for a collection of keys (e.g. all the keys modified by the previous transaction)? Do you plan to use the hash of the proof(s) when computing the block hash or using proofs is just an optimisation?

Galfour commented 4 years ago

We might want to have proofs for all of those. Contract code, storage and big map keys. The idea is to pre-declare in an operation the fields in the storage that you plan to access with their values.

The model that we'd like to have is to have some fields of the storage always in cache (stuff with constant size), and for the other things to be priced following the worst-case (cache miss, hard-disk query). For those other things, we also to be able to asserts their value with a merkle proof so that we don't have to query them from the hard-disk.

samoht commented 3 years ago

See #1120 for a proof of concept implementation for Merkle proofs.

samoht commented 2 years ago

A limited support for Merkle Proofs has been released in Irmin 2.9 (via #1583).

2.10 will contain a better API to generate and check these proofs, as well as support for streaming proofs. I'm closing this issue but you can track progress by looking at the Merkle Proofs tags in the issue tracker.