Indexed Merkle tree - Githubissues

mitschabaude commented 4 weeks ago

closes https://github.com/o1-labs/o1js/issues/1655

This PR introduces IndexedMerkleMap, an all around better version of MerkleMap. See #1655 for a detailed description, including the API which this PR implements.

In short, there are two motivations to introduce a new Merkle storage primitive:

The old primitives didn't cleanly map to provable types, and are appallingly hard to use in circuits
The old Merkle map is a sparse tree of height 256, which meant a large number of in-circuit hashes.

IndexedMerkleMap uses about 4-8x fewer constraints than MerkleMap when used with height 31 (which supports 1 billion entries). Here are some constraint counts for different operations:

indexed merkle map (get) 461
indexed merkle map (get option) 975
sparse merkle map (get) 4208

indexed merkle map (insert) 1696
indexed merkle map (update) 905
indexed merkle map (set) 1878
sparse merkle map (set) 8160

indexed merkle map (assert included) 460
indexed merkle map (assert not included) 507
indexed merkle map (is included) 508

EDIT: Based on feedback from @dfstio, the API was expanded to be useful for cases where only inclusion of a key (but not the value) is important, see discussion below.

dfstio commented 4 weeks ago

I'm trying to test it now and getting the following error:

 Type 'Provable<Option<Field, bigint>, bigint | undefined> & { fromValue(value: bigint | Field | { isSome: boolean | Bool; value: bigint | Field; } | undefined): Option<...>; from(value?: bigint | ... 1 more ... | undefined): Option<...>; none(): Option<...>; }' is not a constructor function type.

    6     class OptionField extends Option(Field) {}
                                    ~~~~~~~~~~~~~

I've removed Option for testing, but it is strange that when I try to test simple code

import { Option, Field } from "o1js";
class OptionField extends Option(Field) {}

it gives the same error

mitschabaude commented 4 weeks ago

@dfstio I ran into the same and fixed it here: https://github.com/o1-labs/o1js/pull/1666/commits/33c216e69b7c57a895d850831f1ad00b52ddde3b

dfstio commented 4 weeks ago

How am I supposed to call proveInclusion(leaf) in the ZkProgram method? I can pass IndexedMerkleMap only as Unconstrained<IndexedMerkleMap>, but I need to do all the calculations of the proveInclusion(leaf) in the provable code, and there is no witness generation (at least for now).

Filling the map is now 3.5 times faster: for 1000 elements: indexed map fill: 15.550s usual map fill: 54.647s

mitschabaude commented 4 weeks ago

@dfstio

you're not supposed to use proveInclusion() at all. it's a helper method. the user-facing API is get(), set(), update(), insert()
see my last commit for how to use it in provable code: https://github.com/o1-labs/o1js/pull/1666/commits/f173c596bdfdd97e45eb7a897bc672d34ff3c881

mitschabaude commented 4 weeks ago

Filling the map is now 3.5 times faster:

Btw, the set() API is definitely not optimized for filling a map quickly :D

But rather, to prove insertion while keeping the number of constraints low

dfstio commented 4 weeks ago

you're not supposed to use proveInclusion() at all. it's a helper method. the user-facing API is get(), set(), update(), insert()

The problem with get() is that I cannot pass the index to it. When I use proveInclusion() directly, it takes 175 constraints, and get(), as you have mentioned above, takes 1005 constraints.

My statistics for proving the inclusion of the array of 10 key-value pairs in the MerkleMap of 1000 entries with proveInclusion(): IndexedMerkleMap:

map fill: 16.250s
compiled: 2.068s
constraints 1746
prove: 12.787s

MerkleMap:

map fill: 57.924s
compiled: 35.682s
constraints 42071
prove: 23.363s

it's a fantastic decrease in the number of constraints))

mitschabaude commented 4 weeks ago

The problem with get() is that I cannot pass the index to it

Can you elaborate on your use case? Because the index is supposed to be an internal detail. It's supposed to be a key value map, and you access elements by key.

Also I'm worried that if you just use proveInclusion(), you're not proving everything that's needed.

Nonetheless, your comments gave me a good idea: Add a cheaper version of get() which asserts that the key-value pair is included (the current version gracefully handles the case when it's not included, and returns an option)

mitschabaude commented 4 weeks ago

When I use proveInclusion() directly, it takes 175 constraints

Also, this seems wrong, or like you're passing in constants, or using a smaller height.

Proving inclusion means doing at least height - 1 hashes, which for height 31 means at least 30 * 11 = 330 constraints.

dfstio commented 4 weeks ago

Can you elaborate on your use case? Because the index is supposed to be an internal detail. It's supposed to be a key value map, and you access elements by key.

I'm keeping NFT private key-value pairs as MerkleMap. I need to prove that those keys exist and are set to specific values. What I did before was create a new redacted MerkleMap that contains disclosed key values and then prove that the elements are included in both Merkle Maps by creating proofs for each key-value element and then merging them.

Given the new number of constraints, I can put many key-value proofs in one ZkProgram and then consume the proofs for different array sizes in SmartContract using the MerkleTree with verification keys and side-loading verification keys.

I prefer to get indexes in non-provable code and then pass them to the provable code to decrease the number of constraints.

dfstio commented 4 weeks ago

Proving inclusion means doing at least height - 1 hashes, which for height 31 means at least 30 * 11 = 330 constraints.

I'm using height 11. I can put 1024 elements in the IndexedMerkleMap of this height. It is 10* 11 = 110 constraints, plus some checks on Fields add more, for a total of 175.

By the way, even ZkProgram with 0 constraints takes at least 8 seconds to create a proof; for some reason, it is much slower than SmartProgram with the same number of constraints.

mitschabaude commented 4 weeks ago

Thanks for your feedback @dfstio. Based on it, I added a new version of get() which only proves inclusion and made it the default version. The old version, which also handled non-inclusion, is called getOption() now.

Here are the numbers for height 11, it matches with what you wrote:

indexed merkle map (get) 176
indexed merkle map (get option) 405
sparse merkle map (get) 4208

indexed merkle map (insert) 542
indexed merkle map (update) 350
indexed merkle map (set) 758
sparse merkle map (set) 8160

dfstio commented 4 weeks ago

Nonetheless, your comments gave me a good idea: Add a cheaper version of get() which asserts that the key-value pair is included (the current version gracefully handles the case when it's not included, and returns an option)

It would be a great addition. Proving inclusion and exclusion (for nullifiers) are very important operations that should probably be reflected in the IndexedMerkleMap methods.

And given that data is public, I can do myself toJSON () and fromJSON () to/from base64.

dfstio commented 4 weeks ago

The old version, which also handled non-inclusion, is called getOption() now.

Can the version that will only prove exclusion (non-inclusion) take less than 405 constraints, similar to the get() that takes half of it?

mitschabaude commented 4 weeks ago

The old version, which also handled non-inclusion, is called getOption() now.

Can the version that will only prove exclusion (non-inclusion) take less than 405 constraints, similar to the get() that takes half of it?

Great point, done!

indexed merkle map (assert included) 175
indexed merkle map (assert not included) 222
indexed merkle map (is included) 223

dfstio commented 4 weeks ago

Can assertIncluded() return a value for the key?

mitschabaude commented 4 weeks ago

Can assertIncluded() return a value for the key?

the version of assertIncluded() which returns a value already exists - it's get()!

dfstio commented 3 weeks ago

How do we serialize the witness to calculate recursive proofs using many workers?

If I want to create a proof confirming that I've correctly added ten key-values to the IndexedMerkleMap and want to split the calculation between 10 separate workers running in parallel, each calculating the proof to be merged later for one key-value pair, I would need to be able to generate a serializable witness to be passed to the worker. Otherwise, I should serialize the whole map, which would take much longer.

Effectively, for this use case, _computeRoot() should be split to map.getWitness(key) and computeRoot(witness), with the witness being easily serializable.

map.getWitness() should be called in the master worker for all ten key-value pairs, and computeRoot() should be called in the provable code for each of the ten workers. Each worker should not have the map, just a witness.

Example when it is needed: Serializing Proving

mitschabaude commented 3 weeks ago

How do we serialize the witness to calculate recursive proofs using many workers?

Interesting challenge!

mitschabaude commented 3 weeks ago

map.getWitness() should be called in the master worker for all ten key-value pairs, and computeRoot() should be called in the provable code for each of the ten workers. Each worker should not have the map, just a witness.

I thought about this a bit, and think it can be done in a way that is compatible with the current design of the IndexedMerkleTree data structure.

The idea is that the current implementation should work if you don't have the full tree, but just the subset that are touched by your updates. This is quite similar to what you propose, since in the end a collection of Merkle witnesses is also just a subset of the tree.

There are two internal data structures: nodes and sortedLeaves. Both should currently allow pruning to the values you actually need. For nodes, you'd need to store arrays of the same length, but mostly filled with empty slots, not sure how much memory that saves. In the case of sortedLeaves, only having a subset should just work.

So for parallel proving, we could:

Run all tree updates serially, without proving
Take a pruned snapshot every k updates
Send each pruned snapshot to a worker, which will perform a proof of k updates
Merge those proofs in a tree to get a single proof

The nice thing is that circuits can be written exactly as in a normal, serial implementation.

Actually this is extremely close to what Mina does with transaction proofs, where snapshots of the ledger are updated :D

mitschabaude commented 3 weeks ago

Note to self / reviewers: the implementation currently doesn't do proof of updates correctly. The problem is that it doesn't connect the Merkle path for the update with the path previously validated against the old commitment

dfstio commented 3 weeks ago

For nodes, you'd need to store arrays of the same length, but mostly filled with empty slots, not sure how much memory that saves

I've done some testing with MerkleTree to evaluate serialized map size and serialized MerkleTree witness size, and the results are as follows:

random indexes in the tree:
height: 11, elements: 1000, tree size: 75,508 chars, witness size: 495 chars (0.66%)
height: 20, elements: 10000, tree size: 3,243,225 chars, witness size: 860 chars (0.03%)
height: 30, elements: 10000, tree size: 8,159,377 chars, witness size: 1,312 chars (0.02%)
height: 50, elements: 10000, tree size: 18,475,240 chars, witness size: 2,211 chars (0.012%)
height: 100, elements: 100000, tree size: 456,081,607 chars, witness size: 4,507 chars (0.0010%)
height: 255, elements: 25000, tree size: 405,075,106 chars, witness size: 11,600 chars (0.0029%)

ordered indexes in the tree:
height: 11, elements: 1000, tree size: 93,319 chars, witness size: 499 chars (0.53%)
height: 20, elements: 10000, tree size: 941,828 chars, witness size: 910 chars (0.096%)
height: 30, elements: 10000, tree size: 942,558 chars, witness size: 1,367 chars (0.15%)
height: 50, elements: 10000, tree size: 943,572 chars, witness size: 2,283 chars (0.24%)
height: 100, elements: 100000, tree size: 9,526,927 chars, witness size: 4,567 chars (0.048%)
height: 255, elements: 100000, tree size: 9,535,772 chars, witness size: 11,662 chars (0.12%)

The IndexedMerkleMap should be closer to ordered indexes, so by creating a witness or pruned snapshot we can decrease the serialized witness size by circa 1000x. Btw, it also shows the serialized map size savings IndexedMerkleMap will bring: it should be 170x (16k per element in MerkleMap vs 95 bytes per element with IndexedMerkleMap)

Take a pruned snapshot every k updates

We need to take a pruned snapshot several times BEFORE running the circuit without proofs for k updates and make sure that low leaves are also included.

Actually this is extremely close to what Mina does with transaction proofs, where snapshots of the ledger are updated :D

I believe that IndexedMerkleMap is extremely important for rollups on Mina protocol and will save a lot of money in proving costs

o1-labs / o1js

Indexed Merkle tree #1666