mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0
5.04k stars 989 forks source link

Merkle tree structure for UTXO set #10

Closed ignopeverell closed 6 years ago

ignopeverell commented 7 years ago

The current Merkle tree implementation is mostly a placeholder and is going to be inadequate to handle the UTXO set tree. The following are highly desirable property:

The MMR [1] [2] algorithm has been proposed and Merklix trees [3] offer interesting possibilities as well (i.e. p2p querying, sharding).

[1] https://github.com/opentimestamps/opentimestamps-server/blob/master/python-opentimestamps/opentimestamps/core/timestamp.py#L324 [2] https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md [3] http://www.deadalnix.me/2016/09/29/using-merklix-tree-to-checkpoint-an-utxo-set/

@GarrickOllivander @merope07 and @apoelstra have expressed interest.

merope07 commented 7 years ago

Thanks for finding Merklix, I will study it. Something I see is the prefixes are variable-length, which means that an implementation is likely to be as complex as one of a Patricia tree. Also, I'm uncertain about serialization of Merklix trees, since the article does not go into this, although I think it's easy to do in a malleability-free way since the prefixes force the structure of the tree. Not a big deal, just something to think about.

It appears straightforward to create a sum-Merklix tree which is good.

As for the benefits, we never need deletions or proof-of-absense for standard MW operation. For the purpose of a UTXO set, we can simulate this with a "spent count" on each output which is always 1 or 0, then a proof-of-absense is simply a proof-of-presense where the spent-count reveals whether or not the output has been spent. This isn't exactly a proof-of-absense since it does not allow proving that a UTXO never existed, but it does allow to prove that a UTXO existed then was spent, which is all that's required for fraud proofs.

Both addition and in-place updates are O(ln(n)) for both MMR and Merklix.

~M

ignopeverell commented 7 years ago

One advantage I saw for Merklix trees is the possibility of sharding. Maintaining the UTXO set is not really standard MW (in the Jedusor version at least), but it's really handy when it comes to block cut-through. It should allow us to do full cut-through initial sync without too much worry and I think will also let us discard rangeproofs in the full cut-through horizon (something like head minus 1000 blocks), which is a huge win.

With all that in mind, the ability to shard that UTXO set would be pretty cool :)

apoelstra commented 7 years ago

I think because of https://www.reddit.com/r/Bitcoin/comments/4vub3y/mimblewimble_noninteractive_coinjoin_and_better/d62cux6/ committing to the UTXO set is a necessary extension to the Jedusor scheme (like, there are consensus failures if you don't).

I like the sound of sharding .. though I think if the UTXOs that users store are randomly chosen we should be able to do this with MMRs. Unsure. Maybe @merope07 knows more about this.

Discarding rangeproofs means SPV security of noninflation. I definitely think this should be a supported mode of operation but I don't think we can really get rid of the data for full validators.

merope07 commented 7 years ago

With a MMR, since you never delete data, only update, and updating always requires only the rightmost branch of the tree, I think sharding is fine. Any data you retain, you need to know enough auxilary hashes to compute the root, and these hashes are exactly the ones you need to update the data (e.g. to increment a spent-count).

It's probably more efficient to store and transmit, e.g. "all the outputs whose hashes start with 0x0e" in a Merklix tree, but I don't have a clear idea of by how much.

GarrickOllivander commented 7 years ago

I am probably missing the obvious, but how would MMR support lookup?

merope07 commented 7 years ago

You would need to maintain a mapping from UTXO to its index in the tree.

gellert-grindelwald commented 7 years ago

LMDB is a more suitable embedded-database for fast operations and incredible resiliency to corruption than RocksDB. Not sure if any of the database work has already underway, but it is definitely something worth considering.