duplicate commitments and sumtree positions

antiochp commented 6 years ago

Related - #540

Just been thinking through what it means to allow duplicate commitments and how this might interact with sumtree positions.

We don't allow duplicate commitments in the UTXO set but we do allow duplicates if one is spent before the second is created.

I think we have a problem given the following scenario -

Commitment aaaa as last output in block 5 at sumtree pos 50. Commitment aaaa spent (as input) in block 7. Commit aaaa as (duplicate commitment) output in block 9 at pos 72 (overwriting index that pointed to pos 50). We then fork and rewind to block 5, we need to rewind to pos 50 but we no longer have pos 50 in the index.

If we only store the "last" pos of any given commitment. And we permit duplicate commitments. Then, we can get into trouble if we need to rewind to (or beyond?) pos of previous versions of the duplicate commitment.

ignopeverell commented 6 years ago

Mmmh you're right. This is annoying, I need to think some more about that.

sesam commented 6 years ago

So we need to track which commitments are used and/or set a max depth for this tracking, likely also limiting reorg depth the same way to avoid complicating the implementation. Or do you have some tricks up your sleeve?

antiochp commented 6 years ago

@ignopeverell Can we track pos via a stack per commitment (rather than a single pos for each commitment)? i.e. every time we add to the sumtree we push the pos onto the stack maintained for that commitment? Then on reading the pos for a given commitment we get a stack of possible values back - the latest may be "out of bounds" and invalid, so we would pop and look at the next pos?

I haven't thought this through fully - just thinking out loud (apologies if this makes no sense for some obvious reason...) Definitely haven't fully thought through what would happen during a rewind.

ignopeverell commented 6 years ago

It makes sense, I was thinking of something along the same lines, just needs to be a simple array in store (instead of a single element). The additional complexity still bothers me a little but haven't found much better.

It also occurred to me we could simplify the chain/src/sumtrees.rs implementation by having a MMR wrapper that implements the same get, prune, etc. operations but taking commitments instead of positions. It should encapsulate the use of the index a little better.

antiochp commented 6 years ago

Now that we've merged #615 and hopefully fixed the underlying issues causing #559 - I wonder if we can use a related approach to get rid of this "multiple sumtree positions" issue.

We now track the block hash in the wallet for each output. The block hash is required if we spend a coinbase output. But what if we always required a block hash when creating an input - and we used commitment|block_hash to index into the sumtree pos? These would definitely be unique and we would have risk of needing to maintain multiple alternative sumtree positions for a given commitment.

Are there any downsides to tying commitments to blocks in this way? I feel the clean concept of "everything is just a commitment" would be lost like this - but I don't have a good feel yet for how important it is to try and preserve this?

i.e. We would use the block hash in the index, but not necessarily use it in the MMR to generate the hash values.

sesam commented 6 years ago

If block_hash here means the block where this output was created, then for each reorg there would be work to recompute, and SPV-type wallets would need to track where their outputs have moved in order to be able to spend them?

sesam commented 6 years ago

If we both store and look up in near equal amounts, maybe a std::collections::BTreeMap can compete with sumtree + wrapper. Unless the sum field is/will be used.

antiochp commented 6 years ago

I think the sumtrees/MMR is still very much required - its how we store this data efficiently (append only file on disk). And it handles rewinding state to arbitrary points in time. And the sums are used to calculate the roots - which provide the security.

ignopeverell commented 6 years ago

I'm reluctant to do this for 2 reasons:

The block hash isn't enough as we can't rely on having only full historical nodes (it'd defeat the prupose of MW). For non-historical nodes, we need Merkle proofs instead. And those can be large (as in 1-2KB). Requiring them for the occasional time-locked outputs is fine, but for all inputs would become problematic.
It feels like it's tying the protocol to the implementation too much.

antiochp commented 6 years ago

Those points both make a lot of sense. I think you've mentioned begin wary of tying the protocol to the implementation too much before.

sesam commented 6 years ago

Good thinking.

An idea to keep on store, but too early to test out now:

To optimize network message size, possibly also storage, we could special-case the majority of txs which spend only "basic inputs" (no feature flags, no Merkle proofs). The validation would still be features|commitment. Basic spend tacks on an empty features parameter.

For the spend w/ proof case, can that proof checking be done without disk I/O?

antiochp commented 6 years ago

@sesam yeah absolutely. We're actually only optionally including the block_hash today based on the feature flags (coinbase or not). We could definitely go a step further and only include features if they were non-default. And merkle proofs would only be included if we needed them (for coinbase).

In terms of disk I/O - we would need to retrieve the block_header from rocksdb (keyed directly by its hash) but I think this is all we need to do (so a single rocksdb get call, in addition to the existing get_output_pos call).

antiochp commented 6 years ago

Flagging this for testnet2 - looks like this is one of the few outstanding sumtree issues that can break consensus under specific conditions.

ignopeverell commented 6 years ago

Going to pick this up again. We currently maintain the positional index for 2 main uses:

Check for existence of a commitment. This is to ensure the unicity of a kernel or output commitment, or to check that an input is spending an existing UTXO commitment.
Rewind. Only applies to the "last" output/kernel in the index.

The problematic scenario outlined in this issue only applies to the rewind case, the current design serves existence checks fine. And those 2 functionalities have different cardinality and lifecycles, we could get rid of the kernel existence index as mentioned in #758 for example. So I'm proposing to separate them.

Once we separate the rewind case, its implementation is trivial. All we need is a mapping from block hash to last output and kernel positions, which won't exhibit any conflict and won't even require the full block.

mimblewimble / grin

duplicate commitments and sumtree positions #544