lbryio / lbrycrd

The blockchain that provides the digital content namespace for the LBRY protocol

Include non-winning claims in the claimtrie hash #107

Closed. lyoshenka closed this issue 5 years ago.

lyoshenka commented 6 years ago

Problem

Right now, only the winning claim for each name is included in the hash of the claimtrie. As a result, SPV clients cannot validate the resolution of non-winning claims.

Solution

Include all claims in the claimtrie hash.

BrannonKing commented 6 years ago

We currently hash in the last take-over height as well as the currently active claim. With this change, do we need to hash in take-over heights or valid heights for each claim, or is the one take-over height sufficient?

lyoshenka commented 6 years ago

pinging @kaykurokawa or @lbrynaut

BrannonKing commented 6 years ago

Clarified goal: allow a 3rd party to prove the existence of a non-winning claim.

The design for this hard fork:

First, a word on the current computation: the claimID == hash(outPoint.hash, outPoint.n). However, the per-leaf hash used in the Merkle tree == hash(child[0], child[1], ..., hash(outPoint.hash, outPoint.n, last takeover height)), where outPoint in this context refers to the current winning claim at that leaf node.
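
To make the current computation concrete, here is a minimal sketch in Python. It assumes a Bitcoin-style double-SHA256 primitive and plain byte concatenation; the actual serialization in lbrycrd (field widths, endianness, edge cases like an empty trie) may differ, so treat the helper names and encodings as illustrative only.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256 (an assumption about the primitive used)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def current_leaf_hash(child_hashes, winning_outpoint_hash: bytes,
                      winning_outpoint_n: int, last_takeover_height: int) -> bytes:
    # Inner hash covers only the winning claim's outpoint and the last takeover height.
    inner = sha256d(winning_outpoint_hash +
                    winning_outpoint_n.to_bytes(4, "little") +
                    last_takeover_height.to_bytes(4, "little"))
    # The leaf hash then covers the child-node hashes followed by that inner hash.
    return sha256d(b"".join(child_hashes) + inner)
```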

Proposal A: At some particular block number the method CClaimTrieCache::recursiveComputeMerkleHash will change its behavior. The inner hash on the per-leaf computation will change to: hash(last takeover height, claimID[0], claimID[1], etc.).
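
Under Proposal A, only the inner hash changes; the outer combination with the child-node hashes stays as it is today. A sketch under the same assumptions as above (the encoding is hypothetical, not the final consensus rule):

```python
import hashlib
def sha256d(data: bytes) -> bytes:  # same assumed double-SHA256 helper as above
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def proposal_a_leaf_hash(child_hashes, claim_ids, last_takeover_height: int) -> bytes:
    # New inner hash: the takeover height followed by *every* claimID at this node,
    # in their stored order, instead of just the winning claim's outpoint.
    inner = sha256d(last_takeover_height.to_bytes(4, "little") + b"".join(claim_ids))
    # Outer combination with the child-node hashes is unchanged.
    return sha256d(b"".join(child_hashes) + inner)
```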

With that change in the hash computation, getnameproof no longer returns sufficient contextual data to recompute the tree hash. We have two options to address that shortcoming:

  1. getnameproof would be augmented to take an optional claimID. If that parameter is included, an additional layer of hierarchy would be returned for each leaf node -- the other claimIDs and the index of the missing one (see the verification sketch after this list).
  2. Those performing the proof can call getclaimsforname and use that data to rebuild the tree. Unfortunately, it would need to be called for each leaf node in the hierarchy, potentially adding |claims| more RPC calls.
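
For option 1, a consumer of the proof would reinsert the claim being proven among the returned sibling claimIDs and recompute the inner hash. A rough sketch under the Proposal A layout above; the parameter names (sibling_claim_ids, missing_index, expected_inner_hash) are hypothetical and are not the actual RPC fields:

```python
import hashlib
def sha256d(data: bytes) -> bytes:  # same assumed double-SHA256 helper as above
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_claim_in_node(claim_id: bytes, sibling_claim_ids, missing_index: int,
                         last_takeover_height: int, expected_inner_hash: bytes) -> bool:
    # Put the claim being proven back into its position among the node's claimIDs...
    claim_ids = list(sibling_claim_ids)
    claim_ids.insert(missing_index, claim_id)
    # ...then recompute the Proposal A inner hash and compare against the proof.
    inner = sha256d(last_takeover_height.to_bytes(4, "little") + b"".join(claim_ids))
    return inner == expected_inner_hash
```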

Either way, those using getnameproof would need to be updated. I'm only aware of this usage: https://github.com/lbryio/lbry/blob/ee5be5adc80d4513fbd2fa26881833de3487dd54/lbrynet/wallet/claim_proofs.py#L20 .

Proposal B: same as above except that we would also change to use a binary tree hashing combination for the claimtrie nodes and the claims in the nodes (or just the latter). As I understand it, this is how the transactions are hashed already. See the answers here for an explanation of what I mean: https://bitcoin.stackexchange.com/questions/50674/why-is-the-full-merkle-path-needed-to-verify-a-transaction . I think this would significantly reduce the amount of data that we need to return for getnameproof.
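
To illustrate the binary-tree idea, here is a minimal sketch of pairwise, Bitcoin-transaction-style hashing and branch verification. It again assumes the double-SHA256 helper above and Bitcoin's duplicate-the-last-element rule for odd-sized levels; none of this is the final lbrycrd encoding. With this combination, a proof only needs roughly log2(n) sibling hashes per node rather than all n-1 of them, which is where the data reduction comes from.

```python
import hashlib
def sha256d(data: bytes) -> bytes:  # same assumed double-SHA256 helper as above
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves) -> bytes:
    # Pairwise combination, duplicating the last entry on odd-sized levels
    # (the rule Bitcoin uses for its transaction Merkle trees).
    level = list(leaves)
    assert level, "this sketch assumes at least one leaf"
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_merkle_branch(leaf: bytes, branch, index: int, root: bytes) -> bool:
    # branch holds the sibling hash at each level, bottom-up; index is the leaf position.
    h = leaf
    for sibling in branch:
        h = sha256d(h + sibling) if index % 2 == 0 else sha256d(sibling + h)
        index //= 2
    return h == root
```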

Open questions:

  1. Why do we hash in the last takeover height?
  2. Do we need to include the takeover height or valid height for all claims as part of the hash?
  3. Should I modify the getnameproof output (when the to-be-optional claimID is passed in)?
  4. Do you like the binary tree hashing approach?
  5. Why did we compute the hashes for the claimtrie nodes as an appended list rather than a binary tree?

BrannonKing commented 6 years ago

Answers discussed with @kaykurokawa :

  1. We hash the takeover height so that it can be verified.
  2. We should include a valid height as part of our claim hash in the tree. We should also include effective amount, but we don't have that data (yet -- we need a one-time upgrade to get it stored).
  3. Yes, modify getnameproof output.
  4. Yes, use the binary tree hashing approach for both claims and node children.
  5. Eh... MVP.

BrannonKing commented 6 years ago

One of my concerns on this was the ordering issue mentioned in #196 . Essentially, you can't re-sort the claims in a node without first computing the effective amount (EA) for those claims. Unfortunately, the EA isn't persisted, so it's set to random bits when the claims are loaded from disk. The method that sorts the claims does recompute the value. I'm going to run on the assumption that the claims were sorted when they were persisted to disk, and that they don't lose their ordering when the data is read back.

BrannonKing commented 5 years ago

We've had new requirements on this. It now needs to implement all of the necessary support described in https://www.notion.so/lbry/Q-A-on-RPC-results-verification-44888d23efa3475a90eec997f9bf3103 . It also needs to return the claim_sequence field in all claim RPC calls.

BrannonKing commented 5 years ago

Merged via https://github.com/lbryio/lbrycrd/pull/302 and https://github.com/lbryio/lbrycrd/pull/303