turbofish-org / merk

High-performance Merkle key/value store
Apache License 2.0
226 stars 36 forks source link

Specify proof format and verification algorithm #9

Closed ethanfrey closed 5 years ago

ethanfrey commented 5 years ago

I was curious about how to validate proofs exported from merk.js and reading the code, I wasn't entirely sure. For the purpose of this issue, I just care about existence proofs for exactly one value (it seems you return general range proofs by default, maybe simpler to focus on one case).

The code seems to be this: https://github.com/nomic-io/merk/blob/master/src/verify.js#L15-L47

This is my understanding:

A child node contains {key, value} (both strings) An inner node contains {left, right, kvHash}

In the case of serialized proofs, I would assume for every inner node, either left or right is a base64 hash of the subtree, and the other one is a reference to the child node leading to the leaf of interest.

So far, so good. The hash for a leaf node also seems quite simple:

function getKvHash ({ key, value }) {
  let input = Buffer.concat([
    VarString.encode(key),
    VarString.encode(value)
  ])
  return ripemd160(sha256(input))
}

varint length prefix the key and the value, concatenate, then hash. Solid.

But... looking at:

function childHash (child) {
  if (child == null) {
    return nullHash
  }
  if (typeof child === 'string') {
    return Buffer.from(child, 'base64')
  }
  if (typeof child === 'object') {
    return getHash(child)
  }
  throw Error('Invalid child node value')
}

function getHash (node) {
  let kvHash = node.kvHash
    ? Buffer.from(node.kvHash, 'base64')
    : getKvHash(node)

  let input = Buffer.concat([
    childHash(node.left),
    childHash(node.right),
    kvHash
  ])
  return ripemd160(sha256(input))
}

It seems that the hash of a child node is not just getKvHash, but getHash. Given left and right are both null for the child node. This ends up like:

kvhash = ripemd160(sha256(key.length || key || value.length || value))

Then one step that treats it like an inner node childHash = ripemd160(sha256([0, 0, ... (40 bytes 0)] || kvhash))

The inner node would be:

innerHash = ripemd160(sha256(child.left || child.right || inner.kvhash))

And then what is inner.kvhash?

I'd love to understand this algorithm well, and be able to validate it in another language (eg. go) as a step towards unifying merkle proofs on the road to ibc

mappum commented 5 years ago

One thing you might be missing is that all nodes have a key and a value, including inner nodes (unlike the golang IAVL tree). This is why we need a separate kvHash for each node, otherwise merkle branches would require key/value data for unrelated nodes in the branch.

So unlike many other tree implementations, inner nodes and leaf nodes have the same format, the only difference is that left and right will be either null or undefined.

ethanfrey commented 5 years ago

Okay, that is interesting. And enlightening.

So, for inner nodes, we are guaranteed left and right to be defined. For child nodes, we are guaranteed left and right to be null or undefined.

Correct?

And we should have {key, value} or kvHash for all (only leaf to be proven must be in {key, value} format)?

mappum commented 5 years ago

So, for inner nodes, we are guaranteed left and right to be defined.

It is also possible to have one be undefined, at the edges of the tree.

For child nodes, we are guaranteed left and right to be null or undefined. And we should have {key, value} or kvHash for all (only leaf to be proven must be in {key, value} format)?

That's correct.

ethanfrey commented 5 years ago

Okay, I made a sample tree

    state.foo = 'bar';
    state.food = 'yummy';
    state.bath = {tub: 'full', room: 'small'};
    state.baz = { x: 123, y: { z: 456 } };

And got this:

root: a5a3a7255ef65deea2ba17b22b3b5b1521628a22

query baz.y:

{ left: 'F4j9DF3/N4rBqoX4JYyW9AdvDAo=',
  right:
   { left: null, right: null, key: '.baz.y', value: '{"z":456}' },
  key: '.baz',
  value: '{"x":123}' }

query food:

{ left:
   { left:
      { left: null,
        right: null,
        key: '.',
        value: '{"foo":"bar","food":"yummy"}' },
     right: null,
     key: '.bath',
     value: '{"room":"small","tub":"full"}' },
  right: 'Z7Rs82itYwVwev3PUp0KFDpX9hc=',
  kvHash: '2RiU70Z1dYW4CHRnRkzZ2YlqJDI=' }

The fact foo and food are in the same leaf is a bit odd. But... it should be easy enough to parse

ethanfrey commented 5 years ago

FYI, I made a repo https://github.com/confio/proofs-merk to try to parse these out.

I would like to figure out how to extend a generic proof format to hold this info: https://github.com/confio/proofs in particular https://github.com/confio/proofs/issues/5

Any collaboration and feedback is more than welcome

mappum commented 5 years ago

It's not an exact spec, but I outlined the new proof format and verification algorithm in this document: https://github.com/nomic-io/merk/blob/develop/docs/algorithms.md

ethanfrey commented 5 years ago

Thanks, that is quite a nice document of the proof format and verification/generation algorithm.

The examples were very useful to understand the use of Parent and Child

mappum commented 5 years ago

Closing this since that document exists, can open specific issues in the future around any proof format questions.