zhangchiqing / merkle-patricia-trie

A simplified golang implementation of Ethereum's Modified Patricia Trie.
MIT License
224 stars 68 forks source link

General Questions #2

Closed Olshansk closed 2 years ago

Olshansk commented 2 years ago

I found this blog on Medium, but saw that it is a port of the README here, so figured that opening an issue with general questions might be more conducive to discussion.


As a light client, which don’t have the access to the full blockchain state like full nodes do, but only the merkle root hash for certain block, how can it trust the result of its account balance returned from a full node?

  1. In practice, do you know anyone who uses light clients as opposed to an RPC infrastructure provider like Infura or Alchemy?

A light client can ask for a merkle root hash of the trie state, and use it to verify the balance of its account. If the trie was updated, even if the updates was to other keys, then the verification would fail.

  1. Provided a client gets rootHash from a full node to run VerifyProof(rootHash, key, proof), and their is another source (e.g. user input) for the key, it would imply that the client needs to run tr.Prove(key) locally, implying it needs to sync the whole trie state to its local storage.

    This seems to counteract the point of only need a merkle root, so I think I’m missing something and was hoping you could shed some light.

OK, but why should the light client trust the merkle root hash?

  1. Worth adding that there is an implicit assumption on how the light client retrieves (via peer-to-peer gossiping & synching) to get a merkle root hash that it can trust. I’m not quite familiar with this process, but it’s an assumption being made.

    Is that fair or am I misunderstanding?

As an example, let’s take at Block

NIT: s/let's take at Block/let’s take a look at Block/g

Merkle Patricia Trie is a data structure that stores key-value pairs, just like a map.

  1. Saying it is just like a map implies O(1) read complexity O(N) memory complexity where N is the number of keys. Looking at the explanation, it seems that the read complexity here is O(M) where M is the size of the key and the memory complexity is O(M * N).

    If I'm not misunderstanding, I can update the README just to make it more explicit for future readers.

zhangchiqing commented 2 years ago

Hi @Olshansk ,

Sorry for the late reply. These are good questions.

  1. In practice, do you know anyone who uses light clients as opposed to an RPC infrastructure provider like Infura or Alchemy?

Infura and Alchemy are just full nodes that we also don't have to trust. For instance, we can request proof for account balance, and verify ourselves. If they return wrong data, then the proof validation would fail.

  1. Provided a client gets rootHash from a full node to run VerifyProof(rootHash, key, proof), and their is another source (e.g. user input) for the key, it would imply that the client needs to run tr.Prove(key) locally, implying it needs to sync the whole trie state to its local storage.

The light client retrieves the proof from a full node. And then runs the Prove(key) locally by constructing a merkle trie and using the rootHash as the key to traverse from the trie root all the way to the leaf node that stores the value of the key. The proof only contains the nodes along path from the trie root to the leaf node. It doesn't need the whole trie.

  1. Worth adding that there is an implicit assumption on how the light client retrieves (via peer-to-peer gossiping & synching) to get a merkle root hash that it can trust. I’m not quite familiar with this process, but it’s an assumption being made.

The merkle root hash also doesn't have to be retrieved via a trusted source. For Proof of work Ethereum, a light client can simply wait for 8 blocks, and validate these block headers. If they are all valid, then these blocks can be trusted, because mostly like no one would waste huge electricity on building 8 blocks on a fork. Since these blocks are valid, the merkle root hash included in these blocks can be trusted, they can be used to validate proofs for storage state.

NIT: s/let's take at Block/let’s take a look at Block/g

Thanks

  1. Saying it is just like a map implies O(1) read complexity O(N) memory complexity where N is the number of keys. Looking at the explanation, it seems that the read complexity here is O(M) where M is the size of the key and the memory complexity is O(M * N).

Right, the functionality of a merkle trie is like a map, where you can get and set key-value pairs. But the time complexity are different.

Complexity depends on the branching factors and hash function. For Ethereum, where the branching factor is 16, meaning each node has 16 children, the time complexity is O(M), where M is the number of nibbles of hash(key). The space complexity is more than O(M*N) because not only it needs to store the leaf nodes, but also the branch nodes and extended nodes in-between the leaf nodes and the root node.

Olshansk commented 1 year ago

Thanks for the detailed response @zhangchiqing! Everything you said makes complete sense.

Also wanted to give @etan-status a shoutout here who did a great presentation on lights clients at https://devcon.org/.

Screen Shot 2022-11-07 at 10 58 40 AM