Portal State Network - Proof generation & verification

bhartnett commented 10 months ago

The Portal State Network is a DHT-based p2p network, one of the sub-networks of the Portal Network, which will validate and store the Ethereum world state. This state includes the data of externally owned accounts and contract accounts. It is normally stored in a Merkle Patricia trie in which each account address (hashed with keccak-256) is a key and the account data is the value. The account data for an externally owned account contains a nonce and balance, while the account data for a contract account additionally contains the smart contract EVM bytecode (referenced by its code hash) and the storage root, which is the root of another Merkle Patricia trie that stores all the smart contract data for that account.

Bridge nodes will inject this state data into the Portal Network, and each node participating in the state network DHT will need to verify that the data is valid and authentic before storing it locally. To make this possible, the bridge nodes will need to generate Merkle proofs (sometimes called witnesses) which show that one or more pieces of account state are part of the current canonical Ethereum blockchain. These proofs are validated against a trusted state root value which comes from headers in the beacon chain. Serialized account state is normally stored in RLP format, but the Merkle proofs will be serialized in SSZ format.
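To make the RLP side concrete, here is a minimal sketch of RLP-encoding an account as the list [nonce, balance, storage_root, code_hash]. The helper names (encode_length, rlp_encode, encode_account) are made up for illustration and are not taken from Nimbus; the two constants are the well-known hashes of the empty trie and of empty code.

```python
# Well-known constants: root of an empty trie and keccak-256 of empty code.
EMPTY_TRIE_ROOT = bytes.fromhex(
    "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421")
EMPTY_CODE_HASH = bytes.fromhex(
    "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470")


def encode_length(length: int, offset: int) -> bytes:
    """RLP length prefix: short form below 56 bytes, long form otherwise."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode an int, bytes, or (nested) list of those as RLP."""
    if isinstance(item, int):
        # Integers are big-endian without leading zeros; zero is the empty string.
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte encodes as itself
        return encode_length(len(item), 0x80) + item
    payload = b"".join(rlp_encode(x) for x in item)
    return encode_length(len(payload), 0xC0) + payload


def encode_account(nonce: int, balance: int,
                   storage_root: bytes, code_hash: bytes) -> bytes:
    """Hypothetical helper: an account is the RLP list of its four fields."""
    return rlp_encode([nonce, balance, storage_root, code_hash])


# An untouched externally owned account: zero nonce, zero balance, no code.
empty_account = encode_account(0, 0, EMPTY_TRIE_ROOT, EMPTY_CODE_HASH)
```

These encoded bytes are what ends up as the value of a leaf node in the account trie, which is why the witness nodes can stay in RLP even when the surrounding proof container uses SSZ.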

Merkle proofs can prove membership for a single leaf or for multiple leaves, in which case we call it a multi-proof. Data is organised in blocks of transactions, so it will likely be fed into the portal network in batches, and we will want to use multi-proofs to update the state for a whole block of transactions at once. Transactions represent a state transition from one world state to the next, so to allow the state in the portal network to change over time we need to provide the transaction diff or new state for each block. To verify a multi-proof, we rebuild a partial Merkle trie from it, apply the hashes up the trie, and check that the result equals the trusted state root.
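The multi-proof idea can be sketched on a much simpler structure. The example below uses a binary Merkle tree with SHA-256 rather than the hexary Merkle Patricia trie with keccak-256 that Ethereum actually uses (keccak-256 is not in the Python standard library), so treat it as an illustration of "rebuild the partial tree and hash up to the root" only. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash. Ethereum uses keccak-256, which hashlib does not provide.
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Hash a power-of-two list of leaves; node 1 is the root, leaf i is node n + i."""
    n = len(leaves)
    nodes = {n + i: h(leaf) for i, leaf in enumerate(leaves)}
    for g in range(n - 1, 0, -1):
        nodes[g] = h(nodes[2 * g] + nodes[2 * g + 1])
    return nodes


def helper_indices(n, leaf_indices):
    """Sibling nodes that the verifier cannot derive from the proven leaves."""
    on_path = set()
    for i in leaf_indices:
        g = n + i
        while g >= 1:
            on_path.add(g)
            g //= 2
    return {g ^ 1 for g in on_path if g > 1} - on_path


def make_multiproof(nodes, n, leaf_indices):
    return {g: nodes[g] for g in helper_indices(n, leaf_indices)}


def verify_multiproof(root, n, leaves, helpers) -> bool:
    """leaves maps leaf index -> leaf bytes; helpers maps node index -> hash."""
    if set(helpers) != helper_indices(n, leaves):
        return False  # reject missing or unexpected helper nodes
    known = dict(helpers)
    known.update({n + i: h(leaf) for i, leaf in leaves.items()})
    # Children always have larger indices than their parent, so walking the
    # indices downwards rebuilds the partial tree bottom-up.
    for g in range(n - 1, 0, -1):
        if 2 * g in known and 2 * g + 1 in known:
            known[g] = h(known[2 * g] + known[2 * g + 1])
    return known.get(1) == root


# Prove two of eight leaves with a single proof.
leaves = [f"account-{i}".encode() for i in range(8)]
nodes = build_tree(leaves)
proof = make_multiproof(nodes, 8, [1, 6])
ok = verify_multiproof(nodes[1], 8, {1: leaves[1], 6: leaves[6]}, proof)
```

Proving leaves 1 and 6 together needs four helper hashes, whereas two independent single-leaf proofs would carry three sibling hashes each. That saving is the reason multi-proofs are attractive for per-block updates.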

The goal of this task is to implement generation and verification of these proofs which will be used by both bridge nodes and portal network nodes. In the future this code may get re-used as a part of stateless Ethereum which is on the Ethereum roadmap.

To-Do:

[x] Generation of account trie proofs from a state.
[x] Verification of account trie proofs from a state.
[x] Generation of storage trie proofs from a state.
[x] Verification of storage trie proofs from a state.
[x] Test generation + serdes + verification loop.

Basically the full flow of a test would be:

1. Start with a full merkle trie
2. Grab all (or some) leaves and generate their proofs
3. Encode them in some defined serialization object
4. Decode the bytes
5. Verify the proof against the root hash of the merkle tree in step 1

The more complex version is then to have a state -> create a block -> execute the block -> take only the leaves that were changed, and do the same as above.
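As a rough sketch of that test loop, the following runs generate, encode, decode and verify for every leaf, then repeats it for only the leaves that changed between a pre-state and a post-state. It is a simplification under stated assumptions: a binary Merkle tree with SHA-256 stands in for the Merkle Patricia trie with keccak-256, the wire format is invented for the example, and "executing a block" is reduced to editing one leaf.

```python
import hashlib

HASH_LEN = 32


def h(data: bytes) -> bytes:
    # Stand-in for keccak-256, which the Python standard library lacks.
    return hashlib.sha256(data).digest()


def merkle_layers(leaves):
    """Return all layers of a binary Merkle tree, leaf hashes first, root last."""
    layers = [[h(leaf) for leaf in leaves]]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers


def generate_proof(layers, index):
    """Collect the sibling hash at every level from the leaf up to the root."""
    siblings = []
    for layer in layers[:-1]:
        siblings.append(layer[index ^ 1])
        index //= 2
    return siblings


def encode_proof(index, siblings) -> bytes:
    # Hypothetical wire format: 4-byte little-endian leaf index, then the hashes.
    return index.to_bytes(4, "little") + b"".join(siblings)


def decode_proof(data: bytes):
    index = int.from_bytes(data[:4], "little")
    body = data[4:]
    if len(body) % HASH_LEN:
        raise ValueError("truncated proof")
    return index, [body[i:i + HASH_LEN] for i in range(0, len(body), HASH_LEN)]


def verify_proof(root, leaf, index, siblings) -> bool:
    node = h(leaf)
    for sibling in siblings:
        node = h(sibling + node) if index & 1 else h(node + sibling)
        index //= 2
    return node == root


# Steps 1-5 for every leaf of a small tree (leaf count must be a power of two).
pre_state = [f"acct-{i}:balance=0".encode() for i in range(4)]
layers = merkle_layers(pre_state)
root = layers[-1][0]
round_trip_ok = all(
    verify_proof(root, leaf, *decode_proof(encode_proof(i, generate_proof(layers, i))))
    for i, leaf in enumerate(pre_state)
)

# The more complex version: prove only the leaves that a "block" changed.
post_state = list(pre_state)
post_state[2] = b"acct-2:balance=5"
changed = [i for i, (a, b) in enumerate(zip(pre_state, post_state)) if a != b]
post_layers = merkle_layers(post_state)
post_root = post_layers[-1][0]
changed_ok = all(
    verify_proof(post_root, post_state[i], i, generate_proof(post_layers, i))
    for i in changed
)
```

A real test would replace the stand-in tree with the actual trie and the invented wire format with the SSZ types from the spec, but the shape of the loop stays the same.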

Here are some useful links to existing code that might be worth re-using:

Block witness building
- Lower level: there is code in place already: https://github.com/status-im/nimbus-eth1/tree/master/stateless
- Higher level, on execution: https://github.com/status-im/nimbus-eth1/blob/master/nimbus/db/ledger/accounts_cache.nim#L693 and nimbus-eth1/nimbus/evm/state.nim (line 289 in 657379f): proc buildWitness*(vmState: BaseVMState): seq[byte]
Block witness execution
- Tree building: https://github.com/status-im/nimbus-eth1/blob/master/stateless/tree_from_witness.nim

bhartnett commented 10 months ago

Here is the old spec on witness: https://github.com/ethereum/portal-network-specs/blob/01a49a8c9bf08121ecde1b9270a6f2f679cb2568/witness.md

bhartnett commented 10 months ago

This part of the spec defines the structure of the witnesses: https://github.com/ethereum/portal-network-specs/blob/master/state-network.md#data-types

It appears that the witness nodes, which are either leaf, branch or extension nodes, will remain in RLP format while the rest of the data structures will be encoded in SSZ format.

Each WitnessNode contains the RLP encoded bytes of a merkle patricia trie node:

WitnessNode            := ByteList(1024)

Each MPTWitness contains an ordered list of up to 32 WitnessNodes which includes the leaf node which is to be checked for membership against the state root:

MPTWitness             := List(witness: WitnessNode, max_length=32)
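For illustration, here is a minimal sketch of how such a list of variable-length byte lists serializes under the general SSZ rules: a table of 4-byte little-endian offsets, one per element, followed by the element bytes. The function names and constants are hypothetical and the limits simply mirror the two definitions above; a real implementation would use an SSZ library.

```python
MAX_NODE_BYTES = 1024  # WitnessNode := ByteList(1024)
MAX_NODES = 32         # MPTWitness  := List(WitnessNode, max_length=32)
OFFSET_SIZE = 4        # SSZ offsets are 4-byte little-endian integers


def check_limits(nodes):
    if len(nodes) > MAX_NODES or any(len(n) > MAX_NODE_BYTES for n in nodes):
        raise ValueError("witness exceeds the limits from the spec")


def encode_witness(nodes) -> bytes:
    """SSZ-serialize a list of variable-length byte lists: offsets, then data."""
    check_limits(nodes)
    offset = OFFSET_SIZE * len(nodes)  # data starts right after the offset table
    table = b""
    for node in nodes:
        table += offset.to_bytes(OFFSET_SIZE, "little")
        offset += len(node)
    return table + b"".join(nodes)


def decode_witness(data: bytes):
    if not data:
        return []
    first = int.from_bytes(data[:OFFSET_SIZE], "little")
    # The first offset doubles as the size of the offset table.
    if first == 0 or first % OFFSET_SIZE or first > len(data):
        raise ValueError("bad first offset")
    count = first // OFFSET_SIZE
    offsets = [
        int.from_bytes(data[i * OFFSET_SIZE:(i + 1) * OFFSET_SIZE], "little")
        for i in range(count)
    ]
    offsets.append(len(data))
    nodes = []
    for start, end in zip(offsets, offsets[1:]):
        if start > end or end > len(data):
            raise ValueError("offsets out of order or out of range")
        nodes.append(data[start:end])
    check_limits(nodes)
    return nodes


# Round trip with placeholder node bytes (not real trie nodes).
sample = [b"\xc2\x80\x80", b"", b"\x01" * 1024]
restored = decode_witness(encode_witness(sample))
```

The RLP bytes inside each WitnessNode are opaque at this layer, which matches the observation that the trie nodes stay in RLP while the container around them is SSZ.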

Account Trie Proof

The account trie proof key contains the Ethereum address of the account and the state root which can be used to verify the proof:

account_trie_proof_key := Container(address: Bytes20, state_root: Bytes32)

The selector is a predetermined byte that tells the receiver of the request how to interpret the SSZ serialized bytes that follow it. The content key is made up of the selector prepended to the serialized account_trie_proof_key.

selector               := 0x20
content_key            := selector + SSZ.serialize(account_trie_proof_key)
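A minimal sketch of building this content key, following the layout as described in this comment: because both fields of the container are fixed-size, SSZ serializes it as the two fields concatenated in order, so the key is 1 + 20 + 32 = 53 bytes. The function name is hypothetical and the address and state root below are placeholder values.

```python
SELECTOR = b"\x20"


def account_trie_proof_content_key(address: bytes, state_root: bytes) -> bytes:
    """Hypothetical helper: selector + SSZ(Container(address, state_root))."""
    if len(address) != 20:
        raise ValueError("address must be 20 bytes")
    if len(state_root) != 32:
        raise ValueError("state_root must be 32 bytes")
    # An SSZ container of fixed-size fields is just the fields back to back.
    return SELECTOR + address + state_root


# Placeholder inputs, not real chain data.
example_address = bytes.fromhex("11" * 20)
example_state_root = bytes.fromhex("22" * 32)
content_key = account_trie_proof_content_key(example_address, example_state_root)
```

Splitting the key back into its parts is a matter of slicing at offsets 1 and 21, which is what a receiving node would do after reading the selector byte.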

The content returned from requests for account data is a container holding the MPTWitness. The content_id is the keccak-256 hash of the address:

content                := Container(witness: MPTWitness)
content_id             := keccak(address)

This is just my interpretation based on what I've read in the spec so far.

kdeme commented 10 months ago

Adding another link here for reference: Useful repo with proof generation: https://github.com/morph-dev/young-ethereum

kdeme commented 10 months ago

Additionally, it's probably useful to also look at the existing client implementations of the eth_getProof JSON-RPC call: https://eips.ethereum.org/EIPS/eip-1186 E.g. in geth: https://github.com/ethereum/go-ethereum/blob/81fd1b3cf9c4c4c9f0e06f8bdcbaa8b29c81b052/internal/ethapi/api.go#L678

Implementing eth_getProof in the Nimbus EL client would also be a good first use of this code.

bhartnett commented 10 months ago

Thanks @kdeme, I'll take a look at these shortly.

bhartnett commented 9 months ago

I've created a draft PR which implements verification of block witnesses. This can be used on the portal network side to validate the updated accounts, code and storage slots for each new block before storing it locally. See here: https://github.com/status-im/nimbus-eth1/pull/1958

bhartnett commented 9 months ago

In order to get the block witness data into the portal network we will need some way to transfer the data from Nimbus where it is generated to the portal network. I'm thinking we can probably create a new custom RPC endpoint in Nimbus that returns the block witness for a block by block hash or block number. The Portal Network bridge can poll/call the RPC endpoint and download the block witness for each block as needed and then send it into the portal network in the desired format.

If we go ahead with this design, a few questions come to mind for the changes on the Nimbus side:

- Should we store witnesses in the database or generate them on demand?
- Should we only generate witnesses for blocks that have been persisted to disk? My understanding is that if there is a fork then Nimbus will hold the state in memory until the fork is resolved. I believe the portal network doesn't need to hold state for forks so it might be best to only support generating witnesses for blocks that have been stored to disk.
- If storing the witnesses in the database, it might be difficult to generate historic witnesses due to the way the code is currently implemented. It works by collecting a data structure of touched accounts and storage slots when executing each transaction, then these values are used to build the block witness. This means that generating historic witnesses which were not previously created would require re-executing the transactions in each block and then storing the witness.
- If generating witnesses on demand, the RPC endpoint might need to re-execute transactions for historic blocks in order to build the witness.

bhartnett commented 9 months ago

I've implemented the new endpoints for returning a block witness in this PR: https://github.com/status-im/nimbus-eth1/pull/1977

This change adds two new custom RPC endpoints which may be used by the Portal Network bridge node to get the changed account state for each new block. It would be impractical to query Nimbus for every account and storage slot for every block so these endpoints allow us to get just the updated state and then feed it into the portal state network. Here are the interfaces for the new endpoints:

proc exp_getWitnessByBlockNumber(blockId: BlockIdentifier, statePostExecution: bool): seq[byte]
proc exp_getProofsByBlockNumber(blockId: BlockIdentifier, statePostExecution: bool): seq[ProofResponse]

The first endpoint returns a block witness which is a binary format which follows the spec here: https://github.com/ethereum/portal-network-specs/blob/01a49a8c9bf08121ecde1b9270a6f2f679cb2568/witness.md. The second endpoint returns a list of proofs for accounts and storage slots in the same format as the eth_getProof endpoint except it returns a list instead of a single proof.

Both endpoints support returning the state from before or after executing the transactions in the block. Each of these options would be useful in different scenarios. For example, stateless block execution would require getting the block witness data from before execution of the transactions in order to execute the transactions against the witness. For the portal network we will likely want to get the list of proofs from after execution of the block because the bridge will simply be forwarding the proofs into the portal state network and it will want the latest updated state after execution.

The new endpoints are disabled by default and can be enabled by supplying the --rpc-api=exp flag. exp is the new JSON-RPC namespace which has been added for experimental endpoints.
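As a rough idea of how a bridge could call one of these endpoints, here is a sketch of a JSON-RPC 2.0 request built with the Python standard library. Several details are assumptions rather than facts from the PR: that blockId is passed as a string such as "latest" or a hex block number (as with the standard eth_ endpoints), that the parameters are positional in the order of the proc signature, and that the node listens on http://127.0.0.1:8545. The helper names are hypothetical.

```python
import json
import urllib.request


def build_request(method: str, block_id: str, state_post_execution: bool,
                  request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body with positional parameters."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        # Assumed parameter order: (blockId, statePostExecution).
        "params": [block_id, state_post_execution],
    }


def call_rpc(url: str, payload: dict) -> dict:
    """POST the request to a node and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Ask for the proofs of the state after executing the latest block.
payload = build_request("exp_getProofsByBlockNumber", "latest", True)

# Not executed here, since it needs a running node started with --rpc-api=exp:
# response = call_rpc("http://127.0.0.1:8545", payload)  # assumed local URL
```

A bridge would likely call this once per new block with statePostExecution set to true and forward the returned proofs into the state network.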

This implementation doesn't yet store the block witnesses. It simply fetches the transactions from the requested block, then re-runs the transactions (without persisting to the db) in order to collect the keys of the updated account and storage state, which are used to look up and return the account state from before or after the block execution. We only support returning witnesses/proofs for blocks that have been persisted to disk. I believe this is fine at least for now because the portal network only supports feeding in data from the canonical chain, so feeding in block data that may be from a fork probably won't be required. Storage of block witnesses in the database will be coming next, perhaps in a separate PR.

status-im / nimbus-eth1

Portal State Network - Proof generation & verification #1934