scroll-tech / reth

Modular, contributor-friendly and blazing-fast implementation of the Ethereum protocol, in Rust
https://reth.rs/
Apache License 2.0
6 stars 2 forks source link

Scroll `StateRootProvider` implementation #4

Open frisitano opened 3 weeks ago

frisitano commented 3 weeks ago

Describe the feature

Overview

Scroll uses a binary Markle Patricia trie (BMPT) to represent it's state. This is different to the standard MPT that ethereum and reth uses. As such we will need to introduce a new implementation of the DatabaseStateRoot to support scrolls state model. This will also involve modifications to associated types and hashing schemes.

Design

There are a number of data structures, components and algorithms that are relevant in the context of the state root calculation. We will document those which require modification to support the BMPT state root calculation.

Introduction of StateTypes

To enable configurability of types required for modified state root calculation we will follow a similar pattern to what is used for EngineTypes, as such we will introduce a trait for StateTypes as follows:

pub trait StateTypes {
    /// The state root type.
    type StateRoot<'a, TX: DbTx + 'a>: DatabaseStateRoot<'a, TX>;
    /// The storage root type.
    type StorageRoot<'a, TX: DbTx + 'a>: DatabaseStorageRoot<'a, TX>;
    /// The state proof type.
    type StateProof<'a, TX: DbTx + 'a>: DatabaseProof<'a, TX>;
    /// The state witness type.
    type StateWitness<'a, TX: DbTx + 'a>: DatabaseTrieWitness<'a, TX>;
    /// The key hasher type.
    type KeyHasher: reth_trie::KeyHasher;
}

The StateKeyHasher trait can be defined as follows:

/// Trait for hashing state keys.
pub trait StateKeyHasher: Clone + Sync + Send + Default {
    /// Hashes the given bytes into a 256-bit hash.
    fn hash_key<T: AsRef<[u8]>>(bytes: T) -> B256;
}

As example of how this would be implemented for ethereum would be:

pub struct EthereumStateTypes;

impl StateTypes for EthereumStateTypes {
    type StateRoot = StateRoot;
    ...
    type StorageRoot = StorageRoot;
    type KeyHasher = KeccakKeyHasher;
}

Where the StateRoot and StorageRoot correspond to the types linked and KeccakKeyHasher is a type that will use the default keccak256 function from alloy.

We can then integrate this with NodeTypes by introducing a NodeTypesWithState trait and implementing it following the same pattern used for Engine types as seen here. This pattern will allow for configurability of the state root calculation by specifying the appropriate StateTypes object when configuring the node.

HashedPostState and HashedStorage

The HashedPostState and HashedStorage store the state diff associated with a block (or a number of blocks in the case of pipelining). By default both of these data structures leverage keccak256 to hash keys associated with accounts and storage slots. I propose that we introduce a generic on these data structures for key hashing.

We will make the following methods on HashedPostState and HashedStorage generic over the StateKeyHasher as follows:

HashedPostState::from_bundle_state<'a, K: StateKeyHasher>(...) -> Self
HashedPostState::from_cache_state<'a, KH: StateKeyHasher>(...) -> Self
HashedStorage::from_plain_storage<'a, KH: StateKeyHasher>(...) -> Self

Providers

The StateRootProvider and the StorageRootProvider will be made generic over the StateTypes trait. Parent types will also be impacted and appropriate changes will need to be made.

ParallelStateRoot

The ParallelStateRoot type leverages the StorageRoot and has an implementation of the StateRoot. The storage roots are computed in parallel and then integrated into the state root calculation. I suggest we modify the DatabaseStateRoot interface to allow for pre-computed storage roots to be passed in. As such we remove the bespoke storage root implementation in the ParallelStateRoot object and it just becomes a driver for parallel storage root and state root computation.

BranchNodeCompact

We should be able to reuse the BranchNodeCompact data structure to store our merkle nodes in the database. As scroll uses a binary merkle trie structure only two children would be expected in a branch node.

Stack based BMPT root algorithm

We may want to implement a stack based bmpt root algorithm that follows a similar pattern to what reth natively does for the standard Patricia Merkle trie. More research is required to understand the implications here.

Thegaram commented 3 weeks ago

As I recall you mentioned another alternative ("deep integration with mdbx") that we might want to do in the future. Could you add some more context about that here?

clabby commented 2 weeks ago

This is really cool! Hope to see some of the abstractions that come out of this work land upstream for other extensions of reth to use.

Have you guys looked into using liburkel here? It could potentially be a nice alternative to MDBX when working with base2 radix tries.

frisitano commented 2 weeks ago

This is really cool! Hope to see some of the abstractions that come out of this work land upstream for other extensions of reth to use.

For sure, thats the plan! It would be great to get a review / feedback from potential consumers of the abstractions as development progresses. I can keep you updated if that's ok with you?

Have you guys looked into using liburkel here? It could potentially be a nice alternative to MDBX when working with base2 radix tries.

I've not looked into it but it certainly seems interesting! There's also some promising research being done with nomt (blog post), it would be good to explore potential integrations with reth and benchmark to compare performance.

frisitano commented 5 days ago

Stack-Based BMPT State Root Calculation

Our ambition is to provide an implementation of scrolls BMPT within reth. We intend to leverage as much pre-existing architecture from scrolls default MPT implementation as possible to minimize the scope of this change. Leveraging the pre-existing reth types will result in a less efficient state size however the simplicity and practicality of this approach outweigh the cons.

HashBuilder

The algorithm that is used to build the state root and associated TrieUpdates is located within the HashBuilder object in alloy-trie.

We need to implement an analogous object with an analogous interface that implements an algorithm that builds a BMPT as defined in the scroll zk_trie spec here. We can name this object BmptHashBuilder or something similar to represent that the object builds a binary Merkle Patricia trie.

StateRoot

https://github.com/scroll-tech/reth/blob/4b1567a55f9986b2c4fe6d9e47e2ea0a93f10f31/crates/trie/trie/src/trie.rs#L21-L37 We can leverage the pre-existing StateRoot object which is responsible for iterating over nodes and providing them to the HashBuilder for state root calculation. In the first instance, we should create a code copy of the objects but in the future, we can consider introducing abstractions by the introduction of a generic over StateCommitmentHashBuilder trait defined below on the upstream StateRoot object.

StorageRoot

https://github.com/scroll-tech/reth/blob/4b1567a55f9986b2c4fe6d9e47e2ea0a93f10f31/crates/trie/trie/src/trie.rs#L285-L299 We will take a similar approach as described above with the StateRoot and then in the future look to introduce the StateCommitmentHashBuilder abstraction.

StateCommitmentHashBuilder

The logic for constructing state commitments is encapsulated within a single object. Currently in reth it is in the HashBuilder referenced above. With the introduction of scroll's version of the BmptHashBuilder we should consider introducing a trait for this object such as the following:

pub trait StateCommitmentHashBuilder: Default {
    fn add_branch(&mut self, key: Nibbles, value: B256, children_are_in_trie: bool);
    fn add_leaf(&mut self, key: Nibbles, value: &[u8]);
    fn root(&self) -> B256;
    fn updates_len(&self) -> usize;
    fn split(self) -> (Self, StorageTrieUpdates);
}

This will allow us to make parent objects generic over this trait:

Opened as a separate issue: https://github.com/scroll-tech/reth/issues/24