scroll-tech / revm

Rust Ethereum virtual machine (revm) Is EVM written in rust that is focused on speed and simplicity
MIT License
19 stars 7 forks source link

`scroll-revm` SDK Implementation #25

Closed frisitano closed 2 months ago

frisitano commented 2 months ago

Overview

The current implementation of scroll-revm leverages a fork of revm and uses feature flags to override EVM functionality. While this solution is functional there are some concerns around maintainability, merging upstream upgrades becomes challenging and error prone due to large diffs and conflicting code. An alternative approach would be to utilise revm as an SDK and apply the required modifications using it's public API. An example of this can be seen for an optimism implementation here.

Motivation

Implementation Considerations

There are a number of differences between native revm and scroll-revm which we will need to consider when redesigning the implementation using the SDK like pattern.

AccountInfo

In the scroll fork we have modification of the AccountInfo struct which includes two additional fields, code_size and poseiden_code_hash, which are enabled via feature flags.

pub struct AccountInfo {
    /// Account balance.
    pub balance: U256,
    /// Account nonce.
    pub nonce: u64,
    #[cfg(feature = "scroll")]
    /// code size,
    pub code_size: usize,
    /// code hash,
    pub code_hash: B256,
    #[cfg(feature = "scroll-poseidon-codehash")]
    /// poseidon code hash, won't be calculated if code is not changed.
    pub poseidon_code_hash: B256,
    /// code: if None, `code_by_hash` will be used to fetch it if code needs to be loaded from
    /// inside `revm`.
    pub code: Option<Bytecode>,
}

There are two potential approaches we should consider such that we do not have to modify the AccountInfo struct.

Introduce ScrollAccountInfo trait

The first approach is to implement a trait of the following form on AccountInfo as follows.

/// A trait that provides properties for scroll account info.
pub trait ScrollAccountInfo {
    /// Returns the size of the code.
    fn code_size(&self, ) -> usize;

    /// Returns the poseiden code hash.
    fn poseidon_code_hash(&self) -> B256;
}

impl ScrollAccountInfo for AccountInfo {
    fn code_size(&self) -> usize {
        self.code.as_ref().map(|code| code.len()).unwrap_or(0)
    }

    fn poseidon_code_hash(&self) -> B256 {
        self.code
            .as_ref()
            .map(|code| code.poseidon_hash_slow())
            .unwrap_or(B256::ZERO)
    }
}

One of the concerns here is the computation overhead of computing the poseiden hashing of code for every invocation of the posiden_code_hash(..) method. To address this, we could introduce a ScrollAccountInfoProvider which could do database lookups for the associated code properties. This would look something like the following:

/// A trait that provides properties for scroll account info.
trait ScrollAccountInfoProviderT {
    /// Returns the size of the code associated with the keccak code hash.
    fn code_size(&mut self, keccak_code_hash: B256) -> usize;

    /// Returns the poseiden code hash associated with the keccak code hash.
    fn poseidon_code_hash(&mut self, keccak_code_hash: B256) -> B256;
}

type KeccakCodeHash = B256;
type PoseidonCodeHash = B256;
type CodeSize = usize;

pub struct ScrollAccountInfoProvider {
    db: ScrollAcountInfoDB,
    cache: HashMap<KeccakCodeHash, (CodeSize, KeccakCodeHash)>,
}

impl ScrollAccountInfoProviderT for ScrollAccountInfoProvider {
    fn code_size(&mut self, keccak_code_hash: B256) -> usize {
        if let Some((code_size, _)) = self.cache.get(&keccak_code_hash) {
            return *code_size;
        }

        let code_size = self.db.get_code_size(keccak_code_hash);
        self.cache
            .insert(keccak_code_hash, (code_size, keccak_code_hash));
        code_size
    }

    fn poseidon_code_hash(&mut self, keccak_code_hash: B256) -> B256 {
        if let Some((_, poseidon_code_hash)) = self.cache.get(&keccak_code_hash) {
            return *poseidon_code_hash;
        }

        let poseidon_code_hash = self.db.get_poseidon_code_hash(keccak_code_hash);
        self.cache.insert(keccak_code_hash, (0, poseidon_code_hash));
        poseidon_code_hash
    }
}

/// A trait that provides properties for scroll account info.
pub trait ScrollAccountInfo {
    /// Returns the size of the code.
    fn code_size(&self, provider: &mut impl ScrollAccountInfoProviderT) -> usize;

    /// Returns the poseiden code hash.
    fn poseidon_code_hash(&self, provider: &mut impl ScrollAccountInfoProviderT) -> B256;
}

impl ScrollAccountInfo for AccountInfo {
    fn code_size(&self, provider: &mut impl ScrollAccountInfoProviderT) -> usize {
        provider.code_size(self.code_hash)
    }

    fn poseidon_code_hash(&self, provider: &mut impl ScrollAccountInfoProviderT) -> B256 {
        provider.poseidon_code_hash(self.code_hash)
    }
}

We would then inject the ScrollAcountInfoProvider object via revm's external context seen here.

My preference would be to implement the naive solution in which we recompute the poseiden hash and code size on every invocation in the first instance and then open an issue to implement the caching solution in the future as part of a performance optimisation sprint.

Introduce AccountInfo trait in EvmWiring trait upstream

Upstream revm allows for abstraction of certain types such as Transaction and Block allowing the client to specify the concrete types which will be used. We could extend this pattern for AccountInfo allowing us to specify a concrete AccountInfo type that includes the code size and poseiden hash fields. This would look something like this:

pub trait EvmWiring: Sized {
    /// External context type
    type ExternalContext: Sized;

    /// Chain context type.
    type ChainContext: Sized + Default + Debug;

    /// Database type.
    type Database: Database;

    /// The type that contains the account information.
    type AccountInfo: AccountInfo;

    /// The type that contains all block information.
    type Block: Block;

    /// The type that contains all transaction information.
    type Transaction: Transaction + TransactionValidation;

    /// The type that enumerates the chain's hardforks.
    type Hardfork: HardforkTrait;

    /// Halt reason type.
    type HaltReason: HaltReasonTrait;
}

This would be a clean solution however the surface area of this change would be pretty large requiring significant changes to upstream and buy in from revm maintainers.

Conclusion

I would propose that we go for the naive ScrollAccountInfo trait in the first instance and then look to optimise refactor this in the future as we have a better understanding of performance and more familiarity with the revm codebase.

Opcodes

There are a number of opcodes that differ between scroll-revm and native revm. These can be implemented by overriding the instruction_table on the Handler using the EvmBuilder.

frisitano commented 2 months ago

An initial implementation of the scroll-evm using the SDK like approach has been implemented here. This implementation still needs further testing and reviews.