paradigmxyz / reth

Modular, contributor-friendly and blazing-fast implementation of the Ethereum protocol, in Rust
https://reth.rs/
Apache License 2.0
3.81k stars 1.07k forks source link

Make `BlockchainProvider2` transaction id methods aware of in-memory state #10181

Open Rjected opened 1 month ago

Rjected commented 1 month ago

Right now these methods: https://github.com/paradigmxyz/reth/blob/a7062066958f45665f41de58d94b1bc5c6d68c57/crates/storage/provider/src/providers/blockchain_provider.rs#L585-L598

just check the database for transactions. Previously this was fine because the database was always up to date with the canonical chain. Now, this is not the case, so these need to also check in-memory state.

This method also needs updating: https://github.com/paradigmxyz/reth/blob/a7062066958f45665f41de58d94b1bc5c6d68c57/crates/storage/provider/src/providers/blockchain_provider.rs#L621-L623

As do these methods: https://github.com/paradigmxyz/reth/blob/a7062066958f45665f41de58d94b1bc5c6d68c57/crates/storage/provider/src/providers/blockchain_provider.rs#L679-L695

mvares commented 1 month ago

hey @Rjected, I'm so excited to see this PR being closed due to their complexity. I want to learn about how works the handling of queries over in-memory state.

Rjected commented 1 month ago

hey @Rjected, I'm so excited to see this PR being closed due to their complexity. I want to learn about how works the handling of queries over in-memory state.

hey! Just assigned you, let me know if you need any help or need a review!

mvares commented 1 month ago

Well, I did my homework. I thought and reach out in this solution:

  1. Iterates over all in-memory blocks
  2. Search in transactions by ID provided
  3. Returns the block

https://github.com/paradigmxyz/reth/blob/a7062066958f45665f41de58d94b1bc5c6d68c57/crates/storage/provider/src/providers/blockchain_provider.rs#L621-L623

cc @Rjected

Rjected commented 1 month ago

Well, I did my homework. I thought and reach out in this solution:

1. Iterates over all in-memory blocks

2. Search in transactions by ID provided

3. Returns the block

https://github.com/paradigmxyz/reth/blob/a7062066958f45665f41de58d94b1bc5c6d68c57/crates/storage/provider/src/providers/blockchain_provider.rs#L621-L623

cc @Rjected

The fundamental problem with these txid methods, is that we don't really keep track of a txid index for in-memory blocks, unlike db blocks. So it would be good to do something like:

if let Some(num) = self.database.transaction_block(id)? {
    return Ok(Some(num))
} else {
    // what do we do here?
}

For the else branch, we should get the id for the latest block on disk, and iterate through in-memory blocks to determine which block the transaction is in. The id here is the tx number

mvares commented 1 month ago

Makes sense. Btw, why in-memory blocks don't works equal db blocks?

Rjected commented 1 month ago

Makes sense. Btw, why in-memory blocks don't works equal db blocks?

wdym? do you mean why do we not keep a txid index for in-memory blocks?

mvares commented 1 month ago

Makes sense. Btw, why in-memory blocks don't works equal db blocks?

wdym? do you mean why do we not keep a txid index for in-memory blocks?

yeah

Rjected commented 1 month ago

Makes sense. Btw, why in-memory blocks don't works equal db blocks?

wdym? do you mean why do we not keep a txid index for in-memory blocks?

yeah

ah, we just don't do it yet, since it is extra complexity for maintaining the in-memory state. If it seems like a bottleneck after measurement, then we may think about introducing an index

mvares commented 1 month ago

hey @Rjected, what do you think about if I unassign this? Sincerely, I still can't make queries like this due to my capacity

mvares commented 1 month ago

I will wait by solution. I want to see how will someone resolve it... Thanks for trust

lakshya-sky commented 1 month ago

Hey @Rjected, I gave it a try and I came up with the impl based on your comment https://github.com/paradigmxyz/reth/issues/10181#issuecomment-2276622516.

    fn transaction_block(&self, id: TxNumber) -> ProviderResult<Option<BlockNumber>> {
        if let Some(num) = self.database.transaction_block(id)? {
            return Ok(Some(num));
        }
        // Get First and Last block number from the in-memory state
        let last_block_number = self.last_block_number()?;
        let first_block_number = last_block_number.saturating_add(1);
        let canonical_block_number = self.canonical_in_memory_state.get_canonical_block_number();

        // Get next_tx_num from the last block stored in the database
        let Some(last_block_body_indice) = self.database.block_body_indices(last_block_number)?
        else {
            return Ok(None);
        };
        let mut next_tx_num = last_block_body_indice.next_tx_num();

        for number in first_block_number..=canonical_block_number {
            let Some(block_state) = self.canonical_in_memory_state.state_by_number(number) else {
                return Ok(None);
            };
            let txs_len = block_state.block().block().body.len();
            next_tx_num += txs_len as u64;
            if id < next_tx_num {
                return Ok(Some(number));
            }
        }
        Ok(None)
    }

I wanted to confirm with you that should I continue?