paradigmxyz / reth

Modular, contributor-friendly and blazing-fast implementation of the Ethereum protocol, in Rust
https://reth.rs/
Apache License 2.0
3.59k stars 938 forks source link

Query just the previous receipt to construct a transaction receipt #5775

Open shekhirin opened 7 months ago

shekhirin commented 7 months ago

Describe the feature

Currently, on building a receipt for just one transaction, we query all receipts from the block.

https://github.com/paradigmxyz/reth/blob/1eaa3ed5a5bcdd67852a05bd3b932316d3a8637d/crates/rpc/rpc/src/eth/api/transactions.rs#L1166-L1176

It's needed for two reasons:

  1. Calculating the cumulative gas usage from the previous receipt in the block. We can solve it by querying just the previous receipt, and not all.
  2. Calculating the log index from the logs of all previous receipts in the block. We can solve it by saving the log index into the database Receipt model.

Additional context

No response

i-m-aditya commented 7 months ago

I will fix this.

shekhirin commented 7 months ago

@i-m-aditya assigned

i-m-aditya commented 7 months ago

@shekhirin I agree that for cumulative gas usage, we don't need all receipts but to fill all fields of TransactionReceipt, we want to get the log index in a receipt (which is the total logs before that tx in the block plus the log index in that receipt).

let mut num_logs = 0;
    for prev_receipt in all_receipts.iter().take(meta.index as usize) {
        num_logs += prev_receipt.logs.len();
    }

    for (tx_log_idx, log) in receipt.logs.into_iter().enumerate() {
        let rpclog = Log {
            address: log.address,
            topics: log.topics,
            data: log.data,
            block_hash: Some(meta.block_hash),
            block_number: Some(U256::from(meta.block_number)),
            transaction_hash: Some(meta.tx_hash),
            transaction_index: Some(U256::from(meta.index)),
            log_index: Some(U256::from(num_logs + tx_log_idx)),
            removed: false,
        };
        res_receipt.logs.push(rpclog);
    }

To avoid querying all receipts, we need to modify the receipt struct and add the cumulative_logs_emitted field which is the total logs emission till the previous receipt. Does it make sense?

shekhirin commented 7 months ago

@i-m-aditya you're right, I added the second reason for querying all receipts and the way we want to solve it.

i-m-aditya commented 6 months ago

@shekhirin Apologies for the delay. Completely missed this issue, picked it today.

Rjected commented 6 months ago

Calculating the log index from the logs of all previous receipts in the block. We can solve it by saving the log index into the database Receipt model.

@shekhirin this would be breaking, right? if there are enough bits left it could be an option at the end of the struct. is the perf benefit worth the change?

shekhirin commented 6 months ago

is the perf benefit worth the change?

We need to benchmark that, not sure. It will add around 8GB of data to the Receipts table.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 21 days with no activity.

sevazhidkov commented 5 months ago

Can look into this.

sevazhidkov commented 5 months ago

Quick benchmark: Holesky, EthApi::transaction_receipt, n=10k random txs across the chain, 0.25ms on average (P99 - 0.5ms, P999 - 1.2ms). Loading receipts for the block is roughly 20% of that. IMO definitely not worth implementing the 9GB solution, even if numbers are worse on mainnet (not sure what's reth average mainnet vs holesky performance ratio).

There is an easier solution: reth currently loads all receipts for a block that transaction is in, but only uses meta.index of them:

for prev_receipt in all_receipts.iter().take(meta.index as usize) {
    num_logs += prev_receipt.logs.len();
}

So, it's possible to create a new method ReceiptProvider::receipts_by_block_with_limit that would only load meta.index first transactions, because only these are required to build the current receipt. The downside of this is that we'll also have to create a separate CacheAction with its own state, because receipts_by_block_with_limit results with different limits can't be cached together (or, at least, we can't use cached version with smaller limit). In this scenario, getting receipt for a transaction will also not warm up the main receipts-by-block cache.

Personally, I think the juice (20% of 0.25ms) is not worth the squeeze (separated cache + code complexity). But if someone feels differently, feel free to push otherwise.

shekhirin commented 5 months ago

@mattsse would love to hear your thoughts on 👆 IMO agreed that 20% of 0.25ms doesn't worth neither 9GB nor the effort

mattsse commented 5 months ago

agree, I think this issue could be solved by changing the DB model so that querying all receipts for the tx receipt is not necessary, which is what you're describing I believe.

so these numbers are very helpful, overall 9gb isn't that much but I'm more concerned about the complexity, so I think we can revisit this eventually.

most recent block receipts will be cached anyway, so this mostly affects arbitrary older txs