paradigmxyz / reth

Modular, contributor-friendly and blazing-fast implementation of the Ethereum protocol, in Rust
https://reth.rs/
Apache License 2.0
4.02k stars 1.24k forks source link

Persisten SimulateV1 timeouts: failed to read a value from a database table: read transaction has been timed out (-96000) #12932

Closed s-0xa1fa closed 3 days ago

s-0xa1fa commented 4 days ago

Describe the bug

I'm running into an issue where I get persistent hangs / eventual timeouts when running eth_simulateV1. I believe the issue is if I cancel an previous rpc request, and then rerun the same rpc request. Presumably there might be an internal db lock that doesn't drop if the rpc request in improperly canceled? The issue persists past reth restarts.

An example rpc request params for simulateV1 that cause it: [{"blockStateCalls":[{"blockOverrides":{},"calls":[{"from":"0x539a08d1552128d5db42ab5576ae5e945a34ad3c","to":"0x7a250d5630b4cf539739df2c5dacb4c659f2488d","maxFeePerGas":"0xcc022e4fc","maxPriorityFeePerGas":"0x9502f900","value":"0x0","nonce":55,"input":"0x38ed173900000000000000000000000000000000000000000000000034226fe117a68ec700000000000000000000000000000000000000000163ecd43849cc5b4767103e00000000000000000000000000000000000000000000000000000000000000a0000000000000000000000000539a08d1552128d5db42ab5576ae5e945a34ad3c00000000000000000000000000000000000000000000000000000000674776cb0000000000000000000000000000000000000000000000000000000000000002000000000000000000000000c02aaa39b223fe8d0a0e5c4f27ead9083c756cc200000000000000000000000002e7f808990638e9e67e1f00313037ede2362361","type":2},{"from":"0x539a08d1552128d5db42ab5576ae5e945a34ad3c","to":"0x02e7f808990638e9e67e1f00313037ede2362361","maxFeePerGas":"0xcc022e4fc","maxPriorityFeePerGas":"0x9502f900","value":"0x0","nonce":56,"input":"0x095ea7b30000000000000000000000007a250d5630b4cf539739df2c5dacb4c659f2488dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff","type":2}]}],"traceTransfers":false,"validation":false,"returnFullTransactions":false},"0x144b9de"]

Note above, where on the few requests this succeeds then subsequently hangs.

Steps to reproduce

Described above. I'm not 100% sure of the exact steps to reproduce but it seems to happen when I cancel the rpc client making a request, then subsequently rerun the rpc request.

I could not find any debug logs that were associated with these requests.

Node logs

Platform(s)

Linux (x86)

What version/commit are you on?

Commit SHA: d5f01036016263fbad6a3af9dc3707b2701adc1d Build Timestamp: 2024-11-04T17:33:06.250985070Z Build Features: jemalloc Build Profile: maxperf

What database version are you on?

2024-11-28T01:05:34.508854Z INFO Initialized tracing, debug log directory: /home/node/.cache/reth/logs/mainnet Current database version: 2 Local database version: 2

Which chain / network are you on?

mainnet

What type of node are you running?

Archive (default)

What prune config do you use, if any?

N/A

If you've built Reth from source, provide the full command you used

cargo build --profile maxperf

Code of Conduct

s-0xa1fa commented 4 days ago

The following logs were associate with an a timeout of simulateV1 with this params payload:

[{"blockStateCalls":[{"blockOverrides":{},"calls":[{"from":"0xa45e4cbcc004f486bf8b52d42e2694574fb72735","to":"0xe592427a0aece92de3edee1f18e0157c05861564","maxFeePerGas":"0x2246f08fc6","maxPriorityFeePerGas":"0x77359400","value":"0x0","nonce":5320,"input":"0x414bf389000000000000000000000000c02aaa39b223fe8d0a0e5c4f27ead9083c756cc2000000000000000000000000556c3cbdca77a7f21afe15b17e644e0e98e64df40000000000000000000000000000000000000000000000000000000000002710000000000000000000000000a45e4cbcc004f486bf8b52d42e2694574fb7273500000000000000000000000000000000000000000000000000000000673515cb0000000000000000000000000000000000000000000000000dde99aa4c6d2880000000000000000000000000000000000000000004f61c861a0b0880000000000000000000000000000000000000000000000000000000000000000000000000","type":2}]}],"traceTransfers":false,"validation":false,"returnFullTransactions":false},"0x14333b4"]

2024-11-28T05:52:32.374530Z DEBUG libmdbx: Transaction lock is already acquired, blocking... txn=118836711940272 backtrace= 0: 1: 2: 3: 4: 5:

2024-11-28T05:52:32.374574Z WARN libmdbx: Long-lived read transaction has been timed out open_duration=300.00006151s 2024-11-28T05:52:32.374580Z DEBUG libmdbx: Transaction lock is already acquired, blocking... txn=118836711940272 backtrace= 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18:

2024-11-28T05:52:32.532142Z WARN storage::db::mdbx: The database read transaction has been open for too long. Backtrace: 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: open_duration=300.157645229s self.txn_id=12920243

s-0xa1fa commented 4 days ago

Logs from a debug build:

0: reth_db::implementation::mdbx::tx::MetricsHandler::new at ./home/node/reth/crates/storage/db/src/implementation/mdbx/tx.rs:202:29 1: reth_db::implementation::mdbx::tx::Tx::new_with_metrics::{{closure}} at ./home/node/reth/crates/storage/db/src/implementation/mdbx/tx.rs:58:31 2: core::option::Option::map at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/option.rs:1105:29 3: reth_db::implementation::mdbx::tx::Tx::new_with_metrics at ./home/node/reth/crates/storage/db/src/implementation/mdbx/tx.rs:56:31 4: ::tx at ./home/node/reth/crates/storage/db/src/implementation/mdbx/mod.rs:178:9 5: <alloc::sync::Arc as reth_db_api::database::Database>::tx at ./home/node/reth/crates/storage/db-api/src/database.rs:59:9 6: reth_provider::providers::database::ProviderFactory::provider at ./home/node/reth/crates/storage/provider/src/providers/database/mod.rs:134:13 7: reth_provider::providers::database::ProviderFactory::history_by_block_hash at ./home/node/reth/crates/storage/provider/src/providers/database/mod.rs:180:30 8: <reth_provider::providers::BlockchainProvider as reth_storage_api::state::StateProviderFactory>::history_by_block_hash at ./home/node/reth/crates/storage/provider/src/providers/mod.rs:646:9 9: reth_storage_api::state::StateProviderFactory::state_by_block_id at ./home/node/reth/crates/storage/storage-api/src/state.rs:129:42 10: reth_rpc_eth_api::helpers::state::LoadState::state_at_block_id at ./home/node/reth/crates/rpc/rpc-eth-api/src/helpers/state.rs:190:9 11: reth_rpc_eth_api::helpers::call::Call::spawn_with_state_at_block::{{closure}} at ./home/node/reth/crates/rpc/rpc-eth-api/src/helpers/call.rs:535:25 12: reth_rpc_eth_api::helpers::blocking_task::SpawnBlocking::spawn_tracing::{{closure}} at ./home/node/reth/crates/rpc/rpc-eth-api/src/helpers/blocking_task.rs:74:58 13: <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272:9 14: std::panicking::try::do_call at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554:40 15: rust_try 16: std::panicking::try at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518:19 17: std::panic::catch_unwind at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345:14 18: reth_tasks::pool::BlockingTaskPool::spawn::{{closure}} at ./home/node/reth/crates/tasks/src/pool.rs:93:35 19: <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272:9 20: std::panicking::try::do_call at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554:40 21: __rust_try 22: std::panicking::try at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518:19 23: std::panic::catch_unwind at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345:14 24: rayon_core::unwind::halt_unwinding at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/unwind.rs:17:5 25: rayon_core::registry::Registry::catch_unwind at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:367:27 26: rayon_core::spawn::spawn_job::{{closure}} at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/spawn/mod.rs:97:13 27: <rayon_core::job::HeapJob as rayon_core::job::Job>::execute at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/job.rs:169:9 28: rayon_core::job::JobRef::execute at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/job.rs:64:9 29: rayon_core::registry::WorkerThread::execute at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:860:9 30: rayon_core::registry::WorkerThread::wait_until_cold at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:794:21 31: rayon_core::registry::WorkerThread::wait_until at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:769:13 32: rayon_core::registry::WorkerThread::wait_until_out_of_work at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:818:9 33: rayon_core::registry::main_loop at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:923:5 34: rayon_core::registry::ThreadBuilder::run at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:53:18 35: ::spawn::{{closure}} at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:98:20 36: std::sys::backtrace::rust_begin_short_backtrace at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/sys/backtrace.rs:154:18 37: std::thread::Builder::spawnunchecked::{{closure}}::{{closure}} at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:522:17 38: <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272:9 39: std::panicking::try::do_call at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554:40 40: __rust_try 41: std::panicking::try at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518:19 42: std::panic::catch_unwind at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345:14 43: std::thread::Builder::spawnunchecked::{{closure}} at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:521:30 44: core::ops::function::FnOnce::call_once{{vtable.shim}} at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs:250:5 45: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/boxed.rs:2231:9 46: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/boxed.rs:2231:9 47: std::sys::pal::unix::thread::Thread::new::thread_start at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/sys/pal/unix/thread.rs:105:17 48: 49:

Current backtrace: 0: reth_db::implementation::mdbx::tx::MetricsHandler::log_backtrace_on_long_read_transaction at ./home/node/reth/crates/storage/db/src/implementation/mdbx/tx.rs:246:20 1: <reth_db::implementation::mdbx::tx::MetricsHandler as core::ops::drop::Drop>::drop at ./home/node/reth/crates/storage/db/src/implementation/mdbx/tx.rs:267:13 2: core::ptr::drop_in_place<reth_db::implementation::mdbx::tx::MetricsHandler> at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574:1 3: core::ptr::drop_in_place<core::option::Option<reth_db::implementation::mdbx::tx::MetricsHandler>> at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574:1 4: core::ptr::drop_in_place<reth_db::implementation::mdbx::tx::Tx> at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574:1 5: core::ptr::drop_in_place<reth_provider::providers::state::historical::HistoricalStateProvider<reth_db::implementation::mdbx::tx::Tx>> at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574:1 6: core::ptr::drop_in_place<alloc::boxed::Box> at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574:1 7: reth_rpc_eth_api::helpers::call::Call::spawn_with_state_at_block::{{closure}} at ./home/node/reth/crates/rpc/rpc-eth-api/src/helpers/call.rs:537:9 8: reth_rpc_eth_api::helpers::blocking_task::SpawnBlocking::spawn_tracing::{{closure}} at ./home/node/reth/crates/rpc/rpc-eth-api/src/helpers/blocking_task.rs:74:58 9: <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272:9 10: std::panicking::try::do_call at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554:40 11: rust_try 12: std::panicking::try at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518:19 13: std::panic::catch_unwind at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345:14 14: reth_tasks::pool::BlockingTaskPool::spawn::{{closure}} at ./home/node/reth/crates/tasks/src/pool.rs:93:35 15: <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272:9 16: std::panicking::try::do_call at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554:40 17: __rust_try 18: std::panicking::try at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518:19 19: std::panic::catch_unwind at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345:14 20: rayon_core::unwind::halt_unwinding at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/unwind.rs:17:5 21: rayon_core::registry::Registry::catch_unwind at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:367:27 22: rayon_core::spawn::spawn_job::{{closure}} at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/spawn/mod.rs:97:13 23: <rayon_core::job::HeapJob as rayon_core::job::Job>::execute at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/job.rs:169:9 24: rayon_core::job::JobRef::execute at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/job.rs:64:9 25: rayon_core::registry::WorkerThread::execute at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:860:9 26: rayon_core::registry::WorkerThread::wait_until_cold at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:794:21 27: rayon_core::registry::WorkerThread::wait_until at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:769:13 28: rayon_core::registry::WorkerThread::wait_until_out_of_work at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:818:9 29: rayon_core::registry::main_loop at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:923:5 30: rayon_core::registry::ThreadBuilder::run at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:53:18 31: ::spawn::{{closure}} at ./home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rayon-core-1.12.1/src/registry.rs:98:20 32: std::sys::backtrace::rust_begin_short_backtrace at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/sys/backtrace.rs:154:18 33: std::thread::Builder::spawnunchecked::{{closure}}::{{closure}} at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:522:17 34: <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272:9 35: std::panicking::try::do_call at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554:40 36: __rust_try 37: std::panicking::try at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518:19 38: std::panic::catch_unwind at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345:14 39: std::thread::Builder::spawnunchecked::{{closure}} at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:521:30 40: core::ops::function::FnOnce::call_once{{vtable.shim}} at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs:250:5 41: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/boxed.rs:2231:9 42: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/alloc/src/boxed.rs:2231:9 43: std::sys::pal::unix::thread::Thread::new::thread_start at ./rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/sys/pal/unix/thread.rs:105:17 44: 45: open_duration=524.432772358s self.txn_id=12922130

mattsse commented 3 days ago

I assume this is related to state root calc

@klkvr should we disable the stateroot processing here for now?

klkvr commented 3 days ago

hmm, yeah we could skip calculating state root here, though I guess this is technically off-spec as the block hashes would end up incorrect

mattsse commented 3 days ago

yeah, but unfortunately we can't efficiently handle this for rpc so I'd rather disable state root for now because this I assume most use cases don't need the stateroot

s-0xa1fa commented 3 days ago

Ran these rpc requests against that branch, can confirm it fixes the problem.