paradigmxyz / reth

Modular, contributor-friendly and blazing-fast implementation of the Ethereum protocol, in Rust
https://reth.rs/
Apache License 2.0
3.85k stars 1.1k forks source link

Reth Stalled after coming back online from unclean shut down no errors preseted #8791

Closed argakiig closed 2 months ago

argakiig commented 3 months ago

Describe the bug

The node stops producing new blocks or anything else. the logs become static except for variation in peer count and new head notifications from beacon.

upon further inspection it was determined that the instance this was running on was terminated improperly.

Start a new snapshot on same hardware, same results

Steps to reproduce

  1. start node
  2. have instance node runs on crash or induce some other unclean shut down
  3. when the node comes back up, observe logs becoming static except for peer count or forkchoice updates

Node logs

No response

Platform(s)

Linux (x86)

What version/commit are you on?

reth --version reth Version: 0.2.0-beta.9 Commit SHA: 7b435e0 Build Timestamp: 2024-06-06T21:28:19.976774949Z Build Features: jemalloc Build Profile: maxperf

What database version are you on?

2

Which chain / network are you on?

mainnet eth

What type of node are you running?

Archive (default)

What prune config do you use, if any?

No response

If you've built Reth from source, provide the full command you used

No response

Code of Conduct

argakiig commented 3 months ago

reth-2024-06-12T213903.log attaching debug from node after starting up

argakiig commented 3 months ago

After reverting back to beta 6 I was able to start making progress again albeit rather intermittently at times. I have attached the latest debug logs with jsonrpsee and net logs removed

argakiig commented 3 months ago

debug.log

argakiig commented 3 months ago

and its back in sync with beta 6. I am leaving this open. and I still have a copy of the disk that was running beta 9 that stalled out and would not progress if you want the logs from it as well

argakiig commented 3 months ago

In response to the node upgrade process

1) bump image tag being pulled on chart values
2) push new values to cluster
3) wait for old container to stop and new container to start
4) node starts up and attempts to start syncing, commonly stalling on execution but occasionally making it too one of the merkle stages (variation is due to multiple database snapshots being used, bot externally and internally sourced), eventually the node will shut down, no cause given nothing in the logs, kube reports process exited with error
5) when the node starts back up the next time it makes it to the same point in syncing as it initially stalled at and then refuses to make any further progress

I would think its a kube thing, but I also experienced the same results with a compute VM running the same databases. They all stuck until I reverted back to beta 6

argakiig commented 3 months ago

Dropped out of sync for some reason on beta6 today but it was able to recover

2024-06-24T17:03:30.832683Z DEBUG storage::db::mdbx: Commit total_duration=8.066024888s commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 530, audit: 0, write: 0, sync: 528062, ending: 14, whole: 528608, gc_cputime: 331, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=false
2024-06-24T17:03:30.832727Z DEBUG blockchain_tree: Canonicalization finished actions=[(CloneOldBlocks, 41.44µs), (FindCanonicalHeader, 23.87µs), (SplitChain, 10.87751ms), (SplitChainForks, 5.12µs), (MergeAllChains, 2.2µs), (UpdateCanonicalIndex, 18.9µs), (RetrieveStateTrieUpdates, 4.303194028s), (CommitCanonicalChainToDatabase, 10.903430104s)]
2024-06-24T17:03:30.832813Z DEBUG consensus::engine: Canonicalized new head hash=0x2773940df008e0572c3fe8b1e429f0ddd82f926182058d7b2201f49edc26dd8c number=20162878
2024-06-24T17:03:30.833011Z  INFO reth_node_events::node: Canonical chain committed number=20162878 hash=0x2773940df008e0572c3fe8b1e429f0ddd82f926182058d7b2201f49edc26dd8c elapsed=15.217684412s
2024-06-24T17:03:30.836549Z DEBUG blockchain_tree: Inserting block into side chain block_num_hash=(20162880, 0xeb3ce5fce41e6b143563554776687d285a94d0dee1e8889413159acb4428cefb) chain_id=BlockChainId(8755)
2024-06-24T17:03:30.836576Z DEBUG blockchain_tree: Appending block to side chain block_hash=0xeb3ce5fce41e6b143563554776687d285a94d0dee1e8889413159acb4428cefb block_number=20162880
2024-06-24T17:03:31.172078Z DEBUG evm: Execution time evm_transact=330.541427ms apply_state=1.193601ms apply_post_state=1.14881ms merge_transitions=552.65µs receipt_root=1.77098ms
2024-06-24T17:03:31.264840Z DEBUG trie::parallel_state_root: pre-calculating storage roots len=18849
2024-06-24T17:03:31.930491Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:32.132617Z DEBUG jsonrpsee-server: HTTP serve connection failed hyper::Error(IncompleteMessage)
2024-06-24T17:03:32.133652Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:34.768564Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:34.901686Z DEBUG blockchain_tree::chain: Validated state root number=20162880 hash=0xeb3ce5fce41e6b143563554776687d285a94d0dee1e8889413159acb4428cefb elapsed=3.729577804s
2024-06-24T17:03:34.935021Z  INFO reth_node_events::node: Block added to canonical chain number=20162880 hash=0xeb3ce5fce41e6b143563554776687d285a94d0dee1e8889413159acb4428cefb peers=100 txs=139 mgas=18.193 full=60.6% base_fee=21.48gwei blobs=6 excess_blobs=126 elapsed=4.101209499s
2024-06-24T17:03:34.946145Z DEBUG blockchain_tree: Committing new canonical chain: [(20162879, 0x1a323d55a73583f92b1b6f35afdc3885b595bf3130b69d17957c76c11d4a2329)]
2024-06-24T17:03:34.967935Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:34.968815Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:34.969921Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:34.970668Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:34.971389Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:34.971992Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:35.004878Z DEBUG blockchain_tree: Recomputing state root for insert blocks=[(20162879, 0x1a323d55a73583f92b1b6f35afdc3885b595bf3130b69d17957c76c11d4a2329)]
2024-06-24T17:03:35.885698Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:37.808266Z DEBUG net::session: timed out outgoing request id=1389 remote_peer_id=0x605f19ad0ce420c1e37182c13c970ad014cf2e4b6c8612097c783075fcf02085ae61a85a9ca66f6126ad4cb32887e153cc11a678decd99bf8678c650e8a24136
2024-06-24T17:03:39.381152Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:39.381577Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:39.382120Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:39.382304Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:39.382950Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:39.383454Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:39.932136Z DEBUG jsonrpsee-server: HTTP serve connection failed hyper::Error(IncompleteMessage)
2024-06-24T17:03:39.932158Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:39.954294Z DEBUG providers::db: Inserted block block_number=20162879 actions=[(InsertCanonicalHeaders, 62.15µs), (InsertHeaders, 31.25µs), (InsertHeaderNumbers, 29.82µs), (GetParentTD, 1.71µs), (InsertHeaderTerminalDifficulties, 40.63µs), (GetNextTxNum, 9.41µs), (InsertTransactionSenders, 1.070198ms), (InsertTransactions, 5.017877ms), (InsertTransactionHashNumbers, 520.070056ms), (InsertBlockWithdrawals, 36.82µs), (InsertBlockBodyIndices, 33.111µs), (InsertTransactionBlocks, 26.54µs)]
2024-06-24T17:03:43.009997Z DEBUG providers::db: Appended blocks range=20162879..=20162879 actions=[(InsertBlock, 526.711093ms), (InsertState, 363.546826ms), (InsertHashes, 1.516842021s), (InsertHistoryIndices, 1.175205606s), (UpdatePipelineStages, 47.79µs)]
2024-06-24T17:03:46.225845Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:47.890370Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:49.467953Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.468705Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.469177Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.469913Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.493649Z DEBUG storage::db::mdbx: Commit total_duration=6.48359235s commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 2, gc_wallclock: 311, audit: 0, write: 0, sync: 424571, ending: 15, whole: 424900, gc_cputime: 267, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=false
2024-06-24T17:03:49.493684Z DEBUG blockchain_tree: Canonicalization finished actions=[(CloneOldBlocks, 43.09µs), (FindCanonicalHeader, 16.16µs), (SplitChain, 11.1142ms), (SplitChainForks, 6.43µs), (MergeAllChains, 2.5µs), (UpdateCanonicalIndex, 20.15µs), (RetrieveStateTrieUpdates, 4.481369065s), (CommitCanonicalChainToDatabase, 10.066170606s)]
2024-06-24T17:03:49.493758Z DEBUG consensus::engine: Canonicalized new head hash=0x1a323d55a73583f92b1b6f35afdc3885b595bf3130b69d17957c76c11d4a2329 number=20162879
2024-06-24T17:03:49.493999Z  INFO reth_node_events::node: Canonical chain committed number=20162879 hash=0x1a323d55a73583f92b1b6f35afdc3885b595bf3130b69d17957c76c11d4a2329 elapsed=14.558818531s
2024-06-24T17:03:49.499182Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-06-24T17:03:49.576665Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:49.582800Z DEBUG blockchain_tree: Inserting block into side chain block_num_hash=(20162881, 0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5) chain_id=BlockChainId(8755)
2024-06-24T17:03:49.582822Z DEBUG blockchain_tree: Appending block to side chain block_hash=0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5 block_number=20162881
2024-06-24T17:03:49.770640Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.970503Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.972380Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.973404Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.974027Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.975360Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.975901Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:03:49.981083Z DEBUG evm: Execution time evm_transact=393.631252ms apply_state=1.323ms apply_post_state=10.73µs merge_transitions=603.72µs receipt_root=995.15µs
2024-06-24T17:03:50.080519Z DEBUG trie::parallel_state_root: pre-calculating storage roots len=18950
2024-06-24T17:03:53.612205Z  INFO reth::cli: Status connected_peers=100 freelist=908767 latest_block=20162879
2024-06-24T17:03:56.419223Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:57.577801Z DEBUG jsonrpsee-server: HTTP serve connection failed hyper::Error(IncompleteMessage)
2024-06-24T17:03:57.578815Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:59.074164Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:03:59.889294Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:04:04.419348Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:04:04.419640Z DEBUG jsonrpsee-server: HTTP serve connection failed hyper::Error(IncompleteMessage)
2024-06-24T17:04:04.772903Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:04.973904Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:07.075594Z DEBUG jsonrpsee-server: HTTP serve connection failed hyper::Error(IncompleteMessage)
2024-06-24T17:04:07.075686Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:04:11.888238Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:04:15.308439Z DEBUG blockchain_tree::chain: Validated state root number=20162881 hash=0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5 elapsed=25.327332788s
2024-06-24T17:04:15.359007Z  INFO reth_node_events::node: Block added to canonical chain number=20162881 hash=0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5 peers=100 txs=190 mgas=15.258 full=50.9% base_fee=22.05gwei blobs=6 excess_blobs=129 elapsed=25.780747932s
2024-06-24T17:04:15.373086Z DEBUG blockchain_tree: Committing new canonical chain: [(20162880, 0xeb3ce5fce41e6b143563554776687d285a94d0dee1e8889413159acb4428cefb)]
2024-06-24T17:04:15.435162Z DEBUG blockchain_tree: Recomputing state root for insert blocks=[(20162880, 0xeb3ce5fce41e6b143563554776687d285a94d0dee1e8889413159acb4428cefb)]
2024-06-24T17:04:18.581825Z  INFO reth::cli: Status connected_peers=100 freelist=908767 latest_block=20162879
2024-06-24T17:04:19.775681Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:19.976010Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:23.889255Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:04:24.860446Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.862026Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.863474Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.866726Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.871101Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.873689Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.874245Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.874818Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.875411Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.876833Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.877786Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.878991Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.879277Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.880596Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:24.881293Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:28.621408Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:04:31.737397Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:04:34.780032Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:34.978436Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:04:35.891212Z DEBUG jsonrpsee-server: Accepting new connection 1/500

debug log snippet just before it stopped updating for a bit

argakiig commented 3 months ago

this was it then starting to catch back up

2024-06-24T17:07:24.929236Z DEBUG jsonrpsee-server: Accepting new connection 1/100000
2024-06-24T17:07:26.478452Z DEBUG providers::db: Appended blocks range=20162880..=20162880 actions=[(InsertBlock, 295.307426ms), (InsertState, 26.485907084s), (InsertHashes, 55.500157016s), (InsertHistoryIndices, 864.54074ms), (UpdatePipelineStages, 32.01µs)]
2024-06-24T17:07:26.582302Z DEBUG storage::db::mdbx: Commit total_duration=103.820039ms commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 1, gc_wallclock: 277, audit: 0, write: 0, sync: 6504, ending: 13, whole: 6796, gc_cputime: 248, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=false
2024-06-24T17:07:26.582358Z DEBUG blockchain_tree: Canonicalization finished actions=[(CloneOldBlocks, 44.05µs), (FindCanonicalHeader, 8.22µs), (SplitChain, 10.96486ms), (SplitChainForks, 5.92µs), (MergeAllChains, 2.69µs), (UpdateCanonicalIndex, 20.16µs), (RetrieveStateTrieUpdates, 107.959318939s), (CommitCanonicalChainToDatabase, 83.249954225s)]
2024-06-24T17:07:26.582418Z DEBUG consensus::engine: Canonicalized new head hash=0xeb3ce5fce41e6b143563554776687d285a94d0dee1e8889413159acb4428cefb number=20162880
2024-06-24T17:07:26.582639Z  INFO reth_node_events::node: Canonical chain committed number=20162880 hash=0xeb3ce5fce41e6b143563554776687d285a94d0dee1e8889413159acb4428cefb elapsed=191.220383144s
2024-06-24T17:07:26.621786Z DEBUG pruner: Minimum pruning interval reached previous_tip_block_number=Some(20162874) tip_block_number=20162880
2024-06-24T17:07:26.621862Z DEBUG consensus::engine::hooks: Polled next hook hook="Prune" result=PolledHook { name: "Prune", event: Started, db_access_level: ReadWrite }
2024-06-24T17:07:26.621872Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-06-24T17:07:26.621964Z DEBUG pruner: Pruner started tip_block_number=20162880
2024-06-24T17:07:26.622039Z  INFO reth_node_events::node: Pruner started tip_block_number=20162880
2024-06-24T17:07:26.622114Z DEBUG pruner: Segment pruning started segment=Transactions purpose=StaticFile to_block=20162805 prune_mode=Before(20162806)
2024-06-24T17:07:26.623430Z DEBUG pruner: Segment pruning finished segment=Transactions purpose=StaticFile to_block=20162805 prune_mode=Before(20162806) output.pruned=0
2024-06-24T17:07:26.623473Z DEBUG pruner: Segment pruning started segment=Headers purpose=StaticFile to_block=20162805 prune_mode=Before(20162806)
2024-06-24T17:07:26.642341Z DEBUG pruner: Segment pruning finished segment=Headers purpose=StaticFile to_block=20162805 prune_mode=Before(20162806) output.pruned=0
2024-06-24T17:07:26.642367Z DEBUG pruner: Segment pruning started segment=Receipts purpose=StaticFile to_block=20162805 prune_mode=Before(20162806)
2024-06-24T17:07:26.666998Z DEBUG pruner: Segment pruning finished segment=Receipts purpose=StaticFile to_block=20162805 prune_mode=Before(20162806) output.pruned=0
2024-06-24T17:07:26.667099Z DEBUG storage::db::mdbx: Commit total_duration=68.26µs commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 0, audit: 0, write: 0, sync: 0, ending: 0, whole: 1, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=false
2024-06-24T17:07:26.667134Z DEBUG pruner: Pruner finished tip_block_number=20162880 elapsed=45.125719ms deleted_entries=0 limiter=PruneLimiter { deleted_entries_limit: Some(PruneDeletedEntriesLimit { limit: 21000, deleted: 0 }), time_limit: Some(PruneTimeLimit { limit: 100ms, start: Instant { tv_sec: 389735, tv_nsec: 409696617 } }) } progress=Finished stats={}
2024-06-24T17:07:26.667228Z  INFO reth_node_events::node: Pruner finished tip_block_number=20162880 elapsed=45.125719ms stats={}
2024-06-24T17:07:26.667230Z DEBUG consensus::engine::hooks: Polled running hook with db write access hook="Prune" result=PolledHook { name: "Prune", event: Finished(Ok(())), db_access_level: ReadWrite }
2024-06-24T17:07:26.672373Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-06-24T17:07:26.712370Z DEBUG jsonrpsee-server: Accepting new connection 1/500
2024-06-24T17:07:26.718843Z DEBUG blockchain_tree: Inserting block into side chain block_num_hash=(20162882, 0x3b364866f0822159f47d32da72f64f6b8f58e5f4c1a9ac77de8ab0426fad6bc5) chain_id=BlockChainId(8755)
2024-06-24T17:07:26.718890Z DEBUG blockchain_tree: Appending block to side chain block_hash=0x3b364866f0822159f47d32da72f64f6b8f58e5f4c1a9ac77de8ab0426fad6bc5 block_number=20162882
2024-06-24T17:07:27.708752Z DEBUG evm: Execution time evm_transact=979.743446ms apply_state=1.573377ms apply_post_state=7.42µs merge_transitions=634.919µs receipt_root=1.811931ms
2024-06-24T17:07:27.826861Z DEBUG trie::parallel_state_root: pre-calculating storage roots len=19055
argakiig commented 3 months ago

possibly hung again

2024-06-24T17:12:48.602701Z  WARN storage::db::mdbx: The database read transaction has been open for too long. Backtrace:
   0: reth_db::implementation::mdbx::tx::MetricsHandler<K>::log_backtrace_on_long_read_transaction
   1: core::ptr::drop_in_place<reth_db::implementation::mdbx::tx::MetricsHandler<reth_libmdbx::transaction::RO>>
   2: core::ptr::drop_in_place<reth_provider::providers::database::provider::DatabaseProvider<reth_db::implementation::mdbx::tx::Tx<reth_libmdbx::transaction::RO>>>
   3: reth_trie_parallel::parallel_root::ParallelStateRoot<DB,Provider>::calculate
   4: reth_blockchain_tree::chain::AppendableChain::validate_and_execute
   5: reth_blockchain_tree::blockchain_tree::BlockchainTree<DB,EVM>::try_insert_validated_block
   6: reth_interfaces::blockchain_tree::BlockchainTreeEngine::insert_block_without_senders
   7: reth_beacon_consensus::engine::BeaconConsensusEngine<DB,BT,Client,EngineT>::on_new_payload
   8: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
   9: tokio::runtime::task::raw::poll
  10: std::sys_common::backtrace::__rust_begin_short_backtrace
  11: core::ops::function::FnOnce::call_once{{vtable.shim}}
  12: std::sys::pal::unix::thread::Thread::new::thread_start
  13: <unknown>
  14: <unknown>
 open_duration=279.068190261s self.txn_id=13432205
2024-06-24T17:12:48.605884Z DEBUG blockchain_tree::chain: Validated state root number=20162882 hash=0x3b364866f0822159f47d32da72f64f6b8f58e5f4c1a9ac77de8ab0426fad6bc5 elapsed=320.897104155s
2024-06-24T17:12:48.645954Z  INFO reth_node_events::node: Block added to canonical chain number=20162882 hash=0x3b364866f0822159f47d32da72f64f6b8f58e5f4c1a9ac77de8ab0426fad6bc5 peers=100 txs=178 mgas=19.415 full=64.7% base_fee=22.10gwei blobs=0 excess_blobs=132 elapsed=321.931210423s
2024-06-24T17:12:48.659329Z DEBUG blockchain_tree: Committing new canonical chain: [(20162881, 0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5)]
2024-06-24T17:12:48.712461Z DEBUG blockchain_tree: Recomputing state root for insert blocks=[(20162881, 0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5)]
argakiig commented 3 months ago
2024-06-24T17:22:54.878079Z DEBUG providers::db: Appended blocks range=20162881..=20162881 actions=[(InsertBlock, 456.245124ms), (InsertState, 46.059379374s), (InsertHashes, 79.995369812s), (InsertHistoryIndices, 1.092914267s), (UpdatePipelineStages, 27.59µs)]
2024-06-24T17:22:55.512735Z DEBUG storage::db::mdbx: Commit total_duration=634.621222ms commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 333, audit: 0, write: 0, sync: 41236, ending: 11, whole: 41582, gc_cputime: 267, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=false
2024-06-24T17:22:55.512783Z DEBUG blockchain_tree: Canonicalization finished actions=[(CloneOldBlocks, 35.8µs), (FindCanonicalHeader, 8.689µs), (SplitChain, 10.150111ms), (SplitChainForks, 4.709µs), (MergeAllChains, 2.311µs), (UpdateCanonicalIndex, 16.849µs), (RetrieveStateTrieUpdates, 478.614714334s), (CommitCanonicalChainToDatabase, 128.23875028s)]
2024-06-24T17:22:55.512860Z DEBUG consensus::engine: Canonicalized new head hash=0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5 number=20162881
2024-06-24T17:22:55.513081Z  INFO reth_node_events::node: Canonical chain committed number=20162881 hash=0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5 elapsed=606.863765983s
2024-06-24T17:22:55.667003Z DEBUG blockchain_tree: Block hash not found in block indices block_hash=0xdba3136e75c00327a1a01cf994484f4db89da875f4e5e11e0a43ca8e1060ae61
2024-06-24T17:22:55.667119Z DEBUG consensus::engine: Syncing to new target target=0xdba3136e75c00327a1a01cf994484f4db89da875f4e5e11e0a43ca8e1060ae61
2024-06-24T17:22:55.667210Z  INFO reth_node_events::node: Live sync in progress, downloading blocks remaining_blocks=1 target_block_hash=0xdba3136e75c00327a1a01cf994484f4db89da875f4e5e11e0a43ca8e1060ae61
2024-06-24T17:22:55.667225Z  INFO reth_node_events::node: Received forkchoice updated message when syncing head_block_hash=0xdba3136e75c00327a1a01cf994484f4db89da875f4e5e11e0a43ca8e1060ae61 safe_block_hash=0x3f46f5aa38c67e56dabec9d2c7d4b71b359daff84f052b2785868b48c2e47142 finalized_block_hash=0x4ba6cbd0726ed7ecfa87ddaa9e3e99dc458c3a4c42145ec004b5f4a7c80672a6
2024-06-24T17:22:55.669605Z DEBUG blockchain_tree: Block hash not found in block indices block_hash=0x54f70a6f37f35373a764376e14385a1768f8bca630c82a4471267dcfe6a9f4d4
2024-06-24T17:22:55.669639Z DEBUG consensus::engine: Syncing to new target target=0x54f70a6f37f35373a764376e14385a1768f8bca630c82a4471267dcfe6a9f4d4
2024-06-24T17:22:55.669696Z  INFO reth_node_events::node: Live sync in progress, downloading blocks remaining_blocks=1 target_block_hash=0x54f70a6f37f35373a764376e14385a1768f8bca630c82a4471267dcfe6a9f4d4
2024-06-24T17:22:55.671253Z DEBUG blockchain_tree: Block hash not found in block indices block_hash=0xb3e62ae9701947ccc25d970f6d17388ad3e8c9520534581b43c751ff2b595c29
2024-06-24T17:22:55.671310Z DEBUG consensus::engine: Syncing to new target target=0xb3e62ae9701947ccc25d970f6d17388ad3e8c9520534581b43c751ff2b595c29
2024-06-24T17:22:55.671361Z  INFO reth_node_events::node: Live sync in progress, downloading blocks remaining_blocks=1 target_block_hash=0xb3e62ae9701947ccc25d970f6d17388ad3e8c9520534581b43c751ff2b595c29
2024-06-24T17:22:55.672474Z DEBUG blockchain_tree: Block hash not found in block indices block_hash=0xb729894cc7dd5f0052aab6a11393ba224129a580fd2e3a7a64962caccfd8f49f
2024-06-24T17:22:55.672523Z DEBUG consensus::engine: Syncing to new target target=0xb729894cc7dd5f0052aab6a11393ba224129a580fd2e3a7a64962caccfd8f49f
2024-06-24T17:22:55.672577Z  INFO reth_node_events::node: Live sync in progress, downloading blocks remaining_blocks=1 target_block_hash=0xb729894cc7dd5f0052aab6a11393ba224129a580fd2e3a7a64962caccfd8f49f
2024-06-24T17:22:55.672589Z  INFO reth_node_events::node: Received forkchoice updated message when syncing head_block_hash=0xb729894cc7dd5f0052aab6a11393ba224129a580fd2e3a7a64962caccfd8f49f safe_block_hash=0x54f70a6f37f35373a764376e14385a1768f8bca630c82a4471267dcfe6a9f4d4 finalized_block_hash=0x3f46f5aa38c67e56dabec9d2c7d4b71b359daff84f052b2785868b48c2e47142
2024-06-24T17:22:55.744368Z DEBUG blockchain_tree: Block hash not found in block indices block_hash=0xdfdf61f602746f4131f448593665efba2b13de8a2958e7c8790bd32ba72f409b
2024-06-24T17:22:55.744456Z DEBUG consensus::engine: Syncing to new target target=0xb729894cc7dd5f0052aab6a11393ba224129a580fd2e3a7a64962caccfd8f49f
2024-06-24T17:22:55.746102Z DEBUG blockchain_tree: Block hash not found in block indices block_hash=0xef722823d138420d0b74a3501138148771f12f09435f59fc6339294dc18f20f7
2024-06-24T17:22:55.746160Z DEBUG consensus::engine: Syncing to new target target=0xef722823d138420d0b74a3501138148771f12f09435f59fc6339294dc18f20f7
2024-06-24T17:22:55.746225Z  INFO reth_node_events::node: Live sync in progress, downloading blocks remaining_blocks=1 target_block_hash=0xef722823d138420d0b74a3501138148771f12f09435f59fc6339294dc18f20f7
2024-06-24T17:22:55.748002Z DEBUG blockchain_tree: Block hash not found in block indices block_hash=0xcf55b9c72dc2a355e98a32cc2baf603cff7e9db82984ef5cffcfa1e1f086336b
2024-06-24T17:22:55.748071Z DEBUG consensus::engine: Syncing to new target target=0xcf55b9c72dc2a355e98a32cc2baf603cff7e9db82984ef5cffcfa1e1f086336b
2024-06-24T17:22:55.748091Z  INFO reth_node_events::node: Live sync in progress, downloading blocks remaining_blocks=1 target_block_hash=0xcf55b9c72dc2a355e98a32cc2baf603cff7e9db82984ef5cffcfa1e1f086336b
2024-06-24T17:22:55.748101Z  INFO reth_node_events::node: Received forkchoice updated message when syncing head_block_hash=0xcf55b9c72dc2a355e98a32cc2baf603cff7e9db82984ef5cffcfa1e1f086336b safe_block_hash=0xdfdf61f602746f4131f448593665efba2b13de8a2958e7c8790bd32ba72f409b finalized_block_hash=0x54f70a6f37f35373a764376e14385a1768f8bca630c82a4471267dcfe6a9f4d4
2024-06-24T17:22:55.749329Z DEBUG blockchain_tree: Block hash not found in block indices block_hash=0xb741c5644311d5f9c2335716010febdd7ee2363290aa482561d2942cf7dd26a3
2024-06-24T17:22:55.749376Z DEBUG consensus::engine: Syncing to new target target=0xb741c5644311d5f9c2335716010febdd7ee2363290aa482561d2942cf7dd26a3
2024-06-24T17:22:55.749424Z  INFO reth_node_events::node: Live sync in progress, downloading blocks remaining_blocks=1 target_block_hash=0xb741c5644311d5f9c2335716010febdd7ee2363290aa482561d2942cf7dd26a3
2024-06-24T17:22:55.754914Z DEBUG blockchain_tree: Block hash not found in block indices block_hash=0xab742e0ff0cc4754a3687307b6e6334fdf7aa0c59428163db6929ff85a15162b
2024-06-24T17:22:55.754945Z DEBUG consensus::engine: Syncing to new target target=0xab742e0ff0cc4754a3687307b6e6334fdf7aa0c59428163db6929ff85a15162b
2024-06-24T17:22:55.754965Z  INFO reth_node_events::node: Live sync in progress, downloading blocks remaining_blocks=1 target_block_hash=0xab742e0ff0cc4754a3687307b6e6334fdf7aa0c59428163db6929ff85a15162b
2024-06-24T17:22:55.754973Z  INFO reth_node_events::node: Received forkchoice updated message when syncing head_block_hash=0xab742e0ff0cc4754a3687307b6e6334fdf7aa0c59428163db6929ff85a15162b safe_block_hash=0xfad453ea27b32c2a7eff50213128e270764986ebfb643e58b57c1c7accb0f291 finalized_block_hash=0xdfdf61f602746f4131f448593665efba2b13de8a2958e7c8790bd32ba72f409b
2024-06-24T17:22:55.821130Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-06-24T17:22:55.960010Z DEBUG static_file: StaticFileProducer started targets=StaticFileTargets { headers: Some(20162806..=20162881), receipts: Some(20162806..=20162881), transactions: Some(20162806..=20162881) }
2024-06-24T17:22:55.960070Z  INFO reth_node_events::node: Static File Producer started targets=StaticFileTargets { headers: Some(20162806..=20162881), receipts: Some(20162806..=20162881), transactions: Some(20162806..=20162881) }
2024-06-24T17:22:55.960138Z DEBUG static_file: StaticFileProducer segment segment=Transactions block_range=20162806..=20162881
2024-06-24T17:22:55.960182Z DEBUG static_file: StaticFileProducer segment segment=Headers block_range=20162806..=20162881
2024-06-24T17:22:55.960204Z DEBUG static_file: StaticFileProducer segment segment=Receipts block_range=20162806..=20162881
2024-06-24T17:22:55.971089Z DEBUG static_file: Finished StaticFileProducer segment segment=Headers block_range=20162806..=20162881 elapsed=10.88491ms
2024-06-24T17:22:56.343031Z DEBUG static_file: Finished StaticFileProducer segment segment=Receipts block_range=20162806..=20162881 elapsed=382.813345ms
2024-06-24T17:22:56.479520Z DEBUG static_file: Finished StaticFileProducer segment segment=Transactions block_range=20162806..=20162881 elapsed=519.366124ms
2024-06-24T17:22:56.491351Z DEBUG provider::static_file: Commit segment=Transactions path="/data/static_files/static_file_transactions_20000000_20499999" duration=11.76105ms
2024-06-24T17:22:56.498037Z DEBUG provider::static_file: Commit segment=Headers path="/data/static_files/static_file_headers_20000000_20499999" duration=5.397ms
2024-06-24T17:22:56.507985Z DEBUG provider::static_file: Commit segment=Receipts path="/data/static_files/static_file_receipts_20000000_20499999" duration=8.94396ms
2024-06-24T17:22:56.509191Z DEBUG static_file: StaticFileProducer finished targets=StaticFileTargets { headers: Some(20162806..=20162881), receipts: Some(20162806..=20162881), transactions: Some(20162806..=20162881) } elapsed=549.131634ms
2024-06-24T17:22:56.509269Z DEBUG sync::stages::headers: Commencing sync tip=Hash(0xdfdf61f602746f4131f448593665efba2b13de8a2958e7c8790bd32ba72f409b) head=0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5
2024-06-24T17:22:56.651078Z  INFO reth_node_events::node: Preparing stage pipeline_stages=1/12 stage=Headers checkpoint=20162881 target=None
2024-06-24T17:22:56.651104Z  INFO reth_node_events::node: Static File Producer finished targets=StaticFileTargets { headers: Some(20162806..=20162881), receipts: Some(20162806..=20162881), transactions: Some(20162806..=20162881) } elapsed=549.131634ms
2024-06-24T17:22:56.953400Z DEBUG sync::stages::headers: Commencing sync tip=Hash(0xdfdf61f602746f4131f448593665efba2b13de8a2958e7c8790bd32ba72f409b) head=0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5
2024-06-24T17:22:56.953447Z  INFO sync::stages::headers: Received headers total=51 from_block=20162932 to_block=20162882

when it decided to start stage syncing

Rjected commented 3 months ago

What kind of infrastructure is this running on? There seem to be potentially multiple things going on here. One thing that seems a bit off is the following logs:

The database read transaction has been open for too long
...
Validated state root number=20162882 hash=0x3b364866f0822159f47d32da72f64f6b8f58e5f4c1a9ac77de8ab0426fad6bc5 elapsed=320.897104155s

as well as

Validated state root number=20162881 hash=0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5 elapsed=25.327332788s
CommitCanonicalChainToDatabase, 83.249954225s

These should take less than a second, even on sub optimal hardware, what kind of disk are you running on? If it's a slow disk, these numbers make sense, and the fact that execution takes an extremely long amount of time.

Along with the disk model / information, could you also run the iozone benchmarks listed here: https://github.com/paradigmxyz/reth/issues/6859

argakiig commented 3 months ago

This is running on google kubernetes engine using a pd-ssd type disk for the persistent storage, which has worked fine in the other clusters. I can setup a test in a bit.

Shortly after my post the other day I saw that v1.0.0 was available and started to upgrade the kubernetes deployments to use the new image and it is working rather happily now as well

argakiig commented 3 months ago

ok, I so I had a reason to restart one of my nodes for maintenance, and now 2 hours later it has remained stuck at

2024-07-02T04:51:34.529942Z  INFO Received forkchoice updated message when syncing head_block_hash=0xecd25ca6312b7244aaa3a343b2412ff8a8473a027ab29fa249d0c2b10b3929ed safe_block_hash=0x855a4b156a368db76b730163c92b8351ba0b8cea499ce46bf972032980fa74f3 finalized_block_hash=0x0ead41cd79cfea83514c5a06232d34c20f04ad8fa9448153c0e1c9f83fdd8b7e
2024-07-02T04:51:39.434032Z  INFO Status connected_peers=39 freelist=228761 stage=Execution checkpoint=20214316 target=20214430
2024-07-02T04:52:04.434810Z  INFO Status connected_peers=39 freelist=228761 stage=Execution checkpoint=20214316 target=20214430
2024-07-02T04:52:29.433754Z  INFO Status connected_peers=43 freelist=228761 stage=Execution checkpoint=20214316 target=20214430

I am still receiving fork choice updates so i know lighthouse and reth are communicating as expected.

Debug logs show the usual suspects as well when filtering out logs with net:

2024-07-02T04:39:48.504245Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-02T04:39:59.434305Z  INFO reth::cli: Status connected_peers=18 freelist=228761 stage=Execution checkpoint=20214316 target=20214430
2024-07-02T04:40:00.624904Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-02T04:40:12.521497Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-02T04:40:24.433993Z  INFO reth::cli: Status connected_peers=17 freelist=228761 stage=Execution checkpoint=20214316 target=20214430
2024-07-02T04:40:24.681469Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-02T04:40:36.961583Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-02T04:40:48.430402Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-02T04:40:49.434645Z  INFO reth::cli: Status connected_peers=17 freelist=228761 stage=Execution checkpoint=20214316 target=20214430
2024-07-02T04:41:00.744048Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-02T04:41:14.361220Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-02T04:41:14.434164Z  INFO reth::cli: Status connected_peers=20 freelist=228761 stage=Execution checkpoint=20214316 target=20214430
2024-07-02T04:41:17.234412Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-02T04:41:25.334209Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"

Looking at how to run the requested benchmarks as this is running as a kubernetes workload in the meantime though here is the last day of the disk usage image

you can see every time i have restarted the container as it looks like nothing is going on, I hadnt checked disk usage before but quite right, nothing is going on

argakiig commented 3 months ago

Quick update. Had to go back to beta 6 and was able to start syncing again, staying on 1.0.0 was yielding the same non-existent disk usage

argakiig commented 3 months ago

What kind of infrastructure is this running on? There seem to be potentially multiple things going on here. One thing that seems a bit off is the following logs:

The database read transaction has been open for too long
...
Validated state root number=20162882 hash=0x3b364866f0822159f47d32da72f64f6b8f58e5f4c1a9ac77de8ab0426fad6bc5 elapsed=320.897104155s

as well as

Validated state root number=20162881 hash=0x1b7df91499c8cff458101e777ae5dfb8f235cb4ddf9c9139bdfc4936c9ca79f5 elapsed=25.327332788s
CommitCanonicalChainToDatabase, 83.249954225s

These should take less than a second, even on sub optimal hardware, what kind of disk are you running on? If it's a slow disk, these numbers make sense, and the fact that execution takes an extremely long amount of time.

Along with the disk model / information, could you also run the iozone benchmarks listed here: #6859

This is running on a managed GKE instance. running in a single zone The reth and lighthouse nodes are running on a t2d-standard-32. disk is network mounted pvc backed by a pd-ssd volume

I had one node that I kept at beta 6 and it never lost sync, hasnt displayed any of the fallbehind/catchup behavior that the nodes I upgraded to v1.0.0

As to what disks, not really sure what google uses. https://cloud.google.com/compute/docs/disks/performance#performance_limits gives a bit of an indication about bandwidth available. but since I was getting 0 disk activity other than a quick spike as the node would first come online until reverting to beta 6 leads me to believe disk is not the issue.

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 21 days with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.