Open zilayo opened 2 days ago
The same thing happened again later, from 06:55 to 07:04. 313,936,113 storage leaves were added this time (vs. 313,998,906 during the first memory spike above). This was on the second pipeline run, after the first pipeline had finished.
{"timestamp":"2024-11-13T06:55:28.084998Z","level":"INFO","fields":{"message":"Preparing stage","pipeline_stages":"8/12","stage":"MerkleExecute","checkpoint":"22328826","target":"22344678"},"target":"reth_node_events::node"}
{"timestamp":"2024-11-13T06:55:28.085011Z","level":"INFO","fields":{"message":"Executing stage","pipeline_stages":"8/12","stage":"MerkleExecute","checkpoint":"22328826","target":"22344678","stage_eta":"23m 12s"},"target":"reth_node_events::node"}
{"timestamp":"2024-11-13T06:55:28.085013Z","level":"DEBUG","fields":{"message":"Continuing inner merkle checkpoint","current":"22328826","target":"22344678","last_account_key":"0xbe61cbebdaec5b008fc3d1b0b63a7e0577f531e96ebcc1a494e2b70499d04f33"},"target":"sync::stages::merkle::exec","span":{"name":"ChainOrchestrator::poll"},"spans":[{"name":"ChainOrchestrator::poll"}]}
...
(no more reth_node_events / sync::stages logs during this period)
...
{"timestamp":"2024-11-13T07:04:42.778084Z","level":"INFO","fields":{"message":"Committed stage progress","pipeline_stages":"8/12","stage":"MerkleExecute","checkpoint":"22328826","target":"22344678","stage_progress":"76.60%","stage_eta":"8m 58s"},"target":"reth_node_events::node"}
{"timestamp":"2024-11-13T07:04:43.811289Z","level":"DEBUG","fields":{"message":"Event: Handler(ForkchoiceUpdated(ForkchoiceState { head_block_hash: 0xa2c7f361615d286c1f7bf542953c944f934c34f3ebb9de056794dfbcb5f88d29, safe_block_hash: 0x0000000000000000000000000000000000000000000000000000000000000000, finalized_block_hash: 0x0000000000000000000000000000000000000000000000000000000000000000 }, Syncing))"},"target":"reth::cli"}
{"timestamp":"2024-11-13T07:04:43.813734Z","level":"DEBUG","fields":{"message":"Commit","total_duration":"1.035686689s","commit_latency":"Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 57, audit: 0, write: 0, sync: 67814, ending: 0, whole: 67873, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } }))","is_read_only":false},"target":"storage::db::mdbx","span":{"name":"ChainOrchestrator::poll"},"spans":[{"name":"ChainOrchestrator::poll"}]}
{"timestamp":"2024-11-13T07:04:43.813816Z","level":"INFO","fields":{"message":"Preparing stage","pipeline_stages":"8/12","stage":"MerkleExecute","checkpoint":"22328826","target":"22344678"},"target":"reth_node_events::node"}
{"timestamp":"2024-11-13T07:04:43.813825Z","level":"DEBUG","fields":{"message":"Continuing inner merkle checkpoint","current":"22328826","target":"22344678","last_account_key":"0xbe67e4c7cd9e0722c7a175135f72c02b5ec1ddba0f73dacde8e007defbe9fca9"},"target":"sync::stages::merkle::exec","span":{"name":"ChainOrchestrator::poll"},"spans":[{"name":"ChainOrchestrator::poll"}]}
{"timestamp":"2024-11-13T07:04:43.813835Z","level":"INFO","fields":{"message":"Executing stage","pipeline_stages":"8/12","stage":"MerkleExecute","checkpoint":"22328826","target":"22344678","stage_eta":"8m 57s"},"target":"reth_node_events::node"}
{"timestamp":"2024-11-13T07:04:44.537942Z","level":"DEBUG","fields":{"message":"Saving inner merkle checkpoint","last_account_key":"0xbe863bb046edd46c453217e574cb150b7321bfe8ccf480c24220e13d4593a6b8"},"target":"sync::stages::merkle::exec","span":{"name":"ChainOrchestrator::poll"},"spans":[{"name":"ChainOrchestrator::poll"}]}
{"timestamp":"2024-11-13T07:04:44.593817Z","level":"INFO","fields":{"message":"Committed stage progress","pipeline_stages":"8/12","stage":"MerkleExecute","checkpoint":"22328826","target":"22344678","stage_progress":"76.62%","stage_eta":"34m 48s"},"target":"reth_node_events::node"}
Afterwards, the preparing -> executing -> committing cycle returns to taking ~1s again.
Hi,
I had the same issue, but resolved it by dropping the merkle table and letting it re-run during execution sync. It may have to do with a recent bugfix in the trie calculations, where there is a conflict between new and old data in this table. It may also be related to SSD speeds.
Describe the bug
Related: https://github.com/paradigmxyz/reth/issues/7282 https://github.com/paradigmxyz/reth/issues/9074
Logs: oom-logs.json (jsonl)
There have been several reports in Telegram of OOM issues during MerkleExecute when syncing Base mainnet. Opening this issue to provide additional context. (I didn't OOM myself, but observed a large memory spike that would OOM other machines.)

The spike happens between the executing stage and the committing stage, a step which normally takes 1s or less but here lasts roughly the same 5 minutes as the memory spike.
The save_execution_checkpoint function isn't run, and the "Saving inner merkle checkpoint" message isn't logged, until the end of these 5 minutes: https://github.com/paradigmxyz/reth/blob/c326708ffc14f1dae63419521884b0a90b3e037d/crates/stages/stages/src/stages/merkle.rs#L115-L131

"Continuing inner merkle checkpoint" is logged, so we're in this branch: https://github.com/paradigmxyz/reth/blob/c326708ffc14f1dae63419521884b0a90b3e037d/crates/stages/stages/src/stages/merkle.rs#L176-L189
Looks like the root cause is somewhere in this code: https://github.com/paradigmxyz/reth/blob/c326708ffc14f1dae63419521884b0a90b3e037d/crates/stages/stages/src/stages/merkle.rs#L213-L231
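To make the memory behaviour of that region concrete, here's a self-contained toy model of the pattern as I understand it (this is not the reth code; all names, types and numbers are made up): updates accumulate in an in-memory buffer for the whole chunk and are only flushed once the walk either pauses at the inner checkpoint or completes.

```rust
// Toy model of the pattern I think is at play (NOT reth code; all names and
// numbers are made up). The point: trie updates are buffered in memory for an
// entire chunk and only flushed once the walk pauses at the inner checkpoint
// or completes.

/// In-memory buffer of trie node updates produced while walking leaves.
#[derive(Default)]
struct TrieUpdateBuffer {
    nodes: Vec<(Vec<u8>, Vec<u8>)>, // (nibble path, encoded node)
}

enum WalkProgress {
    /// Hit the per-chunk threshold: the stage would save the inner checkpoint.
    Paused { last_key: u64, updates: TrieUpdateBuffer },
    /// Walked every remaining leaf: the stage would validate and commit.
    Complete { updates: TrieUpdateBuffer },
}

/// Walk up to `threshold` leaves starting at `start_key`, buffering every update.
fn root_with_progress(start_key: u64, leaf_count: u64, threshold: u64) -> WalkProgress {
    let mut updates = TrieUpdateBuffer::default();
    for key in start_key..start_key.saturating_add(leaf_count) {
        // each leaf contributes updated leaf/branch nodes to the in-memory buffer
        updates.nodes.push((key.to_be_bytes().to_vec(), vec![0u8; 64]));
        if updates.nodes.len() as u64 >= threshold {
            return WalkProgress::Paused { last_key: key, updates };
        }
    }
    WalkProgress::Complete { updates }
}

fn main() {
    match root_with_progress(0, 2_000_000, 1_000_000) {
        WalkProgress::Paused { last_key, updates } => {
            // real stage: write trie updates + "Saving inner merkle checkpoint"
            println!("paused at key {last_key} with {} buffered updates", updates.nodes.len());
        }
        WalkProgress::Complete { updates } => {
            // real stage: write trie updates + state root validation
            println!("complete with {} buffered updates", updates.nodes.len());
        }
    }
}
```

If one chunk ends up covering ~314M storage leaves before it pauses, that buffer alone is hundreds of millions of small allocations held in RAM until the flush, which would match the spike.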
Either root_with_progress() -> calculate() or write_trie_updates looks like the main culprit; both look heavy on allocations. (A sketch of how one might confirm this with allocator stats follows the observations below.)

It's most likely StateRoot::root_with_progress() -> StateRoot::calculate(), as this records the TrieRootMetrics at the end of the function, and the screenshots below show those metrics being recorded exactly when the memory spike ends.

Memory grows between 04:40 and 04:46
Huge number of storage leaves added once this completes (313,998,906, compared to the previous 0-10M leaves)
DB also grows by 5GB once this is committed
DB Metrics:
Merkle execute stage progress jumps from 52% to 76%
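Since the build runs with the jemalloc feature, one way to confirm which call accounts for the growth would be to sample jemalloc's allocated-bytes counter immediately before and after the suspected calls. Below is a minimal sketch using the tikv-jemallocator / tikv-jemalloc-ctl crates; this is my suggestion for local debugging, not existing reth instrumentation.

```rust
// Sketch: measure how many bytes jemalloc says are allocated across a suspect
// call. Assumes jemalloc is the global allocator (as in reth's jemalloc builds).
use tikv_jemalloc_ctl::{epoch, stats};
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

/// Currently allocated bytes according to jemalloc.
fn allocated_bytes() -> usize {
    // jemalloc caches its stats; advancing the epoch refreshes them first.
    epoch::advance().unwrap();
    stats::allocated::read().unwrap()
}

fn main() {
    let before = allocated_bytes();

    // Stand-in workload for the suspected call, e.g. the state-root
    // computation or the trie-update write in the real stage.
    let buffer: Vec<u64> = (0..10_000_000).collect();

    let after = allocated_bytes();
    println!(
        "allocated delta: {:.1} MiB (buffer len {})",
        after.saturating_sub(before) as f64 / (1024.0 * 1024.0),
        buffer.len()
    );
}
```

Dropping a few probes like this around the chunk loop in merkle.rs should show whether the buffered trie updates or something inside the walker dominates the spike.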
Steps to reproduce
Sync an archive op-reth node on Base Mainnet.
No special config/flags used to run it - just using:
I unwound a node to block 21710931 and began syncing from there. The memory spike happened a few hours later, so the furthest I can narrow it down is somewhere between Base mainnet blocks 21710931 and 22328826. It should be possible to replicate fairly quickly by getting a DB that's close to block 21710931 and letting it sync from there.
The same memory spike occurred during MerkleExecute when I was syncing from genesis, but I didn't look into the logs much.
Node logs
Memory spike + stage hang begins at the "Executing stage" log entry dated 2024-11-13T04:40:49.880967Z and lasts until "Committed stage progress" at 2024-11-13T04:46:31.943855Z.
Note the long delay between "Executing stage" and "Committed stage progress", a step which normally takes only about a second. Once this finally commits, the preparing -> executing -> committed cycle returns to taking around a second.

Logs from 30 seconds prior until 1 minute after are attached at the top of the issue. Please note: I removed the engine + net logs to make them easier to read. During this time there were still new engine messages being logged.
Platform(s)
Linux (x86)
What version/commit are you on?
reth-optimism-cli Version: 1.1.1
Commit SHA: 15c230bac20e2b1b3532c8b0d470e815fbc0cc22
Build Timestamp: 2024-11-05T13:12:49.909002171Z
Build Features: asm_keccak,jemalloc,optimism
Build Profile: maxperf
What database version are you on?
Current database version: 2
Local database version: 2
Which chain / network are you on?
Base mainnet
What type of node are you running?
Archive (default)
What prune config do you use, if any?
[prune]
block_interval = 999999999
If you've built Reth from source, provide the full command you used
n/a