Open DaveWK opened 13 hours ago
For the record, without --engine.experimental the performance is even worse.
very odd, same as https://github.com/paradigmxyz/reth/issues/11306
we haven't tried to reproduce this from the snapshot yet, but resynced base entirely on similar infrastructure as yours without any issues. I wonder if this has anything to do with the most recent snapshot itself, will check.
Resyncing a Base archive node takes ~48 hours, so for now I'd recommend doing that.
Describe the bug
I have tried a few different setups, but I do not appear to be able to sync an archive node (without --full) and keep it in sync on AWS. I am using io2 storage (20k IOPS) with an r7a.2xlarge (64 GB of RAM, 8 AMD EPYC 9R14 cores), and the node keeps looping through the pipeline stages without ever catching up. The culprit seems to be MerkleExecute, and I can see from the performance that it is not a CPU-bound problem: the single core (I assume this is a serialized, single-threaded step) is not maxed out, but my disk IOPS and utilization are always at 100%. The amount of data being transferred is also pretty small: even with 20k IOPS I am only reading/writing around 8 MB of data.
My suspicion is that the mdbx file is too "sparse" and needs some kind of online compaction or "defrag", but I don't know how to debug this. Running mdbx_copy is not really a solution, since it takes 5 hours to run (and is not an online operation), and I am not able to sync from the available reth-base archive snapshot.
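As a rough sanity check on the numbers above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate under assumptions that are not stated in the report: that "8 MB" means roughly 8 MB per second, that each I/O touches one 4 KiB database page, and that the utilization figure is a busy-time percentage of the kind iostat reports. The function names are hypothetical helpers, not part of reth or mdbx.

```python
# Back-of-the-envelope check of the disk numbers in this report.
# ASSUMPTIONS (not from the report): throughput is ~8 MB/s, every I/O is one
# 4 KiB page, and "100% utilization" is a busy-time percentage.

PAGE_SIZE_BYTES = 4096            # assumed database page size
PROVISIONED_IOPS = 20_000         # io2 volume, from the report
OBSERVED_BYTES_PER_S = 8_000_000  # assumed reading of "around 8 MB"


def ops_per_second(bytes_per_s: float, io_size_bytes: int) -> float:
    """I/O operations per second implied by a throughput and an I/O size."""
    return bytes_per_s / io_size_bytes


def bandwidth_at(iops: float, io_size_bytes: int) -> float:
    """Bytes per second the volume would move if it sustained `iops`."""
    return iops * io_size_bytes


def serial_latency_ms(ops_per_s: float) -> float:
    """Per-operation latency if operations are issued strictly one at a time."""
    return 1000.0 / ops_per_s


if __name__ == "__main__":
    implied = ops_per_second(OBSERVED_BYTES_PER_S, PAGE_SIZE_BYTES)
    ceiling = bandwidth_at(PROVISIONED_IOPS, PAGE_SIZE_BYTES)
    print(f"implied ops/s at 4 KiB per op: {implied:.0f}")
    print(f"share of provisioned IOPS:     {implied / PROVISIONED_IOPS:.1%}")
    print(f"bandwidth at full 20k IOPS:    {ceiling / 1e6:.1f} MB/s")
    print(f"latency if fully serialized:   {serial_latency_ms(implied):.3f} ms")
```

If those assumptions hold, the volume would be serving only about a tenth of its provisioned IOPS while still appearing busy all the time, which is what a stream of small, dependent random reads issued one at a time would look like. That would be consistent with a serialized step waiting on per-request latency rather than on IOPS or bandwidth limits, but it does not by itself confirm or rule out the sparse-file suspicion.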
Steps to reproduce
exec op-reth node --chain=base \
  --rollup.sequencer-http https://mainnet-sequencer.base.org \
  --http --http.port 8545 --ws --ws.port 8546 \
  --http.api=web3,debug,eth,net,txpool \
  --ws.api=web3,debug,eth,net,txpool \
  --metrics=127.0.0.1:9001 \
  --ws.origins="*" \
  --http.corsdomain="*" \
  --rollup.discovery.v4 \
  --engine.experimental \
  --authrpc.jwtsecret ${HOME}/jwt.hex \
  --datadir ${HOME}/oprethdata
Node logs
No response
Platform(s)
Linux (x86)
What version/commit are you on?
v1.0.8
What database version are you on?
2
Which chain / network are you on?
base mainnet
What type of node are you running?
Archive (default)
What prune config do you use, if any?
n/a
If you've built Reth from source, provide the full command you used
make maxperf-op