History pruning takes a very long time

paradigmxyz / reth

Modular, contributor-friendly and blazing-fast implementation of the Ethereum protocol, in Rust

https://reth.rs/

Apache License 2.0

History pruning takes a very long time #8995

Open shekhirin opened 2 weeks ago

shekhirin commented 2 weeks ago

Problem

Sender Recovery pruning runs before Account History, and we don't start pruning the next segment until the previous one is completed. Once Sender Recovery is finished (the chart on the right: it stops taking significant time and completes instantly), the pruner starts calling Account History pruning (chart on the right: spike in time).

The problem is that while we prune Sender Recovery (but do not prune Account History), the Account History tables keep accumulating data, so pruning them takes longer once Sender Recovery is done.
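
The accumulation effect can be sketched with a toy model. This is not reth's pruner: the `Segment` struct, `run_pruner`, `simulate`, the per-run `budget` and the per-run `growth` are all hypothetical names and numbers chosen for illustration. The only behavior taken from the description above is that a later segment is not touched until the earlier one is fully pruned.

```rust
/// Hypothetical toy segment: a name plus the number of entries waiting to be pruned.
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    name: &'static str,
    backlog: u64,
}

/// One pruner run in the toy model.
/// Every segment first gains `growth` new prunable entries (new blocks arrived),
/// then the pruner spends at most `budget` deletions, walking segments in order
/// and stopping at the first segment it could not finish.
fn run_pruner(segments: &mut [Segment], growth: u64, budget: u64) {
    for s in segments.iter_mut() {
        s.backlog += growth;
    }
    let mut remaining = budget;
    for s in segments.iter_mut() {
        let pruned = s.backlog.min(remaining);
        s.backlog -= pruned;
        remaining -= pruned;
        if s.backlog > 0 {
            // Earlier segment is not complete, so later segments wait.
            break;
        }
    }
}

/// Runs the toy pruner `runs` times over segments given as (name, initial backlog).
fn simulate(order: &[(&'static str, u64)], growth: u64, budget: u64, runs: u32) -> Vec<Segment> {
    let mut segments: Vec<Segment> = order
        .iter()
        .map(|&(name, backlog)| Segment { name, backlog })
        .collect();
    for _ in 0..runs {
        run_pruner(&mut segments, growth, budget);
    }
    segments
}
```

With `simulate(&[("SenderRecovery", 1000), ("AccountHistory", 0)], 10, 100, 5)`, the second segment is never reached and its backlog grows by 10 on every run. Passing the same two segments in the opposite order lets the small Account History backlog be cleared on each run before the budget is spent on Sender Recovery.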

Solution

Fix history pruning, so that we don't need to walk all indices every pruner run
Re-order segments, so that Account and Storage History segments go before Sender Recovery
https://github.com/paradigmxyz/reth/issues/7343
Since freelist isn't an issue anymore, on node startup prune what's left unpruned in the database (opt-out via a flag)
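
To picture what "walk all indices every pruner run" costs compared with a targeted pass, here is a minimal sketch. It is an illustration under assumptions, not reth's schema or the actual fix: a `BTreeMap` stands in for a history index table, keys are plain `u64` ids, and the `touched` set (keys known to have changed in the pruned block range) is assumed to be available from somewhere else.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical index layout: key id -> block numbers at which that key changed.
type Index = BTreeMap<u64, Vec<u64>>;

/// Full walk: visits every key in the index and drops block numbers <= `prune_to`.
/// Returns how many keys were visited.
fn prune_full_walk(index: &mut Index, prune_to: u64) -> usize {
    let mut visited = 0;
    index.retain(|_, blocks| {
        visited += 1;
        blocks.retain(|b| *b > prune_to);
        // Keep the key only if it still has entries left.
        !blocks.is_empty()
    });
    visited
}

/// Targeted pass: visits only the keys listed in `touched`.
/// Returns how many keys were visited.
fn prune_touched(index: &mut Index, touched: &BTreeSet<u64>, prune_to: u64) -> usize {
    let mut visited = 0;
    for key in touched {
        let now_empty = match index.get_mut(key) {
            Some(blocks) => {
                visited += 1;
                blocks.retain(|b| *b > prune_to);
                blocks.is_empty()
            }
            None => continue,
        };
        if now_empty {
            index.remove(key);
        }
    }
    visited
}
```

Both functions leave the index in the same state as long as `touched` really contains every key with entries at or below `prune_to`. The difference is the number of keys visited: all of them for the full walk, only the touched ones for the targeted pass.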

deromik commented 1 week ago

Faced the same issue on 06/21.

Exactly at that time, I lost 4 nodes on rc2 due to a state root mismatch and had to resync them on 06/21. Today I lost 2 more nodes on rc2 for the same reason and am resyncing them. I suppose the pruning time is the root cause of this.

The logs are full of 'Hook is in progress, delaying forkchoice update. This may affect the performance of your node as a validator.' while the pruner is in progress.

The nodes run on 980 PRO/990 PRO SSDs with Ryzen 7900X/7950X CPUs.

lodotek commented 3 days ago

Is there any workaround for now? I brought up a new node, and it seems to be taking significantly longer to get fully synced and ready to perform than every other EL client I've used so far. Should I just be patient and let it finish, or is there some setting I should change?

Rjected commented 3 days ago

> Is there any workaround for now? I brought up a new node, and it seems to be taking significantly longer to get fully synced and ready to perform than every other EL client I've used so far. Should I just be patient and let it finish, or is there some setting I should change?

Is this referring to initial sync time, or time to prune?

lodotek commented 3 days ago

> Is this referring to initial sync time, or time to prune?

Sorry - initial sync time. I mean it took a very long time to sync (seemingly longer than geth and NM). And once it synced, it seems to have automatically started pruning, which also seems to be going extremely slowly ¯\_(ツ)_/¯

Rjected commented 3 days ago

> Sorry - initial sync time. I mean it took a very long time to sync (seemingly longer than geth and NM)

This is because reth does not have a "snapshot sync" method and executes the entire history by default, whereas geth and NM do snap sync by default, iirc. The pruning behavior is more relevant here, though.