prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

Allow Beacon DB Pruning #8787

Open nisdas opened 3 years ago

nisdas commented 3 years ago

🚀 Feature Request

Description

With the chain reaching 1,000,000 slots soon, we need to start looking into ways to reduce the overall DB size of the beacon node. While we do not save states often (once every 2048 slots), at 1 million slots the stored states now comprise a significant share of the total database. For normal beacon operations only the current finalized state is required; any historical state beyond that can be pruned away with no effect on current beacon operations.

Describe the solution you'd like

There are multiple ways to approach this:

1. **Background pruning.** Keep only up to N states saved in the database beyond the current finalized checkpoint (excluding the genesis state). During each migration to the cold state we can also prune away historical states that are unneeded. This effectively keeps the database at a constant size (ignoring new blocks), instead of the linear growth currently seen. The downside is that an initial state cleanup would be needed before this can become a background routine; since the initial deletion would be very large (~10 GB compressed), it might take considerable time and memory.

2. **Opt-in pruning.** This makes DB pruning opt-in rather than the default, ensuring that users aren't impacted during normal operations. However, the main advantage is also its downside: any user will have to deal with some downtime while pruning the DB.
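The first approach (retain only N archived states beyond finality) can be sketched as a pure helper that, given the finalized slot, the archive interval (2048 slots per the description above), and the retention count N, returns the archived-state slots eligible for deletion. The function name and shape here are hypothetical, not Prysm's actual API:

```go
package main

import "fmt"

// prunableSlots returns slots of archived states that fall more than
// retain*interval slots before the finalized slot. Slot 0 (genesis) is
// always kept. Hypothetical helper for illustration, not Prysm code.
func prunableSlots(finalizedSlot, interval, retain uint64) []uint64 {
	var out []uint64
	if finalizedSlot <= retain*interval {
		return out
	}
	cutoff := finalizedSlot - retain*interval
	// Archived states exist every `interval` slots; start after genesis.
	for s := interval; s < cutoff; s += interval {
		out = append(out, s)
	}
	return out
}

func main() {
	// With finality at slot 10240, a 2048-slot interval and N=2,
	// the archived states at slots 2048 and 4096 are prunable.
	fmt.Println(prunableSlots(10240, 2048, 2)) // [2048 4096]
}
```

A background routine could run this on each cold-state migration, so the set of deletions per pass stays small after the initial cleanup.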

One important thing to note is that once the unneeded states are deleted, the freed-up disk space is put into a freelist. Due to how bolt is structured, any value deleted from a bucket is placed in its freelist, but there is no way to reclaim that space from disk, and each subsequent write will also involve syncing the freelist. This leads to a noticeable negative impact on DB writes. We could potentially get around this by not syncing the freelist on each write, but that can lead to very long recovery times on node restart (if freelist sync is re-enabled).

Describe alternatives you've considered

Instead of storing large states, we store the field differences between the current state and a previously stored root state. This would significantly reduce the size of each new state saved, at the cost of significantly increasing implementation complexity.
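The diff-based alternative can be illustrated with a toy example: persist only the fields that changed relative to a stored root state, and reconstruct the full state by applying the diff. The `State` struct below is a simplified stand-in with three fields; the real beacon state has many more (validators, balances, roots, etc.):

```go
package main

import "fmt"

// State is a toy stand-in for the beacon state.
type State struct {
	Slot          uint64
	JustifiedRoot string
	FinalizedRoot string
}

// diff records only the fields that changed between base and next.
// Storing this map instead of the full state is the space saving.
func diff(base, next State) map[string]any {
	d := map[string]any{}
	if next.Slot != base.Slot {
		d["slot"] = next.Slot
	}
	if next.JustifiedRoot != base.JustifiedRoot {
		d["justified"] = next.JustifiedRoot
	}
	if next.FinalizedRoot != base.FinalizedRoot {
		d["finalized"] = next.FinalizedRoot
	}
	return d
}

// apply reconstructs a full state from the root state plus a diff.
func apply(base State, d map[string]any) State {
	out := base
	if v, ok := d["slot"]; ok {
		out.Slot = v.(uint64)
	}
	if v, ok := d["justified"]; ok {
		out.JustifiedRoot = v.(string)
	}
	if v, ok := d["finalized"]; ok {
		out.FinalizedRoot = v.(string)
	}
	return out
}

func main() {
	root := State{Slot: 2048, JustifiedRoot: "0xaa", FinalizedRoot: "0xaa"}
	next := State{Slot: 4096, JustifiedRoot: "0xbb", FinalizedRoot: "0xaa"}
	d := diff(root, next)
	// Only 2 of 3 fields changed; apply(root, d) round-trips to next.
	fmt.Println(len(d), apply(root, d) == next) // 2 true
}
```

The complexity cost shows up in reconstruction: reading a state now requires loading the root state and every intermediate diff, rather than a single fetch.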

mrabino1 commented 3 years ago

From my lens, the default should be that pruning happens automatically every X slots (maybe every 6 or 24 hours) unless explicitly instructed not to via a CLI flag (e.g. --no_db_pruning). While not expected, one can also imagine two subsequent scenarios: 1) the user later removes that flag, which kicks off the pruning, or 2) the user adds that flag after pruning.

Either way, I agree that the sooner we start this routine clean-up, the easier it will be in the future. I would recommend a UX similar to geth's most recent clean-up implementation, which started and stopped between blocks so as not to disrupt uptime. The overall process took longer, but with no outage.

mohamedmansour commented 5 months ago

I was checking the size of the beacon data and it has reached nearly 600 GB:

```
$ du -h -d 1
1.3G    ./slasherdata
22G     ./blobs
584G    ./beaconchaindata
979G    ./geth
```

I'm in favor of the first approach, background pruning. But the good thing about the second approach is that people could archive the current beacon DB and then manually prune, if that were possible. Then again, once they back up the beacon DB, could they still query it?