Closed HaoranYi closed 1 year ago
What's the additional size though?
Good question. @jeffwashington @brooksprumo Do we have any metrics for the AccountDB index size?
The mnb index on disk is currently 48G (from du -shc ledger/accounts_index). An issue we may encounter is the desired randomness in hashing keys per validator. In the disk bucket case, the randomness is per data and index bucket and per 'bin'. Currently we use 8192 bins. Just FYI.
The number of bins is a CLI arg. It will likely need to increase as the number of accounts increases.
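To make the bins and per-validator randomness points concrete, here is a minimal sketch. It is not the validator's actual code: `bin_from_key` and `bucket_slot` are hypothetical names, and the choice of the top bits of the first three key bytes and of `DefaultHasher` with a seed are assumptions made only for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: map a 32-byte account key to one of `bins` bins
/// using the top bits of the key's first three bytes.
/// Assumes `bins` is a power of two no larger than 2^24.
pub fn bin_from_key(key: &[u8; 32], bins: usize) -> usize {
    assert!(bins.is_power_of_two() && bins <= (1 << 24));
    // Interpret the first three bytes as a big-endian 24-bit integer.
    let as_u24 = ((key[0] as usize) << 16) | ((key[1] as usize) << 8) | (key[2] as usize);
    let bits = bins.trailing_zeros(); // log2(bins)
    as_u24 >> (24 - bits)
}

/// Hypothetical helper: pick a slot inside a bucket from the key plus a
/// per-validator random seed. Two validators with different seeds will
/// generally place the same key in different slots.
pub fn bucket_slot(key: &[u8; 32], seed: u64, capacity: u64) -> u64 {
    assert!(capacity > 0);
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    key.hash(&mut hasher);
    hasher.finish() % capacity
}
```

Under these assumptions, the bin for a key is the same on every validator, but the slot within a bucket depends on the local seed. That is why an index persisted by one validator would not be directly reusable by another unless the seed travels with it.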
We could also just persist the index for local snapshots, as a separate artifact.
Rebuilding the index on boot is a major contributor to node startup time so any reductions will be welcome by all.
I'm happy to brainstorm about this, and I'd love faster startup time. There will be issues, though. We have decoupled snapshot generation and hash calculation from the accounts index, which means the index could be out of date relative to the append vecs in a snapshot. Copying the index off while it is actively in use, and maintaining 'old' entries that exist in snapshot append vecs while new roots are continuously being made and 'old' duplicates are cleaned, introduces race conditions, perf issues, locking issues, and state management that I'm confident we don't accurately handle right now. We also don't persist the in-mem portion of the disk buckets, or the items that would be kept in-mem for perf reasons (such as lru). This feels like a can of worms. Maybe I'm being too pessimistic. Maybe we can come up with a brilliant insight.
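One way to picture the staleness concern is a persisted index that carries a small header saying which snapshot slot it was built for, so that a loader can fall back to a rebuild when the two disagree. The sketch below is purely illustrative: `IndexArtifactHeader`, `IndexLoadDecision`, and `decide_index_load` are hypothetical names, nothing like them is confirmed to exist in the validator, and the sketch does not address the race conditions or the in-mem portion described above.

```rust
/// Hypothetical header stored alongside a persisted index artifact.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexArtifactHeader {
    /// Slot of the snapshot the index was built against (assumed field).
    pub snapshot_slot: u64,
    /// Number of bins the index was written with (assumed field).
    pub bins: u32,
}

/// Outcome of checking a persisted index against the snapshot being loaded.
#[derive(Debug, PartialEq, Eq)]
pub enum IndexLoadDecision {
    Reuse,
    Rebuild(&'static str),
}

/// Reuse the persisted index only if it matches the snapshot slot and the
/// configured bin count; otherwise rebuild from the append vecs as today.
pub fn decide_index_load(
    header: Option<&IndexArtifactHeader>,
    snapshot_slot: u64,
    configured_bins: u32,
) -> IndexLoadDecision {
    match header {
        None => IndexLoadDecision::Rebuild("no persisted index found"),
        Some(h) if h.snapshot_slot != snapshot_slot => {
            IndexLoadDecision::Rebuild("index is out of date relative to the snapshot")
        }
        Some(h) if h.bins != configured_bins => {
            IndexLoadDecision::Rebuild("bin count changed since the index was written")
        }
        Some(_) => IndexLoadDecision::Reuse,
    }
}
```

A check like this only detects a mismatch. It does not solve the harder problem of producing a consistent copy of the index while roots keep advancing.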
hmm, I see. yes seems like we’d need a new method to snapshot the index as well.
We have done work to reduce boot time by implementing what we called boot-snapshot, which keeps the previous snapshot and, upon reboot, continues exactly where it left off at shutdown (avoiding the need to redownload a snapshot, untar it, etc.).
The code is here: https://github.com/solana-labs/solana/compare/832cb76e45d4b43f15a95bdd25a60e3113c16bdc...streamingfast:fast-boot?expand=1
I'd love for this to be merged in some form and made to work. It worked for the most part, but there were some stability issues.
I can jump on a call any time to give an overview of the fast-boot design. Find me on the StreamingFast Discord; our whole team is there, and it is linked from streamingfast.io
@abourget Nice work and good idea.
We have implemented a similar feature in the validator. The idea is similar to yours. https://github.com/solana-labs/solana/issues/23452
If you pass --no-snapshot-fetch on the CLI and you have the snapshot file locally, the validator will skip cleaning and shrinking at startup. This cuts about half of the startup time.
Compared with your approach, the one above still has the cost of untarring. I hope this work, https://github.com/solana-labs/solana/issues/24798, will reduce untarring time by a factor of 10. If we can achieve that, the total cost of untarring will be around 20s. In exchange, we save disk usage and avoid the extra store for the boot-snapshot.
Problem
When a validator starts, it spends a significant amount of time rebuilding the AccountDB index. Rather than rebuilding the index from the loaded snapshot at startup, is it possible to save the AccountDB index in addition to the snapshot?
Proposed Solution
Save AccountDB index in addition to AccountDB in the snapshot file.