solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0

Persist AccountDB Index along with snapshot #24643

Closed HaoranYi closed 1 year ago

HaoranYi commented 2 years ago

Problem

When a validator starts, it spends a significant amount of time rebuilding the AccountDB index. Rather than rebuilding the index from the loaded snapshot at startup, could we save the AccountDB index in addition to the snapshot?

Proposed Solution

Save AccountDB index in addition to AccountDB in the snapshot file.

sakridge commented 2 years ago

What's the additional size though?

HaoranYi commented 2 years ago

Good question. @jeffwashington @brooksprumo Do we have any metrics for the AccountDB index size?

jeffwashington commented 2 years ago

The mnb (mainnet-beta) index on disk is currently 48G, as reported by: du -shc ledger/accounts_index

An issue we may encounter is the desired randomness in hashing keys per validator. In the disk bucket case, the randomness is per data and index bucket and per 'bin'. Currently we use 8192 bins. Just fyi.

jeffwashington commented 2 years ago

The number of bins is a cli arg. It will likely need to increase as the number of accounts increases.
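To make the point about bins and per-validator randomness concrete, here is a minimal sketch in Rust. It is not the validator's actual code: the function names, the choice of key prefix for binning, and the use of `DefaultHasher` with a seed are all assumptions made for illustration. It shows a key being assigned to one of 8192 bins, then to a slot inside a bucket using a per-validator seed. Because the slot depends on the seed, an index persisted by one validator would not line up on another unless the seed were persisted too.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of bins mentioned in the thread (configurable via a cli arg).
const BINS: usize = 8192;

/// Hypothetical binning rule: use the top bits of the first three key bytes.
/// `bins` must be a power of two no larger than 2^24.
fn bin_for_key(key: &[u8; 32], bins: usize) -> usize {
    assert!(bins.is_power_of_two() && bins <= (1 << 24));
    let prefix = ((key[0] as usize) << 16) | ((key[1] as usize) << 8) | (key[2] as usize);
    // Keep only log2(bins) high bits of the 24-bit prefix.
    let shift = 24 - bins.trailing_zeros();
    prefix >> shift
}

/// Hypothetical slot selection inside a bucket. `seed` stands in for the
/// per-validator randomness: the same key lands in a slot that depends on it.
fn slot_in_bucket(key: &[u8; 32], seed: u64, capacity: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    key.hash(&mut hasher);
    hasher.finish() % capacity
}
```

The bin depends only on the key, so it is the same on every validator. The slot is only reproducible when the same seed is used, which is why a persisted index is easier to reuse locally than to ship inside a shared snapshot.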

mvines commented 2 years ago

We could also just persist the index for local snapshots, as a separate artifact.

Rebuilding the index on boot is a major contributor to node startup time, so any reduction will be welcomed by all.

jeffwashington commented 2 years ago

I'm happy to brainstorm about this. And I'd love faster startup time. There will be issues. We have decoupled snapshot generation and hash calculation from the accounts index. This means that the index could be out of date relative to the append vecs in a snapshot. And, copying the index off while it is actively in use as well as maintaining 'old' entries that exist in snapshot append vecs while new roots are continuously being made and 'old' duplicates are cleaned introduces some race conditions, perf issues, locking issues, and state management that I'm confident we don't accurately handle right now. We also don't persist the in-mem portion of the disk buckets or the items that would be in-mem for perf reasons (such as lru). This feels like a can of worms. Maybe I'm being too pessimistic. Maybe we can come up with a brilliant insight.

mvines commented 2 years ago

hmm, I see. yes seems like we’d need a new method to snapshot the index as well.

abourget commented 2 years ago

We have done work to reduce boot time by implementing what we called boot-snapshot, which keeps the previous snapshot and, upon reboot, continues exactly where it left off at shutdown (avoiding the need to redownload a snapshot, untar it, etc.).

The code is here: https://github.com/solana-labs/solana/compare/832cb76e45d4b43f15a95bdd25a60e3113c16bdc...streamingfast:fast-boot?expand=1

I'd love for this to be merged in some form and made to work. It worked for the most part, but there were some stability issues.

abourget commented 2 years ago

I can jump on a call any time to give an overview of the fast-boot design. Find me on the StreamingFast Discord, all our team is there, linked from streamingfast.io

HaoranYi commented 2 years ago

@abourget Nice work and good idea.

We have implemented a similar feature in the validator: https://github.com/solana-labs/solana/issues/23452

If you pass --no-snapshot-fetch on the cli and you have the snapshot file locally, the validator will skip cleaning and shrinking at startup. This cuts about half of the startup time.

Compared with your approach, the one above still has the cost of untarring, though I hope this work (https://github.com/solana-labs/solana/issues/24798) will reduce untarring time by a factor of 10. If we can achieve that, the total cost of untarring will be around 20s. In return, we save disk usage and avoid the extra storage for the boot-snapshot.