hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0

Improved State Saving #5826

Open jasperpotts opened 1 year ago

jasperpotts commented 1 year ago

Problem

There are a number of things we can do to make state saving lighter weight in terms of node effort, disk space, and network upload cost.

The background is that every node creates a saved state every 15 minutes that gets uploaded to backup buckets. The saved state is used for two purposes:

- fast node restarts
- backup (the copies uploaded to the buckets)

Today one code path is used for both, and it writes all data to disk.

Solution

Reduce what is saved in saved states for backup while still keeping enough data for restarts to be fast.

Having partial data in saved states will reduce storage and upload costs substantially but increase the risk of bad data. To mitigate that risk we can depend on the State Validation Tool to validate that each uploaded state is complete and hashes correctly.
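As an illustration only, here is a minimal sketch of the kind of check such a validation step could perform, assuming the state is a directory of files and the recorded hash is available. The directory layout, algorithm, and class names are hypothetical, not the actual hedera-services code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: walk the files of an uploaded (reduced) saved state in a
 * stable order, recompute a digest over their contents, and compare it to the
 * hash recorded when the state was written. Layout and algorithm are
 * assumptions for illustration, not the actual saved-state format.
 */
public final class SavedStateCheck {

    public static boolean matchesRecordedHash(final Path savedStateDir, final String expectedHashHex)
            throws IOException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance("SHA-384");
        try (Stream<Path> files = Files.walk(savedStateDir)) {
            // Sort paths so every validator digests the bytes in the same order
            for (final Path file : files.filter(Files::isRegularFile).sorted().toList()) {
                digest.update(Files.readAllBytes(file));
            }
        }
        return HexFormat.of().formatHex(digest.digest()).equalsIgnoreCase(expectedHashHex);
    }
}
```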

### Tasks
- [ ] https://github.com/hashgraph/hedera-services/issues/7396
- [ ] https://github.com/hashgraph/hedera-services/issues/7540
- [ ] https://github.com/hashgraph/hedera-services/issues/7502
- [ ] https://github.com/hashgraph/hedera-services/issues/7529
- [ ] https://github.com/hashgraph/hedera-services/issues/7525
- [ ] https://github.com/hashgraph/hedera-services/issues/7501
- [ ] https://github.com/hashgraph/hedera-services/issues/7499
- [ ] https://github.com/hashgraph/hedera-services/issues/7498
- [ ] https://github.com/hashgraph/hedera-services/issues/8795
- [ ] https://github.com/hashgraph/hedera-services/issues/9825
jasperpotts commented 1 year ago

The idea of "Change uploading to not upload hashes and in memory indexes.". The remove memory indexes has pros and cons:

It seems that, to start with, we can hold off on removing indexes until size becomes a concern. If it does, there are two options that would allow us to keep some level of safety.

If the idea is that we have decentralized storage/archival servers next to each node for storing streams and saved states, then the validator could run there and delete indexes after it has validated them. If the server is on the node's local network, the upload cost is much reduced.

Ummm.... 🤔 maybe there is a better idea... what if we hashed the index and uploaded the hash rather than the complete index? On the validator we can rebuild a new index, hash it, and check the hashes match.
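A minimal sketch of that idea, assuming the index can be rebuilt and serialized deterministically on both sides; the serialization step and class names below are hypothetical placeholders, not MerkleDb APIs:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/**
 * Sketch of "upload the index hash, not the index". The node hashes a
 * serialized form of its in-memory index and uploads only the digest; the
 * validator rebuilds the index from the uploaded data files, serializes it
 * the same way, and checks the digests match. Deterministic serialization is
 * an assumption here.
 */
public final class IndexHashing {

    /** On the node: the digest that gets uploaded in place of the full index. */
    public static byte[] hashIndex(final byte[] serializedIndex) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-384").digest(serializedIndex);
    }

    /** On the validator: rebuild and serialize the index, then confirm the digests match. */
    public static boolean indexMatches(final byte[] rebuiltSerializedIndex, final byte[] uploadedIndexHash)
            throws NoSuchAlgorithmException {
        return Arrays.equals(hashIndex(rebuiltSerializedIndex), uploadedIndexHash);
    }
}
```

The catch is determinism: the validator has to be able to rebuild a byte-identical index from the uploaded data files, otherwise the digests would never match and we would have to hash some canonical form of the index instead.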

artemananiev commented 1 year ago

One more idea: run DB compaction on a snapshot before uploading it to the cloud. Since snapshots and compaction are completely independent today, a saved state may contain many files that were written to disk recently but not yet compacted. This means such states will contain a fair amount of garbage (exact numbers can be found in the validation tool). There is no need to upload it.
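A hedged sketch of that ordering, with the compaction and upload steps reduced to hypothetical stand-in interfaces (not the real MerkleDb or uploader APIs):

```java
import java.io.IOException;
import java.nio.file.Path;

/**
 * Sketch of the proposed ordering: compact the snapshot copy before uploading,
 * so files that were recently flushed but not yet compacted do not carry their
 * garbage into the backup bucket. Both interfaces are stand-ins for the real
 * compaction and upload code, which is not shown here.
 */
public final class SnapshotUploadFlow {

    /** Stand-in for whatever runs compaction over a directory of data files. */
    public interface Compactor {
        void compact(Path snapshotDir) throws IOException;
    }

    /** Stand-in for the code that pushes a saved state to the backup bucket. */
    public interface Uploader {
        void upload(Path snapshotDir) throws IOException;
    }

    public static void compactThenUpload(final Path snapshotDir, final Compactor compactor, final Uploader uploader)
            throws IOException {
        // The snapshot is an independent copy, so compacting it cannot disturb the live database
        compactor.compact(snapshotDir);
        uploader.upload(snapshotDir);
    }
}
```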

imalygin commented 1 year ago

Here is the document that sums up the compaction improvements to be made as a part of this epic: https://www.notion.so/swirldslabs/Compaction-Improvements-247726614d924fbaa34aa82a157a2f20