hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0

Improved State Saving #5826

Open jasperpotts opened 1 year ago

jasperpotts commented 1 year ago

Problem

There are a number of things we can do to make state saving lighter weight in terms of node effort, disk space, and network upload cost.

The background is that every node creates a saved state every 15 minutes that gets uploaded to backup buckets. The saved state is used for two purposes:

- fast node restarts
- backup (the copies uploaded to the buckets)

Today one code path is used for both, and it writes all data to disk.

Solution

Reduce what is saved in saved states for backup while still keeping enough data for restarts to be fast.

Having partial data in saved states will reduce storage and upload costs substantially but increase the risk of bad data. To mitigate that risk we can depend on the State Validation Tool to validate that each uploaded state is complete and hashes correctly.
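As an illustration only, here is a minimal sketch of the kind of check such a validation step could perform, assuming the state is a directory of files and the recorded hash is available. The directory layout, algorithm, and class names are hypothetical, not the actual hedera-services code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: walk the files of an uploaded (reduced) saved state in a
 * stable order, recompute a digest over their contents, and compare it to the
 * hash recorded when the state was written. Layout and algorithm are
 * assumptions for illustration, not the actual saved-state format.
 */
public final class SavedStateCheck {

    public static boolean matchesRecordedHash(final Path savedStateDir, final String expectedHashHex)
            throws IOException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance("SHA-384");
        try (Stream<Path> files = Files.walk(savedStateDir)) {
            // Sort paths so every validator digests the bytes in the same order
            for (final Path file : files.filter(Files::isRegularFile).sorted().toList()) {
                digest.update(Files.readAllBytes(file));
            }
        }
        return HexFormat.of().formatHex(digest.digest()).equalsIgnoreCase(expectedHashHex);
    }
}
```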

### Tasks
- [ ] https://github.com/hashgraph/hedera-services/issues/7396
- [ ] https://github.com/hashgraph/hedera-services/issues/7540
- [ ] https://github.com/hashgraph/hedera-services/issues/7502
- [ ] https://github.com/hashgraph/hedera-services/issues/7529
- [ ] https://github.com/hashgraph/hedera-services/issues/7525
- [ ] https://github.com/hashgraph/hedera-services/issues/7501
- [ ] https://github.com/hashgraph/hedera-services/issues/7499
- [ ] https://github.com/hashgraph/hedera-services/issues/7498
- [ ] https://github.com/hashgraph/hedera-services/issues/8795
- [ ] https://github.com/hashgraph/hedera-services/issues/9825
jasperpotts commented 1 year ago

The idea of "Change uploading to not upload hashes and in memory indexes.". The remove memory indexes has pros and cons:

It seems that, to start with, we can hold off on removing indexes until size becomes a concern. If it does, there are two options that would allow us to keep some level of safety.

If the idea is that we have decentralized storage/archival servers next to each node for storing streams and saved states, then the validator could run there and delete indexes after it has validated them. If the server is on the node's local network, the upload cost is much reduced.

Ummm.... 🤔 maybe there is a better idea... what if we hashed the index and uploaded the hash rather than the complete index? On the validator we can rebuild a new index, hash it, and check the hashes match.
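A minimal sketch of that idea, assuming the index can be rebuilt and serialized deterministically on both sides; the serialization step and class names below are hypothetical placeholders, not MerkleDb APIs:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/**
 * Sketch of "upload the index hash, not the index". The node hashes a
 * serialized form of its in-memory index and uploads only the digest; the
 * validator rebuilds the index from the uploaded data files, serializes it
 * the same way, and checks the digests match. Deterministic serialization is
 * an assumption here.
 */
public final class IndexHashing {

    /** On the node: the digest that gets uploaded in place of the full index. */
    public static byte[] hashIndex(final byte[] serializedIndex) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-384").digest(serializedIndex);
    }

    /** On the validator: rebuild and serialize the index, then confirm the digests match. */
    public static boolean indexMatches(final byte[] rebuiltSerializedIndex, final byte[] uploadedIndexHash)
            throws NoSuchAlgorithmException {
        return Arrays.equals(hashIndex(rebuiltSerializedIndex), uploadedIndexHash);
    }
}
```

The catch is determinism: the validator has to be able to rebuild a byte-identical index from the uploaded data files, otherwise the digests would never match and we would have to hash some canonical form of the index instead.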

artemananiev commented 1 year ago

One more idea: run DB compaction on a snapshot before uploading it to the cloud. Since snapshots and compaction are completely independent today, a saved state may contain many files that were written to disk recently but not yet compacted. This means such states will contain a fair amount of garbage (exact numbers can be found in the validation tool). There is no need to upload it.
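A hedged sketch of that ordering, with the compaction and upload steps reduced to hypothetical stand-in interfaces (not the real MerkleDb or uploader APIs):

```java
import java.io.IOException;
import java.nio.file.Path;

/**
 * Sketch of the proposed ordering: compact the snapshot copy before uploading,
 * so files that were recently flushed but not yet compacted do not carry their
 * garbage into the backup bucket. Both interfaces are stand-ins for the real
 * compaction and upload code, which is not shown here.
 */
public final class SnapshotUploadFlow {

    /** Stand-in for whatever runs compaction over a directory of data files. */
    public interface Compactor {
        void compact(Path snapshotDir) throws IOException;
    }

    /** Stand-in for the code that pushes a saved state to the backup bucket. */
    public interface Uploader {
        void upload(Path snapshotDir) throws IOException;
    }

    public static void compactThenUpload(final Path snapshotDir, final Compactor compactor, final Uploader uploader)
            throws IOException {
        // The snapshot is an independent copy, so compacting it cannot disturb the live database
        compactor.compact(snapshotDir);
        uploader.upload(snapshotDir);
    }
}
```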

imalygin commented 1 year ago

Here is the document that sums up the compaction improvements to be made as a part of this epic: https://www.notion.so/swirldslabs/Compaction-Improvements-247726614d924fbaa34aa82a157a2f20