A single orchestrator script will drive the snapshotting process. It will read the configured IPs and snapshot intervals from a JSON file.
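As a rough illustration of the JSON input, here is a minimal sketch of loading and validating such a file. The schema (`beacon_ip`, `aux_ips`, `snapshot_interval_hours`), the example addresses, and the `parse_config` helper are all assumptions made for this example; this PR does not fix the actual field names.

```python
import ipaddress
import json

# Hypothetical config contents; the real field names may differ.
EXAMPLE_CONFIG = """
{
  "beacon_ip": "10.0.0.1",
  "aux_ips": ["10.0.0.2", "10.0.0.3"],
  "snapshot_interval_hours": 6
}
"""


def parse_config(text):
    """Parse and validate the orchestrator config (assumed schema)."""
    raw = json.loads(text)
    # Validate every address up front so a typo fails before any node is touched.
    beacon_ip = str(ipaddress.ip_address(raw["beacon_ip"]))
    aux_ips = [str(ipaddress.ip_address(ip)) for ip in raw["aux_ips"]]
    interval = int(raw["snapshot_interval_hours"])
    if interval <= 0:
        raise ValueError("snapshot_interval_hours must be positive")
    return {
        "beacon_ip": beacon_ip,
        "aux_ips": aux_ips,
        "snapshot_interval_hours": interval,
    }
```

Calling `parse_config(EXAMPLE_CONFIG)` returns a plain dict; a malformed IP or a non-positive interval raises `ValueError` before any snapshot work starts.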
Triggered via cronjob every snapshot interval, the orchestrator script will run sanity checks (e.g. liveness) on all snapshot IPs, then snapshot the beacon node first as follows:
1. Back up the existing rclone credentials
2. Install the snapshot credentials
3. Stop the harmony service
4. Rsync locally
5. Start a thread to rsync to the bucket
6. Restart the harmony service
7. Restore the rclone credentials
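The steps above can be sketched as a single per-node function. This is only an illustration: `run_step` and `upload` are hypothetical callables (how a step is actually executed on a remote machine is not specified here), the step names are made up, and the `try`/`finally` cleanup is a design assumption of this sketch rather than something this PR states.

```python
import threading


def snapshot_node(ip, run_step, upload):
    """Run the snapshot steps on one node and return the in-flight upload thread.

    run_step(ip, name) and upload(ip) are hypothetical hooks supplied by the caller.
    """
    run_step(ip, "backup_rclone_credentials")
    try:
        run_step(ip, "install_snapshot_credentials")
        run_step(ip, "stop_harmony")
        try:
            run_step(ip, "rsync_local")
            # The bucket upload runs in the background so the node can restart now.
            uploader = threading.Thread(target=upload, args=(ip,), name=f"upload-{ip}")
            uploader.start()
        finally:
            # Assumption: restart harmony even if the local rsync failed.
            run_step(ip, "start_harmony")
    finally:
        # Assumption: always put the original credentials back.
        run_step(ip, "restore_rclone_credentials")
    return uploader
```

The caller is expected to `join()` the returned thread before treating the snapshot as complete. Note that a plain `threading.Thread` does not surface exceptions raised inside `upload`, so a real script would need some way to report upload failures.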
Once the beacon node is restarted (while its rsync to the bucket is still in progress), the orchestrator script will snapshot all aux nodes in parallel using the same steps.
The orchestrator script will exit successfully once all bucket rsyncs are done. Otherwise, it will log the error and exit ungracefully by raising the exception.
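The ordering described above (sanity checks, beacon first, aux nodes in parallel, wait for every upload, fail loudly) could look roughly like the sketch below. The callables `is_alive`, `snapshot`, and `upload` are hypothetical placeholders for the real checks and per-node steps, and using `concurrent.futures` is a choice made for this example so that upload errors propagate to the caller.

```python
import logging
from concurrent.futures import ThreadPoolExecutor


def orchestrate(beacon_ip, aux_ips, is_alive, snapshot, upload):
    """Snapshot the beacon first, then all aux nodes in parallel (sketch)."""
    ips = [beacon_ip, *aux_ips]
    dead = [ip for ip in ips if not is_alive(ip)]
    if dead:
        raise RuntimeError(f"sanity check failed for: {dead}")

    with ThreadPoolExecutor(max_workers=2 * len(ips)) as pool:
        def snapshot_then_upload(ip):
            snapshot(ip)  # assumed to return once the node has been restarted
            return pool.submit(upload, ip)  # bucket upload continues in the background

        # Beacon first: block until it is snapshotted and restarted.
        uploads = [snapshot_then_upload(beacon_ip)]
        # Aux nodes in parallel, while the beacon upload may still be running.
        aux_jobs = [pool.submit(snapshot_then_upload, ip) for ip in aux_ips]
        uploads += [job.result() for job in aux_jobs]
        # result() re-raises any exception from the corresponding upload.
        for pending in uploads:
            pending.result()


def main(run):
    """Log any failure, then re-raise so the process exits with the exception."""
    try:
        run()
    except Exception:
        logging.exception("snapshot run failed")
        raise
```

A cron-triggered entry point would then call something like `main(lambda: orchestrate(...))` with the values read from the JSON config.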
Snapshotted aux-shard DBs will now have clean crosslinks w.r.t. the snapshotted beacon DB.
No special initial installation is required on any of the machines being snapshotted.
This closes https://github.com/harmony-one/harmony/issues/2998
Snapshot design
TODO