Open LucaCinquini opened 1 year ago
@LucaCinquini might be good for us to define some backup parameters / constraints.
For example, how does the following sound?
@riverma: good idea. Those parameters are OK with me. To save money, if necessary, we could also back up every 24 hours and keep the backups for 7 days.
Backup procedure documented here: https://wiki.jpl.nasa.gov/display/operasds/ElasticSearch+Backup+and+Restoration
Tested with GRQ and Mozart by (a) backing up all ES docs, (b) purging all documents from each cluster, and (c) restoring all documents. Confirmed the document count matched before step (a) and after step (c).
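The count check in that test could be scripted roughly as follows. This is a minimal sketch, not the documented procedure: `fetch_doc_counts` and `diff_doc_counts` are hypothetical helper names, the cluster URL is a placeholder, and the sketch assumes the cluster is reachable without authentication and exposes the standard `_cat/indices` API.

```python
import json
from urllib.request import urlopen


def fetch_doc_counts(es_url):
    """Return {index_name: doc_count} using the _cat/indices API.

    es_url is a placeholder such as "http://localhost:9200" (assumption).
    """
    with urlopen(f"{es_url}/_cat/indices?format=json") as resp:
        rows = json.load(resp)
    # "docs.count" comes back as a string and can be null for closed indices.
    return {row["index"]: int(row.get("docs.count") or 0) for row in rows}


def diff_doc_counts(before, after):
    """Return {index: (before_count, after_count)} for every index that differs.

    An index missing on one side is reported with None for that side.
    """
    mismatches = {}
    for index in sorted(set(before) | set(after)):
        b, a = before.get(index), after.get(index)
        if b != a:
            mismatches[index] = (b, a)
    return mismatches
```

Calling `fetch_doc_counts` once before step (a) and once after step (c), then passing both results to `diff_doc_counts`, should yield an empty dict if the restore was complete.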
Looks good @niarenaw! Please make sure to use the new template that @LalaP set up. See the OPERA SDS OPS PROCEDURES main page for a link to creating a template wiki page from scratch.
I also had a look, thanks for testing and documenting, Nick. May I suggest that others also test this procedure, perhaps Lala and Sri (separately), after a successful regression test so that the Elasticsearch indices are populated? We also probably need to set up a cron job to back up these indices every 24 hours.
I've updated the procedure to abide by the Ops Procedure template. A nightly backup should be pretty easy to add as a cron job. I think it probably makes more sense to store these in S3 rather than on Mozart, to avoid any additional need for cleanup or disk space monitoring. Maybe we set up a new bucket and add a 14-day retention period as a lifecycle rule? Or 30 days?
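For discussion, here is a rough sketch of what such a lifecycle rule might look like if applied with boto3. Nothing here is decided: the function names, the rule ID, and the bucket name passed in are hypothetical, and the retention value is just the 14 days floated above. It expires every object in the bucket after the given number of days.

```python
def build_lifecycle_config(retention_days=14, prefix=""):
    """Build an S3 lifecycle configuration that expires objects after N days.

    An empty prefix applies the rule to every object in the bucket.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-es-backups-after-{retention_days}-days",  # hypothetical ID
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": retention_days},
            }
        ]
    }


def apply_lifecycle(bucket, retention_days=14):
    """Apply the rule to a bucket (bucket name is supplied by the caller)."""
    import boto3  # third-party; imported lazily so the builder has no dependencies

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_lifecycle_config(retention_days),
    )
```

Switching to a 30-day retention would only mean calling `apply_lifecycle` with `retention_days=30`; with the rule in place, the nightly cron job would not need any cleanup logic of its own.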
+1 @niarenaw to a lifecycle rule. Let's discuss the details for this.
Checked for duplicates
Yes - I've already checked
Alternatives considered
Yes - and alternatives don't suffice
Related problems
No response
Describe the feature request
We need a reliable procedure to back up and restore the state of the SDS Elasticsearch clusters: Mozart, GRQ, and possibly Metrics. Please use either the Dev Common or one of the I&T venues, checking with other developers to make sure they are not currently using it.