pgulley opened 5 months ago
Off the top of my head, our Elasticsearch backup strategy:

We're using ES incremental snapshots.

Current policy: Backups to S3
- every 2 weeks
- test restoring S3 snapshots (or have we done it already)?

Future policy: Backup to B2
- every 2 weeks (a biweekly SLM policy is sketched below)
- ✅ copy S3 snapshots to B2
- (July 15?) cut over to B2
- test restoring B2 snapshots?

cc @thepsalmist for any additional context/correction.
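For concreteness, a biweekly schedule like this is usually encoded as an SLM (snapshot lifecycle management) policy. This is a sketch only: the policy id, snapshot-name pattern, index pattern, and retention numbers below are assumptions, not our actual config, and "every 2 weeks" is approximated as the 1st and 15th of each month:

```
PUT _slm/policy/biweekly-snapshots
{
  "schedule": "0 30 1 1,15 * ?",        // 01:30 on the 1st and 15th (Quartz cron)
  "name": "<biweekly-snap-{now/d}>",    // date-math snapshot names
  "repository": "mediacloud-elasticsearch-snapshot",
  "config": {
    "indices": "mc_search-*",           // assumed index pattern
    "include_global_state": false
  },
  "retention": {
    "expire_after": "90d",              // assumed retention window
    "min_count": 2,
    "max_count": 10
  }
}
```

Since ES snapshots are incremental at the segment level, each biweekly run only uploads segments created since the previous snapshot.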
Created the B2 repository `mediacloud-elasticsearch-snapshot` and made the first manual backup to B2.
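For the record, wiring up B2 goes through its S3-compatible API, so the repository registration looks like a plain S3 repository pointed at a B2 endpoint. A sketch, assuming the `repository-s3` integration; the client name, endpoint region, bucket name, and credential setup below are assumptions rather than the actual config:

```
# elasticsearch.yml: point a named S3 client at B2's S3-compatible endpoint
s3.client.b2.endpoint: s3.us-west-004.backblazeb2.com

# the B2 application key goes in the keystore on every node:
#   bin/elasticsearch-keystore add s3.client.b2.access_key
#   bin/elasticsearch-keystore add s3.client.b2.secret_key
```

```
PUT _snapshot/mediacloud-elasticsearch-snapshot
{
  "type": "s3",
  "settings": {
    "bucket": "mediacloud-elasticsearch-snapshot",
    "client": "b2"
  }
}
```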
B2 backups failed and needed to be retried for this most recent period; eventually the upload succeeded (per @kilemensi).
Let's run a test restore of the ILM indices using the B2 backups. The next step for Paige is to look at the cost of this; if it is exorbitant, we can restore from S3 into an E2 instead.
Elasticsearch's Restore API allows performing various restorations from a snapshot, including restoring a single index or an entire cluster.
Restoring an Index - We validated restoring a single index on the Staging ES instance. Even though the snapshot is taken over all the indices, a single index can be restored from it individually. To avoid deleting existing data, the restoration renames the restored index.
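A minimal sketch of that rename-on-restore call; the repository and snapshot names here are placeholders, not the ones actually used in the validation:

```
POST _snapshot/mediacloud-elasticsearch-snapshot/snapshot-example/_restore
{
  // restore just one index out of a snapshot that covers all of them
  "indices": "mc_search-000001",
  // write it under a new name so the live index is left untouched
  "rename_pattern": "mc_search-(.+)",
  "rename_replacement": "restored_mc_search-$1",
  "include_aliases": false
}
```

Setting `include_aliases` to `false` keeps the restored copy from grabbing the live index's aliases, which matters when the original is still serving traffic.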
Restoring an entire cluster - We should be able to restore the entire cluster from the snapshots in the case of catastrophic failure. We can restore into the same cluster or restore our snapshots into a different cluster. For our validation strategy, it would only be practical to restore from the snapshots into a different cluster.
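Restoring into a different cluster is mostly a matter of registering the same repository on the target cluster, ideally read-only so the test cluster can't mutate the production snapshots, and restoring everything from there. A sketch under the same assumed names as above:

```
// on the target cluster: attach the existing repository as read-only
PUT _snapshot/mediacloud-elasticsearch-snapshot
{
  "type": "s3",
  "settings": {
    "bucket": "mediacloud-elasticsearch-snapshot",
    "client": "b2",
    "readonly": true
  }
}

// then restore all indices, plus cluster state if we want it
POST _snapshot/mediacloud-elasticsearch-snapshot/snapshot-example/_restore
{
  "indices": "*",
  "include_global_state": true
}
```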
The existing index stats are as follows:
Index mc_search-000001 - ~3.2TB
"_shards": {
"total": 60,
"successful": 60,
"failed": 0
},
"_all": {
"primaries": {
"store": {
"size_in_bytes": 1604710273530,
"total_data_set_size_in_bytes": 1604710273530,
"reserved_in_bytes": 0
}
},
"total": {
"store": {
"size_in_bytes": 3209420547060,
"total_data_set_size_in_bytes": 3209420547060,
"reserved_in_bytes": 0
}
}
},
Index mc_search-000002 - ~2.5TB
"_shards": {
"total": 60,
"successful": 60,
"failed": 0
},
"_all": {
"primaries": {
"store": {
"size_in_bytes": 1250744283074,
"total_data_set_size_in_bytes": 1250744283074,
"reserved_in_bytes": 0
}
},
"total": {
"store": {
"size_in_bytes": 2501488566148,
"total_data_set_size_in_bytes": 2501488566148,
"reserved_in_bytes": 0
}
}
},
Index mc_search-000003 - ~0.8TB
"_shards": {
"total": 60,
"successful": 60,
"failed": 0
},
"_all": {
"primaries": {
"store": {
"size_in_bytes": 417504383254,
"total_data_set_size_in_bytes": 417504383254,
"reserved_in_bytes": 0
}
},
"total": {
"store": {
"size_in_bytes": 835127786743,
"total_data_set_size_in_bytes": 835127786743,
"reserved_in_bytes": 0
}
}
},
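(For reference, per-index figures like the ones above come from the index stats API; something along these lines reproduces them:)

```
GET /mc_search-*/_stats/store
```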
To restore any one index to a different cluster, we'd need a minimum disk storage of 0.8TB (the size of the smallest index, mc_search-000003).

> To restore any one index to a different cluster, we'd need a minimum disk storage of 0.8TB

Does ☝🏽 mean none of the current servers have that capacity, @thepsalmist?

> Does ☝🏽 mean none of the current servers have that capacity, @thepsalmist?

Yes, none.
OK, so the options for a restore then are:
We just want to observe that the backups exist the way we expect.
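A minimal way to do that check, assuming the repository name from earlier in the thread: `_verify` confirms every node can access the repository, and the cat API lists the snapshots with their states and timestamps.

```
// confirm every node can access the repository
POST _snapshot/mediacloud-elasticsearch-snapshot/_verify

// list snapshots with state, start/end time, and index counts
GET _cat/snapshots/mediacloud-elasticsearch-snapshot?v
```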