rahulbot closed this 9 months ago.
Notes from meeting: sounds doable via an API call (needs testing); perhaps use 90 days as a rough time limit; validate how easy/hard it is to restore an archived index; double-check the shard size spec; make sure the changes for this don't require re-indexing.
We can do the ILM policy update via the ILM put policy API: `PUT _ilm/policy/<mc_ILM_policy_id>`.
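For reference, here is a minimal sketch of what that update request could look like. It assumes Python with only the standard library and a policy whose only phase is a hot phase with a rollover action. The helper name `build_ilm_policy_update`, the policy id `mc_ilm_policy`, and the condition values are hypothetical, not our actual policy. This API replaces the whole stored policy (and increments its version), so a real update would need to carry over every existing phase, not just the changed rollover conditions.

```python
import json
from typing import Optional, Tuple


def build_ilm_policy_update(
    policy_id: str,
    max_age: str,
    max_primary_shard_size: Optional[str] = None,
) -> Tuple[str, str, str]:
    """Build (method, path, JSON body) for the ILM put lifecycle policy API.

    Hypothetical helper: it only sets rollover conditions in the hot phase.
    The API replaces the entire policy, so real code must include every phase.
    """
    rollover = {"max_age": max_age}
    if max_primary_shard_size is not None:
        rollover["max_primary_shard_size"] = max_primary_shard_size
    body = {"policy": {"phases": {"hot": {"actions": {"rollover": rollover}}}}}
    return "PUT", f"/_ilm/policy/{policy_id}", json.dumps(body)


# Example: roll over roughly every 2 months (hypothetical policy id and value).
method, path, body = build_ilm_policy_update("mc_ilm_policy", "60d")
print(method, path)
print(body)
```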
When the policy is updated, the changes won't take effect on our current index, `mc_search-0001`, so they would only take effect once we roll over to `mc_search-0002`. So we'll have to let the current index roll over according to the current ILM rollover definitions, as documented here.
As per this image, our current max shard size is 17.5GB, so we should anticipate a rollover once we've ingested about triple our current data (this should come sooner than the other rollover condition, a max age of 365 days).
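The rollover estimate above is really "whichever condition trips first". Here is a rough sketch of that arithmetic, assuming shard size grows linearly with ingest. The current shard size and daily growth rate are made-up inputs, and the helper name is hypothetical. ILM's `max_age` is measured from index creation, which is what `index_age_days` stands in for.

```python
from typing import Tuple


def first_rollover_trigger(
    current_shard_gb: float,
    daily_growth_gb: float,
    max_shard_gb: float = 17.5,
    max_age_days: float = 365.0,
    index_age_days: float = 0.0,
) -> Tuple[str, float]:
    """Return which rollover condition fires first and in how many more days.

    Assumes linear shard growth; all inputs are illustrative estimates.
    """
    days_to_age = max(max_age_days - index_age_days, 0.0)
    if daily_growth_gb <= 0:
        # No growth: only the age condition can ever trigger.
        return "age", days_to_age
    days_to_size = max((max_shard_gb - current_shard_gb) / daily_growth_gb, 0.0)
    if days_to_size <= days_to_age:
        return "size", days_to_size
    return "age", days_to_age


# Hypothetical: 5.0 GB shard today, growing 0.5 GB/day.
print(first_rollover_trigger(5.0, 0.5))  # → ('size', 25.0)
```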
My proposal for setting up backups is:
Closing this; I'll open separate issues to capture these two tasks, since they'll be done at different times.
This supports a two-pronged overall strategy for catastrophic index failure recovery:
With the ILM ES index architecture, @philbudne raised a question about reconsidering our redundancy approach. We now know that we can restore 2-3 months of data from WARC files in ~2 days. What if we roll over via ILM to a new index every 2 months, and immediately back up the rolled-over index off-site? Then, if we crash, restoration is roughly 2 days of downloading the backed-up indexes and recreating the latest (not yet backed up) index from WARC files. I think this is acceptable downtime, and we can always add some kind of "hot" duplicate of the latest index later if we want. The task here is to work out how to design and implement this, whether it would really work, and whether it is a good idea.
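One way to picture the two halves of this plan: after rollover, every index except the current write index gets snapshotted off-site. In a crash, those snapshots are restored and only the write index is rebuilt from WARC files. Below is a minimal sketch of the requests involved. It assumes a snapshot repository has already been registered (called `offsite` here, hypothetical) and uses a hypothetical one-snapshot-per-index `<index>-snapshot` naming scheme. The helper names are illustrative, not existing code. The request shapes follow Elasticsearch's create-snapshot (`PUT _snapshot/<repo>/<snapshot>`) and restore (`POST _snapshot/<repo>/<snapshot>/_restore`) APIs.

```python
import json
from typing import Dict, List, Tuple

Request = Tuple[str, str, str]  # (HTTP method, path, JSON body)


def rolled_over_indices(indices: List[str], write_index: str) -> List[str]:
    """Indices no longer receiving writes: everything except the write index."""
    return sorted(i for i in indices if i != write_index)


def snapshot_name(index: str) -> str:
    # Hypothetical naming scheme: one snapshot per rolled-over index.
    return f"{index}-snapshot"


def build_snapshot_request(repo: str, index: str) -> Request:
    """Create-snapshot request covering a single rolled-over index."""
    body = {"indices": index, "include_global_state": False}
    return "PUT", f"/_snapshot/{repo}/{snapshot_name(index)}", json.dumps(body)


def build_restore_request(repo: str, index: str) -> Request:
    """Restore request for the snapshot taken of `index`."""
    body = {"indices": index}
    return "POST", f"/_snapshot/{repo}/{snapshot_name(index)}/_restore", json.dumps(body)


def plan_recovery(indices: List[str], write_index: str, repo: str) -> Dict[str, list]:
    """Split recovery into snapshot restores plus a WARC rebuild of the write index."""
    return {
        "restore_from_snapshot": [
            build_restore_request(repo, i) for i in rolled_over_indices(indices, write_index)
        ],
        "rebuild_from_warc": [write_index],
    }


indices = ["mc_search-0001", "mc_search-0002", "mc_search-0003"]
for method, path, _ in plan_recovery(indices, "mc_search-0003", "offsite")["restore_from_snapshot"]:
    print(method, path)
```

In practice, the snapshot step might be better handled by a snapshot lifecycle management (SLM) policy than by hand-built requests. That is one of the design choices to evaluate here.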
Related to #157, #231, #54.