mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

Ft/es snapshots #267

Closed thepsalmist closed 7 months ago

thepsalmist commented 8 months ago

This PR implements taking Elasticsearch snapshots to S3. The implementation uses ES's Snapshot Lifecycle Management (SLM) API to automate snapshot creation every 14 days. The snapshots are incremental, so they should give us a point-in-time restore point for every 2 weeks of ES data.

PS: Removed the dependency on the ILM 90-day rollover.
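
For readers unfamiliar with SLM, here is a minimal sketch of what a policy body sent to `PUT _slm/policy/<policy-id>` might look like. It is not the policy from this PR: the repository name, index pattern, and cron expression are hypothetical placeholders. SLM schedules are cron expressions, and cron cannot express "every 14 days" exactly, so the example fires on the 1st and 15th of each month as an approximation.

```python
import json


def build_slm_policy(repository, indices, schedule="0 30 1 1,15 * ?"):
    """Return a request body for PUT _slm/policy/<policy-id>.

    `repository` and `indices` are hypothetical placeholders. The default
    cron expression fires at 01:30 on the 1st and 15th of each month,
    which only approximates a 14-day cadence.
    """
    return {
        "schedule": schedule,
        # Date-math snapshot name, resolved when the snapshot is taken.
        "name": "<snapshot-{now/d}>",
        "repository": repository,
        "config": {
            "indices": indices,
            "include_global_state": False,
        },
    }


policy = build_slm_policy("example-s3-repo", ["example-index-*"])
print(json.dumps(policy, indent=2))
```

The S3 repository itself would have to be registered separately before a policy like this could reference it.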

thepsalmist commented 8 months ago

... API to automate snapshot creation every 91 days (Rollover should be every 90 days)

My immediate thoughts/questions (may not be accurate):

  1. Are we 100% sure that the two policies will be scheduled at exactly the same time? Otherwise we can't guarantee that the 91st day (for the snapshot) will be the day after the 90th day (for the rollover).
  2. Even if the two schedules start at the same time, they will drift apart over time:

Rollover: 90th day, 180th day, ... 900, 990, ...
Snapshot: 91st day, 182nd day, ... 910, 1001, ...

I think you want them both to happen every 90 days, but one should start a day earlier/later.

  3. Are the snapshots incremental?

Yes, the snapshots are incremental.

Yes, the scheduling is a little bit tricky, since the count starts on the day of deployment to prod. The two would actually overlap, since the initial rollover would have happened by then. Looking at this!
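
The drift raised in point 2 can be checked with a few lines of arithmetic. This sketch assumes both jobs start counting from the same deployment day (day 0); the helper name `schedule_days` is made up for illustration. It compares a 91-day snapshot period against a 90-day period that simply starts one day later.

```python
def schedule_days(period, count, start=0):
    """Days (counted from deployment) on which a job with this period fires."""
    return [start + period * k for k in range(1, count + 1)]


rollover = schedule_days(90, 11)          # 90, 180, ..., 990
drifting = schedule_days(91, 11)          # 91, 182, ..., 1001
offset = schedule_days(90, 11, start=1)   # 91, 181, ..., 991

# Gap in days between each rollover and the snapshot that follows it.
drift_gaps = [s - r for r, s in zip(rollover, drifting)]
offset_gaps = [s - r for r, s in zip(rollover, offset)]

print(drift_gaps)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(offset_gaps)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

With a 91-day period the gap grows by one day per cycle, while the offset schedule keeps the snapshot exactly one day behind every rollover.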

philbudne commented 8 months ago

Seeing this comment:

Are we 100% sure that the two policies will be scheduled at exactly the same time? Otherwise we can't guarantee that the 91st day (for the snapshot) will be the day after the 90th day (for the rollover).

I had thought there was some way to trigger actions (like snapshotting) as part of ILM, and that scheduling wouldn't be an issue...

thepsalmist commented 8 months ago

Seeing this comment:

Are we 100% sure that the two policies will be scheduled at exactly the same time? Otherwise we can't guarantee that the 91st day (for the snapshot) will be the day after the 90th day (for the rollover).

I had thought there was some way to trigger actions (like snapshotting) as part of ILM, and that scheduling wouldn't be an issue...

Yes. Ideally, had we gone with the full ILM phase rollovers (hot -> warm -> cold/frozen), snapshotting would be done automatically once indices roll over into the cold/frozen phase. Right now, with all our indices in the hot phase, we're trying to achieve the same thing using the Snapshot Lifecycle Management (SLM) APIs.
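
As a rough illustration of the ILM alternative mentioned here, the sketch below builds a policy body for `PUT _ilm/policy/<policy-id>` in which the cold phase uses ILM's `searchable_snapshot` action. This is an assumption about what such a setup could look like, not something taken from this repository: the repository name and the `90d` ages are placeholders, and searchable snapshots may depend on the Elasticsearch license tier in use.

```python
import json

# Hypothetical values; nothing here is taken from the PR itself.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                # Roll over to a new index once the current one is 90 days old.
                "actions": {"rollover": {"max_age": "90d"}},
            },
            "cold": {
                # min_age is measured from rollover for rolled-over indices.
                "min_age": "90d",
                "actions": {
                    # Snapshots the index into the repository as part of ILM.
                    "searchable_snapshot": {
                        "snapshot_repository": "example-s3-repo"
                    }
                },
            },
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```

In a setup like this the snapshot is driven by the index lifecycle itself, so there is no second schedule to keep aligned with the rollover.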

kilemensi commented 7 months ago

@thepsalmist Can you add a small description (in this PR or in the docs) of what this change means vs. what we were aiming for with the 90-day rollover?

thepsalmist commented 7 months ago

@thepsalmist Can you add a small description (in this PR or in the docs) of what this change means vs. what we were aiming for with the 90-day rollover?

Resolved