mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

move legacy DB snapshots to EBS archive? #255

Closed rahulbot closed 6 months ago

rahulbot commented 7 months ago

Almost 60% of our last month's AWS bill was EBS volumes. I think those are snapshots of the legacy Postgres DB. Can we move them to EBS Snapshots Archive to reduce that ongoing cost? I think we'll be able to delete them at some point, but for now we still want them in our back pocket. Any thoughts on whether this would help, @kilemensi and @thepsalmist?

rahulbot commented 7 months ago

Details on $5,600 Jan bill:

philbudne commented 7 months ago

My recollection of AWS "cold storage" is that it almost always comes with a minimum payment/duration (if you restore sooner than the minimum duration, you still have to pay for the entire minimum duration?), so I would suggest asking the researchers about their near-term needs, especially in light of their need for counts in 2018 data.

rahulbot commented 7 months ago

The DB snapshots are 71TB on EC2. At the quoted EBS archive tier rate of $0.0125 per GB-month, keeping those DB snapshots around would cost us $887.50 per month if archived (current cost is about $3,500 per month at the $0.05/GB-month standard tier rate). This is a big cost reduction and I think it's worth it for keeping them around while we restore data. Any objections? If not, we should do this via the AWS console.
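For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes decimal terabytes (1 TB = 1,000 GB), which is what reproduces the $887.5 figure, and it uses only the two per-GB-month rates quoted in this comment. The names `monthly_cost`, `STANDARD_RATE`, and `ARCHIVE_RATE` are made up for illustration. With these inputs the standard tier comes out at $3,550 per month, close to the roughly $3,500 quoted.

```python
# Back-of-the-envelope check of the figures quoted in this thread.
GB_PER_TB = 1000  # assumption: decimal terabytes, which reproduces the $887.5 figure

STANDARD_RATE = 0.05   # USD per GB-month, standard snapshot tier (from the thread)
ARCHIVE_RATE = 0.0125  # USD per GB-month, archive tier (from the thread)


def monthly_cost(size_tb: float, rate_per_gb_month: float) -> float:
    """Return the monthly storage cost in USD, rounded to cents."""
    return round(size_tb * GB_PER_TB * rate_per_gb_month, 2)


snapshot_tb = 71  # total size of the DB snapshots, from the thread
standard = monthly_cost(snapshot_tb, STANDARD_RATE)
archive = monthly_cost(snapshot_tb, ARCHIVE_RATE)

print(f"standard tier: ${standard:,.2f}/month")
print(f"archive tier:  ${archive:,.2f}/month")
print(f"savings:       ${standard - archive:,.2f}/month")
```

This only covers storage. It does not model restore charges or any minimum archive period, so it is a lower bound on the archive-tier cost if the snapshots are restored early.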

Relevant notes from their docs:

rahulbot commented 7 months ago

We have agreement to delete the non-database snapshots and move the database snapshots to EBS archive.

thepsalmist commented 7 months ago

Done. Moved all the Postgres and Solr snapshots to the Archive storage tier.
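The thread proposes doing the move through the AWS console, and nothing here says how it was actually done. For future reference, a scripted equivalent might look like the sketch below. It assumes boto3 is installed and credentials are configured, and it relies on the EC2 client's `modify_snapshot_tier` call. The helper name `archive_snapshots` and the snapshot ID in the usage comment are hypothetical.

```python
from typing import Iterable, List


def archive_snapshots(ec2_client, snapshot_ids: Iterable[str]) -> List[str]:
    """Request a move to the archive tier for each snapshot ID.

    Returns the IDs that were submitted. The move itself is asynchronous
    on the AWS side, so this does not wait for tiering to complete.
    """
    submitted = []
    for snapshot_id in snapshot_ids:
        ec2_client.modify_snapshot_tier(
            SnapshotId=snapshot_id,
            StorageTier="archive",
        )
        submitted.append(snapshot_id)
    return submitted


# Usage (requires boto3 and AWS credentials; the snapshot ID is a placeholder):
#   import boto3
#   archive_snapshots(boto3.client("ec2"), ["snap-0123456789abcdef0"])
```

Passing the client in as an argument keeps the helper easy to dry-run against a stub before touching real snapshots.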