mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

review and act on S3 cleanup #264

Closed rahulbot closed 8 months ago

rahulbot commented 8 months ago

Split off from #260. We have a lot of buckets in S3 that can probably be deleted to reduce cost and do some good spring cleaning. I made a list in a spreadsheet ("MC S3 Feb 2024 Audit") shared on Slack. This issue is to track any questions about the proposed actions, which include keeping or deleting, and then to act on those actions once we agree that they make sense and don't create any problems for ongoing project functioning.

@kilemensi + @philbudne: please also review the spreadsheet and chime in with any notes here or on Slack.

thepsalmist commented 8 months ago

Since the 2023 backfill is complete, we can delete the respective S3 buckets containing the 2023 CSV files.

philbudne commented 8 months ago

Unless the CSV files are large, I think they could be useful to keep. Could we replace them with a .tar.gz file?
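
A minimal sketch of the .tar.gz idea, assuming the CSV files have already been downloaded from the bucket into a local directory. The function name `archive_csvs` and the paths are hypothetical and not part of story-indexer; uploading the resulting archive back to S3 is left out.

```python
import tarfile
from pathlib import Path


def archive_csvs(src_dir: str, archive_path: str) -> list[str]:
    """Pack every *.csv directly under src_dir into one gzip-compressed tarball.

    Returns the names of the files that were archived, sorted.
    (Hypothetical helper for illustration only.)
    """
    csv_files = sorted(Path(src_dir).glob("*.csv"))
    with tarfile.open(archive_path, "w:gz") as tar:
        for csv_file in csv_files:
            # arcname stores only the file name, dropping the local directory prefix
            tar.add(csv_file, arcname=csv_file.name)
    return [f.name for f in csv_files]
```

For example, `archive_csvs("csv-2023", "csv-2023.tar.gz")` would produce a single compressed file that could be kept in place of the individual CSVs.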

thepsalmist commented 8 months ago

All S3 buckets have been reviewed and actioned as per the Excel file, so this can be closed.