Open rahulbot opened 4 months ago
Today I:
I tried creating a public mediacloud-public bucket, but the API call failed with an error that the account email address had not been verified.
Regarding WARC files:
There are about 10K WARC files taking up 1.8TB on ramos (November 2023 through early March 2024), and about 75K WARC files in the S3 `mediacloud-indexer-archive` bucket taking up about 13TB.
So we could be looking at roughly $1,000 to transfer the WARC files we don't have locally.
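As a back-of-envelope check on that figure (the flat ~$0.09/GB S3 egress rate is an assumption, not a quoted price; actual tiered pricing varies):

```python
# Rough S3 egress cost for the WARC data not already on ramos.
S3_TOTAL_TB = 13.0    # WARC data in the mediacloud-indexer-archive bucket
LOCAL_TB = 1.8        # WARC data already on ramos
EGRESS_PER_GB = 0.09  # assumed flat S3 data-transfer-out rate, USD

missing_gb = (S3_TOTAL_TB - LOCAL_TB) * 1000
cost = missing_gb * EGRESS_PER_GB

print(f"~{missing_gb:,.0f} GB to transfer, roughly ${cost:,.0f}")
```

Which lands right around the $1,000 estimate above.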
Now writing new current-day WARC files to both B2 and S3
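The dual-write step could be sketched like this (a hypothetical outline, not our actual archiver code; the uploader callables stand in for real S3/B2 clients and are in-memory dicts here so the example is self-contained):

```python
import hashlib

def upload_everywhere(key: str, data: bytes, uploaders: dict) -> str:
    """Send one WARC file's bytes to every configured store and return
    the SHA-256 digest so the copies can be verified later."""
    digest = hashlib.sha256(data).hexdigest()
    for name, upload in uploaders.items():
        upload(key, data)  # e.g. an S3 put_object or B2 upload wrapper
    return digest

# Fake stores standing in for the S3 and B2 buckets:
s3_store, b2_store = {}, {}
uploaders = {
    "s3": lambda key, data: s3_store.__setitem__(key, data),
    "b2": lambda key, data: b2_store.__setitem__(key, data),
}
digest = upload_everywhere("mc-2024-06-26.warc.gz", b"warc bytes", uploaders)
print(s3_store.keys() == b2_store.keys())  # True: both stores got the file
```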
> I tried creating a public mediacloud-public bucket, but the API call failed with an error that the account email address had not been verified.
@philbudne I was able to poke around the settings page and verify my email. Please test again at your convenience and let me know if it still fails.
Did a bit of googling on how to set ES to use a specific S3 API URL for backblaze:
https://github.com/elastic/elasticsearch/issues/21283#issuecomment-828002399
B2 has S3 compatible API. It works fine for us. We are using a snapshot like this:
```json
{
  "type": "s3",
  "settings": {
    "bucket": "elastic-backup",
    "region": "",
    "endpoint": "s3.us-west-001.backblazeb2.com"
  }
}
```
In our case the endpoint would be `s3.us-east-005.backblazeb2.com`.
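For reference, a minimal sketch of registering that repository through Elasticsearch's snapshot API (`PUT /_snapshot/<name>`). The repository name `b2_backup` and the cluster URL are assumptions, and the B2 key pair would first need to go into the ES keystore as `s3.client.default.access_key` / `s3.client.default.secret_key`:

```python
import json

ES_URL = "http://localhost:9200"  # assumed cluster address

# Repository settings, with the endpoint switched to our B2 region:
repo_body = {
    "type": "s3",
    "settings": {
        "bucket": "elastic-backup",
        "region": "",
        "endpoint": "s3.us-east-005.backblazeb2.com",
    },
}

# The actual registration call, e.g. with curl:
#   curl -X PUT "$ES_URL/_snapshot/b2_backup" \
#        -H 'Content-Type: application/json' \
#        -d '<the JSON printed below>'
print(json.dumps(repo_body, indent=2))
```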
I've broken out the task of "closing S3 writes" into a new issue (#316). I'll leave this as a reference to the longer-term task of extracting data from S3 once we're no longer writing to it.
Following up on #270, we want to continue migrating backups from S3 to B2. This should include:
2024-06-26: All production stacks (daily, 2022 CSV, and 2022 RSS) are writing to both S3 and B2.