Status: Closed (closed by peterbe 5 years ago)
Note to self: there are 59,591,797 keys in net-mozaws-prod-delivery-firefox and 400,000 keys in net-mozaws-prod-delivery-archive. That second number looks oddly rounded. Hmm...
With list_objects_v2 you can download at most 1,000 keys per request. That means you have to download roughly 59,000 of these pages. I'm currently attempting this locally on my home network and I'm up to 2,800 pages now after half an hour.
Since things are not sorted, every page has to be fetched, so this technique is bound to take a very long time.
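For a rough sense of scale, the numbers quoted above can be turned into an estimate. This is only back-of-the-envelope arithmetic and it assumes the throughput seen in the first half hour stays constant, which it may well not on a home network.

```python
import math

TOTAL_KEYS = 59_591_797   # key count quoted above for the firefox bucket
KEYS_PER_PAGE = 1_000     # maximum keys per list_objects_v2 request
PAGES_DONE = 2_800        # pages fetched so far
MINUTES_ELAPSED = 30      # "half an hour"

# Number of requests needed to list every key once.
total_pages = math.ceil(TOTAL_KEYS / KEYS_PER_PAGE)

# Assumption: the observed rate holds for the whole run.
pages_per_minute = PAGES_DONE / MINUTES_ELAPSED
eta_hours = total_pages / pages_per_minute / 60

print(total_pages)          # 59592
print(round(eta_hours, 1))  # 10.6
```

So a full listing is on the order of ten or eleven hours at this rate, before counting any retries.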
If we do use this, it's only a matter of time until we get some sort of network outage in the middle of those 59,000 pages. According to the docs there is a "StartAfter" string parameter. Perhaps if we dump the last key of every page to disk, we can then resume from there. Needs to be tested.
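A minimal sketch of that resume idea, untested against the real buckets. It assumes `client` is a boto3 S3 client (for example `boto3.client("s3")`). The function name `list_keys_resumable`, the `page_size` parameter and the checkpoint file are hypothetical names made up for illustration. It relies on S3 returning keys in lexicographic order, so passing the last key seen as `StartAfter` continues where the previous page stopped.

```python
import os


def list_keys_resumable(client, bucket, checkpoint_path, page_size=1000):
    """Yield every key in `bucket`, saving the last key of each page to disk.

    `client` is assumed to behave like a boto3 S3 client. If the checkpoint
    file already exists, listing resumes after the key stored in it.
    """
    kwargs = {"Bucket": bucket, "MaxKeys": page_size}

    # Resume from a previous, interrupted run if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as f:
            saved = f.read()
        if saved:
            kwargs["StartAfter"] = saved

    while True:
        page = client.list_objects_v2(**kwargs)
        # S3 omits "Contents" entirely when a page has no keys.
        contents = page.get("Contents", [])
        for obj in contents:
            yield obj["Key"]
        if not contents:
            break

        # Checkpoint only after the whole page has been handed out.
        last_key = contents[-1]["Key"]
        with open(checkpoint_path, "w", encoding="utf-8") as f:
            f.write(last_key)

        if not page.get("IsTruncated"):
            break
        # Ask for the next page by key rather than by continuation token.
        kwargs["StartAfter"] = last_key
```

Using `StartAfter` for every page, instead of the usual `ContinuationToken`, is a deliberate simplification here: the same value that drives pagination is also what gets written to disk, so a restart needs nothing else.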
I analyzed all the keys in net-mozaws-prod-delivery-archive by printing about 1% of them at random. The keys are things like this:
Note that they appear to be roughly in order and the last one is Jan 2014.
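The 1% sampling can be done while streaming through the keys, without holding them all in memory. A small sketch; `sample_keys` is a hypothetical helper, not something from the backfill script, and `keys` can be any iterable of key strings.

```python
import random


def sample_keys(keys, fraction=0.01, seed=None):
    """Yield each key with probability `fraction`, preserving input order."""
    rng = random.Random(seed)
    for key in keys:
        # rng.random() is uniform in [0, 1), so this keeps ~fraction of keys.
        if rng.random() < fraction:
            yield key
```

Because the sample keeps the input order, it still shows whether the listing itself is roughly ordered.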
There are two buckets that the backfill script should be run against:
net-mozaws-prod-delivery-firefox
net-mozaws-prod-delivery-archive
I think the region is "us-east-1" for both of these.
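The region guess could be checked rather than assumed. A short sketch, assuming `client` is a boto3 S3 client with permission to call `get_bucket_location`; `bucket_region` is a hypothetical helper name. S3 reports us-east-1 as an empty `LocationConstraint`, which the helper maps back to the region name.

```python
BUCKETS = [
    "net-mozaws-prod-delivery-firefox",
    "net-mozaws-prod-delivery-archive",
]


def bucket_region(client, bucket):
    """Return the region name of `bucket` as reported by S3."""
    resp = client.get_bucket_location(Bucket=bucket)
    # Buckets in us-east-1 come back with LocationConstraint set to None.
    return resp.get("LocationConstraint") or "us-east-1"


# Usage (requires boto3 and credentials; not run here):
#   import boto3
#   client = boto3.client("s3")
#   for name in BUCKETS:
#       print(name, bucket_region(client, name))
```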