ChrisTheDBA opened 3 years ago
This copies everything in the bucket. If we only need a subset of the files, we could either put that subset in a separate bucket or add filtering here (see the prefix-filtering sketch after the code below).
I tested this against my own S3 storage with an IAM user that has AmazonS3ReadOnlyAccess.
```
$ pip install boto3
```

```python
import boto3
from pathlib import Path

BUCKET_NAME = "nc-campaign-finance-storage"
LOCAL_DIR = Path.cwd() / 'data'

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(BUCKET_NAME)

for obj in bucket.objects.all():
    s3_file = obj.Object()
    local_file = LOCAL_DIR / s3_file.key
    # Skip files that are already downloaded and match the size reported by S3.
    if local_file.exists():
        if local_file.stat().st_size == s3_file.content_length:
            print(f'{s3_file.key} already downloaded')
            continue
    local_file.parent.mkdir(parents=True, exist_ok=True)
    s3_file.download_file(str(local_file))
    print(f'{s3_file.key}')
print("Done")
```
The change needs to be dynamic: it should download any and all files not already present in the Docker image (a static list of files is not sufficient), and it should require elevated privileges in the form of AWS secrets.
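One way to keep the script dynamic without baking secrets into the image is to rely on boto3's default credential resolution: if the standard AWS environment variables are injected into the container at runtime, the code never has to handle them explicitly. A small sketch of that check (the variable names are the standard boto3 ones; how they get injected into the container is left open):

```python
import boto3

# boto3 resolves credentials automatically from AWS_ACCESS_KEY_ID,
# AWS_SECRET_ACCESS_KEY (and optionally AWS_SESSION_TOKEN / AWS_DEFAULT_REGION)
# when they are present in the container's environment.
session = boto3.Session()
creds = session.get_credentials()
if creds is None:
    raise SystemExit("No AWS credentials found; pass them into the container as environment variables")
print(f"Using access key ending in ...{creds.access_key[-4:]}")
```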
I'd like to take this issue. Boto3 looks straightforward, but I'll need credentials for an S3 user with "programmatic access".
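Once those credentials are in hand, a quick way to confirm they work before wiring up the full download is an STS identity call, which succeeds with any valid credentials and needs no S3 permissions. A minimal sketch:

```python
import boto3

# get_caller_identity() returns the account, user ID, and ARN for whatever
# credentials boto3 resolved, so it is a cheap end-to-end credential check.
identity = boto3.client('sts').get_caller_identity()
print(identity['Arn'])
```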