nasa / podaacpy

A python utility library for interacting with NASA JPL's PO.DAAC
http://podaacpy.readthedocs.org/en/latest/
Apache License 2.0

Add functionality to push data products to cloud storage #121

Open lewismc opened 6 years ago

lewismc commented 6 years ago

Some functions for the associated services have a path='' parameter, meaning that the user can download the data to wherever they want on the local machine.

This issue proposes allowing S3 paths so that the data can be sent to S3 for analysis.

swatisingh45 commented 6 years ago

Hey! Can I take up this issue?

lewismc commented 6 years ago

Hi @swatisingh45, yes please. The idea would be to add a new parameter to both def granule_subset(self, input_file_path, path='') and extract_l4_granule(self, dataset_id='', path=''), essentially a flag indicating whether the data should be persisted to S3. The new function signatures would then look something like

extract_l4_granule(self, dataset_id='', store='local', path='')
...
granule_subset(self, input_file_path, store='local', path='')

By default the storage target would be the 'local' disk; the possible options would be 'local' and 's3'.
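
One way the new flag could be wired in (a hypothetical sketch only; _download_granule and _upload_to_s3 are placeholder helpers, not existing podaacpy functions):

def extract_l4_granule(self, dataset_id='', store='local', path=''):
    if store not in ('local', 's3'):
        raise ValueError("store must be 'local' or 's3', got %r" % store)
    # Download to local disk first, as the current implementation does.
    local_file = self._download_granule(dataset_id, path)  # placeholder helper
    if store == 's3':
        # Push the downloaded file to S3 and return its remote location.
        return self._upload_to_s3(local_file)  # placeholder helper
    return local_file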

When using S3, we should introduce a config.properties file which essentially contains key-value pairs representing the AWS configuration, e.g. the access key ID and secret access key. This file could be read when the user creates an instance of Podaac().
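
For example, the file could look like this (a sketch; the section and key names are placeholders, chosen to suit Python's standard configparser, which expects a section header):

# config.properties
[aws]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

and Podaac() could load it at construction time:

import configparser

config = configparser.ConfigParser()
config.read('config.properties')
aws_access_key_id = config.get('aws', 'aws_access_key_id')
aws_secret_access_key = config.get('aws', 'aws_secret_access_key')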

Regarding the code for uploading files to S3, you can base it on the following example:

import sys

import boto
import boto.s3.connection
from boto.s3.key import Key

# AWS credentials (e.g. loaded from config.properties as suggested above).
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''

# S3 bucket names must be globally unique and lowercase.
bucket_name = AWS_ACCESS_KEY_ID.lower() + '-dump'
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# Create the bucket in the default (US Standard) region.
bucket = conn.create_bucket(bucket_name,
                            location=boto.s3.connection.Location.DEFAULT)

testfile = "replace this with an actual filename"
print('Uploading %s to Amazon S3 bucket %s' % (testfile, bucket_name))

def percent_cb(complete, total):
    # Progress callback: print a dot as each chunk is transmitted.
    sys.stdout.write('.')
    sys.stdout.flush()

# Upload the file, invoking the progress callback up to 10 times.
k = Key(bucket)
k.key = 'my test file'
k.set_contents_from_filename(testfile, cb=percent_cb, num_cb=10)

Thank you for taking this issue on. If you have any issues then please let me know.

lewismc commented 6 years ago

@swatisingh45 are you working on this? If not then I will do it, thank you.

lewismc commented 4 years ago

Using Apache LibCloud's Python Object Storage API might be a good idea here.
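
For illustration, the upload could look something like this (a sketch using LibCloud's Storage driver API; the container name, file path, and credentials are placeholders):

from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

# Instantiate an S3 storage driver with the AWS credentials.
cls = get_driver(Provider.S3)
driver = cls('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')

# Fetch the target container (bucket); it must already exist.
container = driver.get_container(container_name='podaacpy-dump')

# Upload a local granule file; LibCloud streams it to S3.
driver.upload_object(file_path='/tmp/granule.nc',
                     container=container,
                     object_name='granule.nc')

One advantage of this approach is that switching to another provider (e.g. Google Cloud Storage) would only require changing the Provider constant and credentials, rather than rewriting the upload code.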