nkrumm / asd-jre-public

Code + Pipelines for the NDAR ASD pipelines

Replace S3cmd with aws toolkit #1

Open nkrumm opened 9 years ago

nkrumm commented 9 years ago

The official AWS command-line tools (aws-cli) are much faster and more reliable than s3cmd.

Since this still requires an external subprocess call, the best way to do this will be to wrap the commands in an s3_cp() method on the starpipe class.
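A minimal sketch of what such a wrapper could look like, assuming Python 3 and that the `aws` executable is on the PATH. The `StarPipe` class here is only a hypothetical stand-in for the real starpipe class, and `build_s3_cp_command` is an illustrative helper name, not something from this repository:

```python
import subprocess


def build_s3_cp_command(src, dst, recursive=False):
    """Return the argument list for `aws s3 cp SRC DST`."""
    cmd = ["aws", "s3", "cp", src, dst]
    if recursive:
        # --recursive copies every object under a prefix or directory
        cmd.append("--recursive")
    return cmd


class StarPipe:  # hypothetical stand-in for the real starpipe class
    def s3_cp(self, src, dst, recursive=False):
        """Copy between local paths and s3:// URLs via the aws CLI."""
        cmd = build_s3_cp_command(src, dst, recursive=recursive)
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed transfer is not silently ignored.
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the subprocess call makes the argument handling testable without touching S3.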

Necessary commands will be:

obenshaindw commented 9 years ago

The other advantage to aws-cli is that we can add multiple profiles to the config. https://github.com/NDAR/ndar_toolkit/blob/master/ndar_update_keys.py creates a specific aws-cli profile for NDAR using creds from the command-line download manager.
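As a rough illustration of the profile point, a named profile can be selected per call with the CLI's global `--profile` option. The helper name `aws_command` and the profile name `ndar` below are assumptions for the example; the actual profile name written by ndar_update_keys.py is not stated here:

```python
def aws_command(args, profile=None):
    """Build an aws CLI argument list, optionally selecting a named profile."""
    cmd = ["aws"]
    if profile:
        # --profile picks a named profile from the aws-cli config/credentials files
        cmd += ["--profile", profile]
    return cmd + list(args)


# Hypothetical profile name and bucket, for illustration only.
example = aws_command(["s3", "ls", "s3://example-bucket/"], profile="ndar")
```

The resulting list can be passed to `subprocess.run` in the same way as any other CLI call.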

You might also want to have a look at this. https://github.com/NDAR/NITRC-Pipeline-for-NDAR/tree/master/ndar_unpack

It imports boto and uses it for direct S3 access, which avoids another component and a subprocess call. On the other hand, aws-cli and s3cmd add a bunch of extra niceties like error handling. I don't know how performance compares between using boto directly and using one of these alternate tools.
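For comparison, a direct-access version without a subprocess might look roughly like the sketch below. It assumes boto3 (the current successor to the boto library mentioned above) is installed, which is a substitution on my part rather than what ndar_unpack uses; the function names are illustrative:

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// URL: %r" % (url,))
    return parsed.netloc, parsed.path.lstrip("/")


def s3_download(url, local_path, profile=None):
    """Download one object with boto3 instead of shelling out."""
    import boto3  # third-party; imported lazily so the helper above works without it

    bucket, key = split_s3_url(url)
    # profile_name=None falls back to the default credential chain
    session = boto3.Session(profile_name=profile)
    session.client("s3").download_file(bucket, key, local_path)
```

This drops the external dependency on a CLI binary, at the cost of handling retries and error reporting in the pipeline code itself.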