singer-io / tap-s3-csv

GNU Affero General Public License v3.0

read csvs with AWS-standard client-side encryption #11

Open seamusabshere opened 6 years ago

seamusabshere commented 6 years ago

it would be really cool if i could store a secret key in my Stitch Data integration and then have this tap decrypt files transparently

from the official AWS ruby SDK: [1] (edited for clarity)

# just a random secret for now, but you get the idea
require 'openssl'
require 'aws-sdk-s3' # provides Aws::S3::Encryption::Client
key = OpenSSL::PKey::RSA.new(1024)

# encryption client
s3 = Aws::S3::Encryption::Client.new(encryption_key: key)

# round-trip an object, encrypted/decrypted locally
s3.put_object(bucket:'aws-sdk', key:'hipaa.csv', body:'lots,of,health,data')
s3.get_object(bucket:'aws-sdk', key:'hipaa.csv').body.read
#=> 'lots,of,health,data'

# reading an encrypted object without the encryption client
# returns the cipher text
Aws::S3::Client.new.get_object(bucket:'aws-sdk', key:'hipaa.csv').body.read
#=> "... cipher text ..."

There is apparently a port of this to Python [2], but its example is significantly less clear, so I won't reproduce it here, even though it's probably what you would want to use since taps are written in Python.
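As a rough illustration of what the tap would need to notice before decrypting anything, here is a minimal Python sketch that checks whether an object's user metadata looks like a client-side encryption envelope. The metadata key names are an assumption based on how the AWS SDK encryption clients appear to store the envelope; `envelope_version` is a hypothetical helper, not part of this tap or of [2], and it does not cover envelopes stored in a separate instruction file.

```python
# Assumption: envelope keys as written by the AWS SDK encryption clients
# into the object's user metadata (not verified against every SDK version).
V1_ENVELOPE_KEYS = {"x-amz-key", "x-amz-iv", "x-amz-matdesc"}
V2_ENVELOPE_KEYS = {
    "x-amz-key-v2",
    "x-amz-iv",
    "x-amz-cek-alg",
    "x-amz-wrap-alg",
    "x-amz-matdesc",
}


def envelope_version(metadata):
    """Return "v1", "v2", or None for an object's user-metadata dict.

    `metadata` is expected to look like the "Metadata" dict returned by an
    S3 get_object/head_object call (hypothetical usage, not tap code).
    """
    keys = {name.lower() for name in metadata}
    if V2_ENVELOPE_KEYS <= keys:
        return "v2"
    if V1_ENVELOPE_KEYS <= keys:
        return "v1"
    # No envelope found: treat the body as plain, unencrypted CSV.
    return None
```

If this returns None the tap could read the file exactly as it does today; otherwise it would have to unwrap the data key with the configured secret before parsing, which is the part the Python port [2] would handle.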

Key things:

[1] https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Encryption.html
[2] https://github.com/boldfield/s3-encryption (see issue https://github.com/boldfield/s3-encryption/issues/9 for a slight clarification)

nick-mccoy commented 6 years ago

Hi @seamusabshere, that's an interesting idea! If you want to make the changes locally, test, and submit a PR, we would consider merging it and adding the corresponding field on the integration's settings page.

aaronsteers commented 4 years ago

I arrived here because I'm actually interested in adding support for KMS encryption on the target side, for target-s3-csv. I think it's a great addition if both can support server-side encryption. I'll post back here if I have updates on that front.

For reference, I did find this link, although primarily focused on KMS: https://www.justdocloud.com/2018/09/21/upload-download-s3-using-aws-kms-python/

aaronsteers commented 4 years ago

I've created a related Issue on the pipelinewise target-s3-csv repo here: https://github.com/transferwise/pipelinewise-target-s3-csv/issues/5

I imagine the code to accomplish both is very similar, and it would be great if the settings/config needed on both sides were similar or identical.

UPDATE:

After further research, I've found that KMS decryption occurs transparently as long as the user has access to the applied KMS key. In that case, we probably can accomplish KMS integration without any change to this tap (feel free to correct me if that doesn't seem correct).

https://aws.amazon.com/premiumsupport/knowledge-center/decrypt-kms-encrypted-objects-s3/
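To make the "transparent" part concrete, here is a minimal Python sketch of reading a CSV from S3 with an ordinary client. It assumes the object uses server-side KMS encryption and that the caller's credentials are allowed to use the key; `read_csv_rows` is a hypothetical helper, not code from this tap. Nothing in it is decryption-specific, which is the point.

```python
import csv
import io


def read_csv_rows(s3_client, bucket, key):
    """Fetch an object and parse it as CSV.

    `s3_client` is anything with a boto3-style get_object(Bucket=..., Key=...)
    method, e.g. boto3.client("s3"). For a server-side KMS-encrypted object,
    S3 is expected to return plaintext when the caller may use the key.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # "aws:kms" for SSE-KMS objects; absent when no server-side encryption
    # is reported in the response.
    sse = response.get("ServerSideEncryption")
    text = response["Body"].read().decode("utf-8")
    rows = list(csv.reader(io.StringIO(text)))
    return sse, rows
```

With boto3 this would be called as `read_csv_rows(boto3.client("s3"), "my-bucket", "path/file.csv")` (bucket and key are placeholders). If the caller lacks permission on the KMS key, the `get_object` call itself should fail with an access error rather than return cipher text.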