Closed paul-english closed 4 years ago
@koszti Do you know if that test will passes if aws_redshift_copy_role_arn
is provided in the config? If so, the statement credentials = aws_session.get_credentials().get_frozen_credentials()
might still be needed when we aren't using an arn for the copy sql. I apologize I don't know enough about boto3, but I think calling that is what leads to #36. @Limess does this seem correct?
Looks like there might be a weird race condition in terms of calling aws_session.get_credentials().get_frozen_credentials()
prior to initialising an S3 client then?
Maybe we should conditionally call aws_session.get_credentials().get_frozen_credentials()
immediately prior to the COPY
call (boto3 caches credentials while they're valid so this should be fine and automatically refreshes them when calling get_credentials()
if necessary) if no role is provided, and also use the session token from credentials in the COPY
command if it exists.
The credentials will be obtained from AWS STS via an assume role call so will be temporary and require the token (token
in the returned credentials object: https://www.pydoc.io/pypi/botocore-1.7.0/autoapi/credentials/index.html#credentials.Credentials).
We should probably persist the aws_session
on the class rather than the configuration used to create it.
@log0ymxm , @Limess with the change below I could make every test passing and seems it's working as expected. The change doesn't let boto3 to get the supported environment variables by their own but it passes the aws_access_key_id
, aws_secret_access_key
and aws_session_token
from config.json or from environment variables.
Every test case passes and I manually tested the following scenarios:
aws_access_key_id
and aws_secret_access_key
in config.json
AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY
environment variablesaws_access_key_id
, aws_secret_access_key
and aws_session_token
in config.json
AWS_ACCESS_KEY_ID
, AWS_SECRET_ACCESS_KEY
and AWS_SESSION_TOKEN
environment variables(Temp STS credentials requested by CLI: aws sts get-session-token --duration-seconds 900
)
Can you please let me know if this works for you. If yes, then please update your PR and I'm happy to merge and sync it to pipelinewise-redshift-fastsync.
Changes in db_sync.py
aws_access_key_id = self.connection_config.get('aws_access_key_id') or os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = self.connection_config.get('aws_secret_access_key') or os.environ.get('AWS_SECRET_ACCESS_KEY')
aws_session_token = self.connection_config.get('aws_session_token') or os.environ.get('AWS_SESSION_TOKEN')
# Init S3 client
# Conditionally pass keys as this seems to affect whether instance credentials are correctly loaded if the keys are None
if aws_access_key_id and aws_secret_access_key:
aws_session = boto3.session.Session(
aws_access_key_id=aws_access_key_id,
aws_secret_access_key=aws_secret_access_key,
aws_session_token=aws_session_token
)
credentials = aws_session.get_credentials().get_frozen_credentials()
# Explicitly set credentials to those fetched from Boto so we can re-use them in COPY SQL if necessary
self.connection_config['aws_access_key_id'] = credentials.access_key
self.connection_config['aws_secret_access_key'] = credentials.secret_key
self.connection_config['aws_session_token'] = credentials.token
else:
aws_session = boto3.session.Session()
I'd still look to move the get_credentials() call to immediately before the Redshift COPY command as it will trigger a refresh if boto3 is getting them from instance/ECS credentials
@koszti the change looks to still work for me, I've made those updates in my branch. @Limess I haven't made your suggested change.
@koszti is there any update on this PR needed, do we need the updates @Limess is suggesting?
this is merged and released to pypi as pipelinewise-target-redshift 1.2.1. Further enhancements can be released by separated PRs.
@log0ymxm I synced these changes to pipelinewise-redshift-fastsync (here) but please note we can't merge new PRs to the main pipelinewise repo until 6th Jan so still need to wait a few more days but it will arrive in the very beginning of the new year.
Should be a fix for https://github.com/transferwise/pipelinewise-target-redshift/issues/36