transferwise / pipelinewise-target-redshift

Singer.io Target for Amazon Redshift - PipelineWise compatible
https://transferwise.github.io/pipelinewise/

Fix for ECS credentials #31

Closed Limess closed 4 years ago

Limess commented 4 years ago

Fix ECS credentials not being correctly picked up by boto3 when no keys are provided.

This is a fix I've applied in our production environment to use ECS credentials rather than AWS keys. Without it, boto3 could not find the credentials as expected.

I'm unsure why this is necessary. I've looked through the boto3 credentials code and can't see anything which would mean that passing None values for the keys prevents ECS credentials from being used, and I think EC2 instance credentials are picked up without this change (hence I didn't include this tweak in the original change I made).
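For anyone following along, the change amounts to only handing the keys to boto3 when they are actually set, so that a config without keys defers to boto3's default provider chain. A minimal sketch of the idea (the function name is assumed for illustration, not copied from the diff; the config keys match the target's documented settings):

```python
import boto3


def create_s3_client(config):
    """Build an S3 client, only passing keys that were actually configured."""
    aws_access_key_id = config.get('aws_access_key_id')
    aws_secret_access_key = config.get('aws_secret_access_key')

    if aws_access_key_id and aws_secret_access_key:
        # Explicit credentials supplied in the target config
        session = boto3.session.Session(
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
        )
    else:
        # No keys supplied: defer to boto3's default provider chain
        # (env vars, shared credentials file, ECS task role, EC2 profile)
        session = boto3.session.Session()

    return session.client('s3')
```

With no keys in the config, boto3 walks its default provider chain, which is where the ECS container role should be picked up.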

paul-english commented 4 years ago

This is the exact change we need to be able to use pipelinewise. Would love to see this merged.

paul-english commented 4 years ago

@Limess are you using a pipeline that uses fastsync?

For me there's still a configuration validation error, and fastsync still requires the aws_access_key_id and the aws_secret_access_key in the configuration.

https://github.com/transferwise/pipelinewise/blob/master/pipelinewise/fastsync/mysql_to_redshift.py#L28-L29
https://github.com/transferwise/pipelinewise/blob/master/pipelinewise/fastsync/postgres_to_redshift.py#L28-L29
https://github.com/transferwise/pipelinewise/blob/master/pipelinewise/fastsync/commons/target_redshift.py#L13-L17

I may open a separate MR to make fastsync match the behavior of this MR.
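If fastsync were to match this behaviour, its validation would presumably need to treat the keys as a paired optional rather than required fields. A hypothetical sketch (the required-key list below is illustrative, not taken from the linked files):

```python
def validate_config(config):
    """Hypothetical relaxed validation: the AWS keys become a paired
    optional instead of required keys."""
    errors = []

    for key in ['host', 'port', 'user', 'password', 'dbname', 's3_bucket']:
        if key not in config:
            errors.append(f'Required key is missing from config: [{key}]')

    # Keys must be supplied together or not at all; omitting both
    # defers credential resolution to boto3's provider chain.
    has_key = bool(config.get('aws_access_key_id'))
    has_secret = bool(config.get('aws_secret_access_key'))
    if has_key != has_secret:
        errors.append(
            'aws_access_key_id and aws_secret_access_key must be '
            'provided together or omitted together'
        )

    return errors
```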

Limess commented 4 years ago

No, we are not using fastsync; we're actually using this target outside of pipelinewise.

koszti commented 4 years ago

Thanks for the PR and for reporting the problem in fastsync.

@log0ymxm there is a pending PR that addresses the same problem in fastsync-redshift at https://github.com/transferwise/pipelinewise/pull/285. Once #32 is merged we'll release a new pipelinewise-target-redshift to PyPI and bump the version in the main pipelinewise so that everything is in sync.

koszti commented 4 years ago

pipelinewise-target-redshift 1.2.0 with the fix is now available on PyPI at https://pypi.org/project/pipelinewise-target-redshift/

Once https://github.com/transferwise/pipelinewise/pull/285 is merged (very soon), it will be available in pipelinewise as well.

Limess commented 4 years ago

Sigh, this still isn't working as expected for me. It was working previously in my fork: https://github.com/Limess/pipelinewise-target-redshift/blob/do-not-pass-creds/target_redshift/db_sync.py#L218-L223, where we were still using boto3.client('s3', ...).

I'm still seeing this:

[2019-12-05 12:11:13,798] {{bash_operator.py:128}} INFO - boto3.exceptions.S3UploadFailedError: Failed to upload /tmp/tmp5ysju26l.csv.gz.1 to analytics-redshift-sink.my-bucket/__tmp/pipelinewise_public-my-stream_20191205-121113-233926.csv.gz.1: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.

when omitting credentials.

I'll look into the differences. I rebased onto the latest master, and I believe the boto3 session handling changed slightly; I matched it before making this PR as the change looked innocuous. I suspect it may be something to do with boto3 clients vs. sessions.
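One way to narrow this down is to check which credential provider boto3 actually resolves at runtime. A small diagnostic sketch (to my knowledge botocore reports ECS task credentials with the method name 'container-role', but treat that as an assumption):

```python
import boto3

# Diagnostic: inspect which credential provider boto3 resolved, to confirm
# whether the ECS task role is really being picked up when keys are omitted.
session = boto3.session.Session()
creds = session.get_credentials()

if creds is None:
    print('boto3 resolved no credentials at all')
else:
    frozen = creds.get_frozen_credentials()
    print(f'Provider method: {creds.method}')      # e.g. 'container-role' for ECS
    print(f'Access key id:   {frozen.access_key[:4]}...')
```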

Sorry for the repeated PRs on this; I'd like to get it fixed, but it's proving more difficult than expected 🤷‍♂

koszti commented 4 years ago

No problem. boto3 is currently pinned to 1.9.188, which is not too old but far from the latest. Feel free to send a PR once you find something.
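For what it's worth, testing against a newer boto3 would mostly be a matter of relaxing that pin. A hypothetical setup.py excerpt (the exact constraint below is illustrative, not the project's current spec):

```python
# Hypothetical setup.py excerpt: the exact '==1.9.188' pin relaxed to a
# range, so newer 1.x releases with credential-provider fixes can be tried.
from setuptools import setup

setup(
    name='pipelinewise-target-redshift',
    install_requires=[
        'boto3>=1.9.188,<2.0',  # illustrative constraint, not the current pin
    ],
)
```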