meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
Apache License 2.0

AWS Authenticator class, possibly with AWS auth support in cookiecutter #1493

Open aaronsteers opened 1 year ago

aaronsteers commented 1 year ago

A cool idea from Matt in Slack: https://meltano.slack.com/archives/C01PKLU5D1R/p1678466724971049

I want to try building taps for AWS Glue, AWS QuickSight, and AWS Pinpoint (and maybe others) and had the thought that it could be really nice to have an extension of the SDK that provided the boto3 dependency and exposed a standard authentication configuration for AWS services.

We have an abstraction layer for authenticators in general, but to my knowledge we've never extended that metaphor for AWS services or other cloud services auth.

https://sdk.meltano.com/en/latest/reference.html#authenticator-classes

visch commented 1 year ago

I wonder if you can use https://github.com/davidmuller/aws-requests-auth with @edgarrmondragon's recent auth changes, which allow custom requests auth callables to be used.

aaronsteers commented 1 year ago

@visch Yes, I was wondering something similar. There are (I think) two general paths here:

The first is to use boto for auth; the second is to use requests library authenticators such as https://github.com/tedder/requests-aws4auth or https://github.com/davidmuller/aws-requests-auth, which you referenced.

This SDK docs page already references custom requests authenticators:

https://sdk.meltano.com/en/latest/code_samples.html#use-one-of-requests-s-built-in-authenticators

In addition to requests.auth classes, the community has published a few packages with custom authenticator classes, which are compatible with the SDK. For example:

  • requests-aws4auth: AWS v4 authentication
  • requests_auth: A collection of authenticators for various services and protocols including Azure, Okta and NTLM.

aaronsteers commented 1 year ago

There are (I think) two general paths here

It's important to call out that these aren't mutually exclusive. Some use cases will probably need boto anyway, others might prefer requests, and (maybe??) some taps would want both. I can't think of a use case that needs both, but I can imagine some developers preferring the requests library and others preferring boto.

aaronsteers commented 1 year ago

For reference, here's some prior art built upon boto:

https://github.com/MeltanoLabs/tap-cloudwatch/blob/c83a222be106ac251af39fc2212b78a8b368af70/tap_cloudwatch/cloudwatch_api.py#L31-L64

From @pnadolny13's https://github.com/MeltanoLabs/tap-cloudwatch, also built on the SDK. The auth implementation should work for other AWS services if refactored into a generic AWS auth class.

edgarrmondragon commented 1 year ago

I think the two could easily live together:

Now that we have an auth interface (i.e. any callable that accepts a prepared request and returns it, possibly mutated), it might be time to start thinking of formalizing the connector interface too. That way folks would not even need to submit them to the SDK, and they could live as standalone packages. Later, we could still port them to be "officially" supported or just move them to our GitHub org.
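To make the "any callable" idea concrete, here is a minimal sketch using only the standard library. `FakePreparedRequest`, `StaticHeaderAuth`, and `apply_auth` are hypothetical names invented for illustration; `FakePreparedRequest` only stands in for a real prepared request object, and nothing here is the SDK's own code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FakePreparedRequest:
    """Hypothetical stand-in for a prepared HTTP request (assumption)."""

    method: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


# The interface described above: take a prepared request, return it mutated.
AuthCallable = Callable[[FakePreparedRequest], FakePreparedRequest]


class StaticHeaderAuth:
    """Hypothetical authenticator that sets one fixed header."""

    def __init__(self, name: str, value: str) -> None:
        self.name = name
        self.value = value

    def __call__(self, request: FakePreparedRequest) -> FakePreparedRequest:
        # Mutate the request in place and hand it back to the caller.
        request.headers[self.name] = self.value
        return request


def apply_auth(request: FakePreparedRequest, auth: AuthCallable) -> FakePreparedRequest:
    """Mimic a client applying auth right before sending (illustrative only)."""
    return auth(request)


signed = apply_auth(
    FakePreparedRequest("GET", "https://example.com/items"),
    StaticHeaderAuth("Authorization", "Bearer example-token"),
)
print(signed.headers)
```

Because the contract is just "callable in, request out", a standalone package could ship a class like this without depending on anything in the SDK.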

pnadolny13 commented 1 year ago

I took a stab at implementing something like a boto authenticator in the DynamoDB tap that I'm working on. I made an authenticator class that handles authenticating in a bunch of different ways depending on the inputs: https://github.com/MeltanoLabs/tap-dynamodb/blob/main/tap_dynamodb/aws_authenticators.py. Then my dynamo implementation (https://github.com/MeltanoLabs/tap-dynamodb/blob/main/tap_dynamodb/dynamo.py) simply accesses the clients it needs, e.g. self.resource, without worrying about how to auth with them.

The challenges I commonly see are taps/targets having varying support for auth configs (i.e. keys, session token, profile, config/credentials, environment variables, etc.). On top of that, users sometimes want a custom endpoint_url when the resources they're using are mocked with localstack. Also, for our use case I'll need to assume another role, which I haven't seen anywhere, so I've added it here. I put some comments on my opinions around handling configs vs. env vars vs. etc. in https://github.com/MeltanoLabs/tap-dynamodb/pull/3#issue-1662996922 (I've refactored again since then, but the comments still stand). The TL;DR is that I'd love to require the configs to be explicit to some degree. I've seen weird behavior when a tap finds credentials in my env vars or a default config file on my machine, so I think we can avoid that by being explicit.
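A rough sketch of what "explicit to some degree" could look like, assuming hypothetical config keys (`aws_access_key_id`, `aws_secret_access_key`, `aws_session_token`, `aws_profile`, `aws_region_name`, `aws_endpoint_url`, `use_aws_env_vars`). These key names and helper functions are made up for illustration and are not taken from tap-dynamodb. The returned dictionaries use keyword names that `boto3.Session(...)` and `session.client(...)` accept, but the sketch itself does not import boto3.

```python
from typing import Any, Dict, Mapping


def build_session_kwargs(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Map explicit tap config (hypothetical keys) to boto3.Session kwargs."""
    kwargs: Dict[str, Any] = {}
    if config.get("aws_region_name"):
        kwargs["region_name"] = config["aws_region_name"]

    has_key = bool(config.get("aws_access_key_id"))
    has_secret = bool(config.get("aws_secret_access_key"))
    if has_key != has_secret:
        raise ValueError("Access key ID and secret access key must be set together.")

    if has_key:
        # Option 1: static keys, optionally with a session token.
        kwargs["aws_access_key_id"] = config["aws_access_key_id"]
        kwargs["aws_secret_access_key"] = config["aws_secret_access_key"]
        if config.get("aws_session_token"):
            kwargs["aws_session_token"] = config["aws_session_token"]
        return kwargs

    if config.get("aws_profile"):
        # Option 2: a named profile from the shared config/credentials files.
        kwargs["profile_name"] = config["aws_profile"]
        return kwargs

    if config.get("use_aws_env_vars"):
        # Option 3: the user explicitly opts in to boto3's default lookup.
        return kwargs

    # No silent fallback to whatever happens to be on the machine.
    raise ValueError("No AWS credentials configured; set keys, a profile, or use_aws_env_vars.")


def build_client_kwargs(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Extra kwargs for session.client(...), e.g. a localstack endpoint."""
    if config.get("aws_endpoint_url"):
        return {"endpoint_url": config["aws_endpoint_url"]}
    return {}
```

With boto3 installed, usage would look roughly like `boto3.Session(**build_session_kwargs(cfg)).client("dynamodb", **build_client_kwargs(cfg))`. Role assumption could be layered on top via STS, which is left out here to keep the sketch short.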

It would be cool if we could inject all of these AWS config options into the tap automatically when the authenticator is in use, similar to how stream_maps, stream_map_config, etc. are handled.
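One way to picture that injection, as a sketch only: merge a shared block of AWS settings into a tap's JSON Schema config. `AWS_CONFIG_PROPERTIES` and `with_aws_config` are hypothetical names, the property names are assumptions, and this is not how the SDK currently injects stream_maps settings.

```python
import copy
from typing import Any, Dict

# Hypothetical shared AWS settings, written as JSON Schema properties.
AWS_CONFIG_PROPERTIES: Dict[str, Dict[str, Any]] = {
    "aws_access_key_id": {"type": ["string", "null"]},
    "aws_secret_access_key": {"type": ["string", "null"]},
    "aws_session_token": {"type": ["string", "null"]},
    "aws_profile": {"type": ["string", "null"]},
    "aws_region_name": {"type": ["string", "null"]},
    "aws_endpoint_url": {"type": ["string", "null"]},
}


def with_aws_config(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a tap config schema with the AWS settings merged in."""
    merged = copy.deepcopy(schema)
    properties = merged.setdefault("properties", {})
    for name, spec in AWS_CONFIG_PROPERTIES.items():
        # Settings the tap already defines take precedence over the shared ones.
        properties.setdefault(name, copy.deepcopy(spec))
    return merged


tap_schema = {
    "type": "object",
    "properties": {"table_names": {"type": "array", "items": {"type": "string"}}},
}
full_schema = with_aws_config(tap_schema)
print(sorted(full_schema["properties"]))
```

The input schema is left untouched, so a tap could opt in to the shared settings only when an AWS authenticator is actually in use.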

s7clarke10 commented 1 year ago

I am using the AWS Authenticator (AWS4AUTH). Here is a link to a working example with the Meltano SDK.

https://github.com/s7clarke10/tap-rest-api-msdk/blob/e1a302a083db99487e21cb21540739829990c339/tap_rest_api_msdk/auth.py#L231-L239

stale[bot] commented 2 weeks ago

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

s7clarke10 commented 1 week ago

Just referencing the AWS auth that I implemented in tap-rest-api-msdk, in case we want to bake this into the Meltano SDK.

Calling the AWS Authenticator: https://github.com/Widen/tap-rest-api-msdk/blob/f4eeb54446f181336b7c34b25821ce23b3cefeb5/tap_rest_api_msdk/auth.py#L254-L265

Definition for the AWS Authentication class: https://github.com/Widen/tap-rest-api-msdk/blob/f4eeb54446f181336b7c34b25821ce23b3cefeb5/tap_rest_api_msdk/auth.py#L17-L114