nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Unable to load AWS credentials from any provider in the chain #213

Closed DBCerigo closed 6 years ago

DBCerigo commented 6 years ago

Hi,

This may well be a Spark issue rather than a Flintrock one, but it's something that becomes a bit trickier with Flintrock.

I'm trying to use the ID and secret key of an IAM user to authenticate when pulling data into a cluster from S3. I've confirmed that my ID and secret work when used like `sqlCtx.read.csv('s3a://<ID>:<SECRET>@bucket/file.txt')`.

But I get `Unable to load AWS credentials from any provider in the chain` when trying to load the ID and secret dynamically. Ways I've tried getting them to load, all resulting in the same error:

- adding `~/.aws/credentials` to the master and slaves (and confirming it's found by the AWS CLI via `aws configure`)
- running `!export AWS_ACCESS_KEY_ID=...` (and likewise for the secret) in a Jupyter notebook connected to the Spark cluster

Suggestions on how you solved this would be appreciated. Note: I want to rely solely on the ID and secret key, since I'm working in a team and want to share bucket access easily.

Thanks

nchammas commented 6 years ago

Spark doesn't look at `~/.aws/credentials`, and probably not at the `AWS_*` environment variables either.

What you need is to use the fs.s3* properties described here, and if you're using Hadoop 2.7+ you should specifically use s3a as opposed to s3n or just s3. I think the settings go in hadoop/conf/core-site.xml.
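As a sketch of what that would look like (the `fs.s3a.access.key` and `fs.s3a.secret.key` property names come from the Hadoop S3A connector; the values here are placeholders), the entries in `hadoop/conf/core-site.xml` would be something like:

```xml
<configuration>
  <!-- Placeholder values: substitute your IAM user's key ID and secret -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```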

Better than managing credentials, though, would be if you used IAM roles as described in the README. Then you don't need to specify secrets anywhere.
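For reference (a sketch based on the Flintrock README; the profile name is a placeholder), attaching an IAM role happens through the `instance-profile-name` option in your Flintrock `config.yaml`:

```yaml
# config.yaml (excerpt) -- placeholder profile name
providers:
  ec2:
    instance-profile-name: my-s3-access-profile  # IAM instance profile granting S3 access
```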

DBCerigo commented 6 years ago

In the end I guessed it must not be, though this, in combination with

com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:131)

seemed to imply that it should be looking there, which kept me trying for longer. Maybe it doesn't because it's not using DefaultAWSCredentialsProviderChain? Anyway!

Do you have a suggestion for how to either automatically inject those XML config files, or to "bake" them into the AMI without clashing with Flintrock's setup process?

I'm finding Flintrock great because it's so speedy, and I'm keeping it speedy by avoiding (almost) all extra cluster setup steps: everything else is already set up on a custom AMI.

nchammas commented 6 years ago

The easiest thing to do is use flintrock run-command to inject the configs into the appropriate files, though the commands will be clunky (you'll probably need to use sed or something similar).
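As a rough sketch of what that clunky command could look like (paths and the cluster name are assumptions, and the demo below runs against a local copy of the file), you could use GNU sed to rewrite the closing `</configuration>` tag so the credential properties get appended, then ship that same command to every node with `flintrock run-command`:

```shell
# Demo on a local copy; on a real cluster the file lives under hadoop/conf/
# and you would wrap the sed line in something like:
#   flintrock run-command <cluster-name> 'sed -i ... hadoop/conf/core-site.xml'
CONF=core-site.xml
printf '<configuration>\n</configuration>\n' > "$CONF"

# GNU sed: replace </configuration> with the two fs.s3a properties plus the tag.
# YOUR_KEY / YOUR_SECRET are placeholders for the IAM user's credentials.
sed -i 's|</configuration>|  <property><name>fs.s3a.access.key</name><value>YOUR_KEY</value></property>\n  <property><name>fs.s3a.secret.key</name><value>YOUR_SECRET</value></property>\n</configuration>|' "$CONF"

cat "$CONF"
```

Note this relies on GNU sed's support for `\n` in the replacement text, which is fine on the Amazon Linux instances Flintrock launches but won't work with BSD sed on a Mac.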

My recommended solution is to not use credentials at all and instead use IAM roles.

A potential future solution is coming in #202, where you'll be able to bring your own templates and have Flintrock use them during launch.

DBCerigo commented 6 years ago

OK, that's helpful, thanks, and #202 looks great; hoping it gets merged!