Closed DBCerigo closed 6 years ago
Spark doesn't look at `~/.aws/credentials`, and probably not at the `AWS_*` environment variables either. What you need is to use the `fs.s3*` properties described here, and if you're using Hadoop 2.7+ you should specifically use `s3a` as opposed to `s3n` or plain `s3`. I think the settings go in `hadoop/conf/core-site.xml`.
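A minimal `core-site.xml` fragment along those lines might look like this (sketch only — `fs.s3a.access.key` and `fs.s3a.secret.key` are the standard s3a property names; the placeholder values are yours to fill in):

```xml
<!-- core-site.xml: s3a credentials (replace the placeholder values) -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```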
Better than managing credentials, though, would be if you used IAM roles as described in the README. Then you don't need to specify secrets anywhere.
In the end I guessed it must not be, though this, in combination with

```
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
	at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:131)
```

seemed to imply that it should be looking in there, which kept me trying for longer. Maybe it's not, because it's not the `DefaultAWSCredentialsProviderChain`? Anyway!
Do you have a suggestion for how to either automatically inject those XML config files, or "bake" them into the AMI without clashing with Flintrock's setup process? I'm finding Flintrock great because it's so speedy, and I'm keeping it speedy by avoiding (almost) all extra cluster setup steps, having everything else already set up on a custom AMI.
The easiest thing to do is use `flintrock run-command` to inject the configs into the appropriate files, though the commands will be clunky (you'll probably need to use `sed` or something similar).
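As a sketch of what that `sed` edit might look like (this is the kind of one-liner you'd pass to `flintrock run-command`; the file path, property names, and values here are placeholders, and the XML stub below just stands in for the real file on each node):

```shell
# Stand-in for the real hadoop/conf/core-site.xml on each node.
cat > core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
</configuration>
EOF

# Splice the s3a credential properties in just before the closing tag.
sed -i 's|</configuration>|  <property><name>fs.s3a.access.key</name><value>YOUR_KEY</value></property>\n  <property><name>fs.s3a.secret.key</name><value>YOUR_SECRET</value></property>\n</configuration>|' core-site.xml

cat core-site.xml
```

Clunky, as noted — which is part of why IAM roles or the templates in #202 are the nicer long-term answers.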
My recommended solution is to not use credentials at all and instead use IAM roles.
A potential future solution is coming in #202, where you'll be able to bring your own templates and have Flintrock use them during launch.
Ok, that's helpful thanks, and #202 looks great, hoping it gets merged!
Hi,
This may well be a Spark issue that doesn't concern Flintrock, but it's something that becomes a bit trickier with Flintrock.
I'm trying to use the ID and secret key of an IAM user to authenticate pulling data into a cluster from S3. I've confirmed that my ID and secret work when used like:

```python
sqlCtx.read.csv('s3a://<ID>:<SECRET>@bucket/file.txt')
```

But I get

`Unable to load AWS credentials from any provider in the chain`

if I try to load the ID and secret dynamically. Ways I've tried getting them to load, all resulting in the same error:

- adding `~/.aws/credentials` to the master and slaves (and confirming it's found by the AWS CLI using `aws configure`)
- running `!export AWS_ACCESS_KEY_ID=...` (and likewise for the secret) in the Jupyter notebook that is connected to the Spark cluster

Suggestions on how you solved this are appreciated. Note: I want to rely solely on the ID and secret key, since I'm working in a team and want to easily share bucket access.
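(One aside on the inline-credentials form above, not raised in this thread: embedding the secret in the URI is known to break if it contains characters like `/`, and a common workaround is to percent-encode it first. A sketch, with made-up credentials:)

```python
from urllib.parse import quote

# Hypothetical credentials -- secrets containing '/' or '+' must be
# percent-encoded before being embedded in an s3a:// URI.
access_id = "AKIAEXAMPLE"
secret = "abc/def+ghi"

safe_secret = quote(secret, safe="")  # encodes '/' as %2F, '+' as %2B
uri = f"s3a://{access_id}:{safe_secret}@bucket/file.txt"
print(uri)
```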
Thanks