Open AlixMetivier opened 10 months ago
@AlixMetivier It looks like the token expired. Have you tried refreshing it?
Hi @bsharifi, I don't pass any token, so I'm not sure what you're referring to. Sorry if this is a trivial question... I'm assuming it should work with the creds for the Redshift cluster and the creds for the bucket?
Hi @AlixMetivier, do you have any other token-based credentials defined elsewhere that could be interfering (e.g., env vars, ~/.aws/credentials, etc.)?
Hello @bsharifi, you were right, there was an issue with ~/.aws/credentials. I added the access and secret key there and it is now working. Something I don't understand, though: if I remove those credentials, the ones applied to the Spark conf are not enough to make it work; it says it could not load the proper credentials. Is that expected? I thought configuring Spark alone would work?
The log I get when removing the credentials from ~/.aws/credentials:
[ERROR] 10:34:55 io.github.spark_redshift_community.spark.redshift.RedshiftWriter- Exception thrown during Redshift load; will roll back transaction com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName, com.amazonaws.auth.profile.ProfileCredentialsProvider@1e9afe4e: Unable to load credentials into profile [default]: AWS Access Key ID is not specified., com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@1f67d37f: Failed to connect to service endpoint: ]
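For reference, here is a minimal sketch (not from the original thread) of making the same keys visible to the AWS SDK's default credential chain without touching ~/.aws/credentials, using the Java system properties and environment variables that the log above itself lists. Whether the connector's internal S3 client actually picks these up depends on the connector and SDK versions, so treat this as an assumption to verify:

    // Assumption: the "Unable to load AWS credentials" error comes from the AWS SDK's
    // DefaultAWSCredentialsProviderChain used inside the connector. Two of the providers
    // enumerated in the log can be satisfied from the driver JVM itself:

    // 1) SystemPropertiesCredentialsProvider: set these in the same JVM, before write()/save().
    System.setProperty("aws.accessKeyId", "KEY");   // same value as fs.s3a.access.key
    System.setProperty("aws.secretKey", "KEY");     // same value as fs.s3a.secret.key

    // 2) EnvironmentVariableCredentialsProvider: set before launching the JVM, e.g. on Windows:
    //      set AWS_ACCESS_KEY_ID=KEY
    //      set AWS_SECRET_ACCESS_KEY=KEY

Since this runs in Spark local mode, everything executes in one JVM, so setting the system properties on the driver should be enough for that provider to see them.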
@AlixMetivier This might be a limitation of where you are running the Spark job. Where are you running Spark (EMR, Glue, local)?
Hi @bsharifi, this is running in Spark local mode on a Windows machine. Should we consider this a limitation then?
Hello @bsharifi, do you have any updates on this issue?
Hello, sorry if this is a basic configuration issue; I've been trying different things and can't get it to work. This is pretty straightforward: I just want to write some data to Redshift, using basic authentication for S3 as well as Redshift. I'd like to not configure the assume role.
The code is like this:
    ss.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", "KEY");
    ss.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", "KEY");

    ss.createDataFrame(rdd_row1out, row1Struct.class).write()
        .format("io.github.spark_redshift_community.spark.redshift")
        .option("url", "jdbc:redshift://" + "redacted" + ":" + "5439" + "/" + "dev"
            + "?user=" + "user" + "&password=password")
        .option("dbtable", "table1")
        .option("tempdir", "s3a://" + "bucket" + "/tmp")
        .option("forward_spark_s3_credentials", true)
        .option("diststyle", "EVEN")
        .option("usestagingtable", "true")
        .mode(org.apache.spark.sql.SaveMode.Overwrite)
        .save();
And I get the following error:
Is this an issue, or is there something wrong with my setup?
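As a side note on the configuration above: the same S3A keys can also be supplied when building the session through the spark.hadoop.* prefix, which Spark copies into the Hadoop configuration at startup. This is only a sketch of an equivalent setup, not code from the thread; the session and app names are placeholders, and it does not by itself resolve the default-provider-chain error discussed earlier:

    import org.apache.spark.sql.SparkSession;

    // Hypothetical, equivalent way to pass the S3A keys: any "spark.hadoop.*" entry on the
    // builder is copied into sparkContext().hadoopConfiguration(), so fs.s3a.access.key and
    // fs.s3a.secret.key end up with the same values as in the snippet above.
    SparkSession ss = SparkSession.builder()
            .master("local[*]")
            .appName("redshift-write")
            .config("spark.hadoop.fs.s3a.access.key", "KEY")
            .config("spark.hadoop.fs.s3a.secret.key", "KEY")
            .getOrCreate();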