spark-redshift-community / spark-redshift

Performant Redshift data source for Apache Spark
Apache License 2.0

ERROR: Problem reading manifest file - S3ServiceException:The provided token has expired #151

Open AlixMetivier opened 7 months ago

AlixMetivier commented 7 months ago

Hello, sorry if this is a basic configuration issue; I've been trying different things and can't get it to work. This is pretty straightforward: I just want to write some data to Redshift. I'd like to use basic authentication for both S3 and Redshift, and I'd rather not configure an assume role.

The code is like this:

```java
ss.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", "KEY");
ss.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", "KEY");

ss.createDataFrame(rdd_row1out, row1Struct.class).write()
    .format("io.github.spark_redshift_community.spark.redshift")
    .option("url", "jdbc:redshift://" + "redacted" + ":" + "5439" + "/" + "dev"
        + "?user=" + "user" + "&password=password")
    .option("dbtable", "table1")
    .option("tempdir", "s3a://" + "bucket" + "/tmp")
    .option("forward_spark_s3_credentials", true)
    .option("diststyle", "EVEN")
    .option("usestagingtable", "true")
    .mode(org.apache.spark.sql.SaveMode.Overwrite)
    .save();
```

and I get the following error:

```
Caused by: com.amazon.redshift.util.RedshiftException: ERROR: Problem reading manifest file - S3ServiceException:The provided token has expired.,Status 400,Error ExpiredToken,Rid YEWEKN4Z60AG1YB4,ExtRid Ra3SGBowbXiER2JqazOgU7QgS1pI00gRECHL3Gd+8rkmLO/LiNOZCNIfm1186IAg0CdbYUMV9gU=,CanRetry 1
Detail:
error: Problem reading manifest file - S3ServiceException:The provided token has expired.,Status 400,Error ExpiredToken
```

Is this a bug, or is there something wrong with my configuration?

bsharifi commented 7 months ago

@AlixMetivier It looks like the token expired. Have you tried refreshing it?

AlixMetivier commented 7 months ago

> @AlixMetivier It looks like the token expired. Have you tried refreshing it?

hi @bsharifi, I don't pass any token, so I'm not sure what you mean. Sorry if this is a trivial question... I assumed it would work with the credentials for the Redshift cluster and the credentials for the bucket?

bsharifi commented 7 months ago

Hi @AlixMetivier, do you have any other token-based credentials defined elsewhere that could be interfering (e.g., env vars, ~/.aws/credentials, etc.)?

AlixMetivier commented 7 months ago

hello @bsharifi, you were right, there was an issue with ~/.aws/credentials: I added the access and secret key there and it is now working. Something I still don't understand, though: if I remove the credentials from that file, the ones set on the Spark conf are not enough to make it work; it says it could not load the proper credentials. Is that expected? I thought configuring Spark alone would be enough.

The log I get when removing the credentials from ~/.aws/credentials:

```
[ERROR] 10:34:55 io.github.spark_redshift_community.spark.redshift.RedshiftWriter - Exception thrown during Redshift load; will roll back transaction
com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [
  EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)),
  SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey),
  WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName,
  com.amazonaws.auth.profile.ProfileCredentialsProvider@1e9afe4e: Unable to load credentials into profile [default]: AWS Access Key ID is not specified.,
  com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@1f67d37f: Failed to connect to service endpoint: ]
```
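For what it's worth, the chain in that log is the AWS SDK v1 default credentials provider chain, which is consulted independently of Spark's `fs.s3a.*` Hadoop settings. It checks, in order: environment variables, Java system properties, a web identity token, the shared `~/.aws/credentials` file, then EC2/container metadata. A minimal sketch (placeholder key values, not real credentials) of satisfying the second link in that chain from the driver JVM before triggering the write:

```java
// Sketch: expose credentials through the Java system properties named in the
// log (aws.accessKeyId / aws.secretKey) so the AWS SDK's default provider
// chain can find them. This chain is separate from the fs.s3a.* Hadoop config.
public class SdkCredentialsSetup {
    public static void main(String[] args) {
        // Placeholder values, not real keys.
        System.setProperty("aws.accessKeyId", "EXAMPLE_ACCESS_KEY_ID");
        System.setProperty("aws.secretKey", "EXAMPLE_SECRET_KEY");

        // The SDK reads these lazily, so setting them before the first
        // Redshift/S3 call is sufficient.
        System.out.println(System.getProperty("aws.accessKeyId"));
    }
}
```

Setting the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables before launching the JVM would satisfy the first link of the chain just as well.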

bsharifi commented 6 months ago

@AlixMetivier This might be a limitation of where you are running the Spark job. Where are you running Spark (EMR, Glue, local)?

AlixMetivier commented 6 months ago

hi @bsharifi, this is running in Spark local mode on a Windows machine. Should we consider this a limitation, then?

sanulich commented 4 months ago

Hello @bsharifi, Do you have any updates on this issue?