spark-redshift-community / spark-redshift

Performant Redshift data source for Apache Spark
Apache License 2.0

com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden #113

Closed fernandocfbf closed 1 year ago

fernandocfbf commented 1 year ago

Hello everyone.

I'm using the spark-redshift-community connector to load and write data to my Redshift database, but I'm getting a 403 status for the S3 bucket. The curious thing is that I can see the temp path being created in my S3 bucket, but for some reason I cannot load or write data through it. My code looks like the following:

import findspark
findspark.add_packages("io.github.spark-redshift-community:spark-redshift_2.12:5.1.0")
findspark.init()
from pyspark.sql import SparkSession

# Build a local Spark session and hand the S3 credentials to the s3a filesystem.
spark = SparkSession.builder.master(f"local[{self.cores}]").appName(self.app_name).getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
spark._jsc.hadoopConfiguration().set("fs.s3a.access.key", S3_ACCESS_KEY)
spark._jsc.hadoopConfiguration().set("fs.s3a.secret.key", S3_SECRET_KEY)
spark._jsc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")

# Write the DataFrame to Redshift, staging the rows under the S3 temp dir.
df.write.format("io.github.spark_redshift_community.spark.redshift").options(
    url=f"jdbc:redshift://{HOST_REDSHIFT}:{PORT_REDSHIFT}/{DATABASE_REDSHIFT}",
    user=USER_REDSHIFT,
    password=PASSWORD_REDSHIFT,
    tempdir=f"s3a://{S3_BUCKET}/test_folder",
    dbtable=table,
    forward_spark_s3_credentials="true",
    batchsize="100000",
).save()  # .save() is what actually triggers the write
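For reference, the load side (not shown in the snippet above) goes through the same format and the same tempdir. A minimal sketch, assuming the same connection variables as the write above:

# Minimal read sketch, assuming the same connection variables as the write above.
df_loaded = (
    spark.read.format("io.github.spark_redshift_community.spark.redshift")
    .option("url", f"jdbc:redshift://{HOST_REDSHIFT}:{PORT_REDSHIFT}/{DATABASE_REDSHIFT}")
    .option("user", USER_REDSHIFT)
    .option("password", PASSWORD_REDSHIFT)
    .option("dbtable", table)
    .option("tempdir", f"s3a://{S3_BUCKET}/test_folder")
    .option("forward_spark_s3_credentials", "true")
    .load()
)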

The error:

s3a://bucket/test_folder/f52da905-f68d-4963-b4b8-30e4021fcf14/0000_part_00: getFileStatus on s3a://bucket/test_folder/f52da905-f68d-4963-b4b8-30e4021fcf14/0000_part_00: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: W5HRHKX2TN0W35P9; S3 Extended Request ID: wmNZQg1ca/DeAPh42vt99MuQwGHw3FSh4KXD8SuUNlfk7iRn32/EvC0tB7J22UY+9YZIS2IoBiM=), S3 Extended Request ID: wmNZQg1ca/DeAPh42vt99MuQwGHw3FSh4KXD8SuUNlfk7iRn32/EvC0tB7J22UY+9YZIS2IoBiM=:403 Forbidden
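One way to tell whether the 403 comes from the credentials themselves rather than from the connector is to repeat the failing request outside Spark (getFileStatus maps to an S3 HEAD). A minimal diagnostic sketch, assuming boto3 is available and reusing the same keys plus the object path from the error; none of this is part of the original report:

# Hypothetical diagnostic: issue the same HEAD request the connector failed on.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    aws_access_key_id=S3_ACCESS_KEY,
    aws_secret_access_key=S3_SECRET_KEY,
)
key = "test_folder/f52da905-f68d-4963-b4b8-30e4021fcf14/0000_part_00"  # path taken from the error
try:
    s3.head_object(Bucket=S3_BUCKET, Key=key)
    print("HEAD succeeded: these credentials can read the staged file.")
except ClientError as e:
    # A 403 here as well points at the credentials, the bucket policy, or clock skew,
    # rather than at anything the connector does.
    print("HEAD failed:", e.response["Error"]["Code"])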

Please help! Thank you in advance!

fernandocfbf commented 1 year ago

Got the issue. The problem was that my computer's clock was off by 3 hours. If someone is stuck on this problem, please have a look here: https://hadoop.apache.org/docs/r2.8.0/hadoop-aws/tools/hadoop-aws/index.html#Troubleshooting_S3A
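For anyone hitting the same thing: signed S3 requests are rejected with 403 when the client clock drifts too far from AWS time, which is the clock-skew case the S3A troubleshooting link above covers. A quick, hypothetical way to measure the drift, assuming outbound HTTPS access; the endpoint and variable names are illustrative:

# Hypothetical clock-skew check: compare local UTC time with the Date header S3 returns.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import urllib.request, urllib.error

try:
    headers = urllib.request.urlopen("https://s3.amazonaws.com").headers
except urllib.error.HTTPError as e:
    headers = e.headers  # even an access-denied response carries the server's Date header

server_time = parsedate_to_datetime(headers["Date"])
local_time = datetime.now(timezone.utc)
skew_seconds = abs((local_time - server_time).total_seconds())
print(f"Clock skew vs S3: {skew_seconds:.0f} seconds")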