Open cdibble opened 4 years ago
Have yet to try this off S3 - will test it - but it looks like there is an issue with AWS credentials: "Unable to load AWS credentials from any provider in the chain". I have to track it down.
Thanks for getting back to me. I am very interested in seeing that functionality; please let me know if there is anything else I can provide based on my experience. Cheers!
Thanks for your patience. I'm still trying to debug this; I can replicate the problem and am trying to find a solution.
Tried it on MinIO using:
```
--conf spark.hadoop.fs.s3a.access.key="minioadmin" \
--conf spark.hadoop.fs.s3a.secret.key="minioadmin" \
--conf spark.hadoop.fs.s3a.endpoint=http://localhost:9000 \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.path.style.access=true \
--conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
--conf spark.jars.packages="${PACKAGES}" \
```
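For what it's worth, roughly the same settings can also be applied when the SparkSession is built rather than at launch time. The sketch below reuses the same placeholder MinIO values and assumes the hadoop-aws and AWS SDK jars are already on the classpath; it is just an illustration, not a confirmed fix.

```python
from pyspark.sql import SparkSession

# Same S3A settings as the --conf flags above, applied at session-build time.
# The keys and endpoint are the local MinIO placeholders, not real AWS credentials.
spark = (
    SparkSession.builder
    .appName("s3a-minio-check")
    .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .getOrCreate()
)
```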
Hello-
I am trying to use your package to parse a number of .gdb databases stored in S3 (the same AIS ship data as in your example) and write them to a new S3 bucket as Parquet so that I can process them later. It works fine in a local PySpark session, but it fails when run on the cluster with the "Unable to load AWS credentials from any provider in the chain" error. My AWS credentials are available to all of the EC2 instances in the cluster (they are set as environment variables in bash and then imported into the Hadoop configuration in PySpark via a script), and I can use the cluster to, say, read a GDB file as a text file.
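For concreteness, the "imported into the Hadoop configuration via a script" step is along these lines (a simplified sketch assuming an existing SparkSession named `spark`, not the exact script):

```python
import os

# Sketch: push the shell's AWS credentials into the driver's Hadoop
# configuration. The environment variable names are the standard AWS ones.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", os.environ["AWS_ACCESS_KEY_ID"])
hadoop_conf.set("fs.s3a.secret.key", os.environ["AWS_SECRET_ACCESS_KEY"])
```

Whether settings applied this way on the driver are visible where the tasks actually run may depend on how the data source builds its Hadoop configuration on the executors; I don't know if that is the cause here.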
I launch PySpark with:
And then I run the following code successfully:
But the following command fails when it actually executes on the cluster:
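In outline, the flow is a lazy read of the GDB from S3 followed by a Parquet write to the other bucket; the sketch below uses placeholders (the format string, paths, and bucket names are not the real ones):

```python
# Placeholder sketch of the read-then-write flow. "gdb" stands in for the
# package's actual DataSource name, and the s3a:// paths are made up.
ais_df = (
    spark.read.format("gdb")  # placeholder format name, not necessarily the real one
    .load("s3a://source-bucket/ais/zone10.gdb")  # hypothetical path
)

# The load is lazy; the credentials error only shows up once an action runs
# on the cluster, e.g. this Parquet write to the destination bucket.
ais_df.write.mode("overwrite").parquet("s3a://destination-bucket/ais-parquet/")
```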
Any ideas on why this is failing? Is support for boto3/s3 access missing? Any pointers would be appreciated. Thanks for the package!
Here's the full stack trace: