tomasz-dudziak closed this issue 3 years ago
Connection to AWS is not handled by Parquet4S. Parquet itself relies on hadoop-client for connectivity. Link to the documentation of hadoop-aws is in the README: https://github.com/mjakubowski84/parquet4s#aws-s3.
You're right! I sorted it out by passing a hadoopConf with the setting below in the writer options:
hadoopConf.set("fs.s3a.path.style.access", "true")
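For anyone landing here later, a minimal sketch of how that setting might be wired up. It assumes hadoop-aws is on the classpath and that your Parquet4S version exposes `ParquetWriter.Options` with a `hadoopConf` field. The endpoint value is a placeholder taken from the question, and the `S3WriterConfig` object name is only for illustration. Credentials are not shown.

```scala
import org.apache.hadoop.conf.Configuration
import com.github.mjakubowski84.parquet4s.ParquetWriter

// Hypothetical holder object, not part of Parquet4S.
object S3WriterConfig {
  val hadoopConf = new Configuration()

  // Placeholder endpoint: replace with the address of your S3-compatible store.
  hadoopConf.set("fs.s3a.endpoint", "https://my-s3-endpoint.com")

  // Path-style access addresses the bucket as <endpoint>/<bucket>
  // instead of the virtual-hosted form <bucket>.<endpoint>.
  hadoopConf.set("fs.s3a.path.style.access", "true")

  // Pass these options to the writer call when writing to an s3a:// path.
  val writerOptions: ParquetWriter.Options =
    ParquetWriter.Options(hadoopConf = hadoopConf)
}
```

With path-style access disabled, the S3A connector puts the bucket name in front of the endpoint host, which would explain the UnknownHostException on stores that have no DNS entry per bucket.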
I am trying to write Parquet to S3 (FlashBlade) using this library, but I am getting an UnknownHostException: it appears to be trying to connect to my-bucket.my-company-s3-endpoint.com. This is surprising, as I would expect it to use my-s3-endpoint.com/my-bucket instead. What is wrong, and can the library be configured to use the correct URL? Or could it be caused by some other dependency in my project?