jdenisgiguere opened this issue 4 years ago
The second line of the Gist you posted points out that the credentials are missing. In your posted code I do see the credentials explicitly set, which explains how the parquet read with the s3a:// scheme works.
Can you double check that the credentials are set in the initial attempt to read geotrellisCatalog, without the parquet?
@vpipkt, to reproduce the issue, the only thing I do is comment out lines 71 and 72. If they are commented out, I get the error message; if they are not, spark.read.geotrellisCatalog works.
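For context, setting S3A credentials on a SparkSession's Hadoop configuration typically looks like the sketch below. This is a hypothetical illustration, not the actual lines 71-72 from the linked repository; the fs.s3a.* keys are standard Hadoop S3A configuration, and the endpoint/path-style settings are shown because the linked project targets a MinIO deployment. All values are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("s3a-credentials-example")
  .master("local[*]")
  .getOrCreate()

// Standard Hadoop S3A configuration keys. The endpoint and
// path-style settings are typical for a MinIO-style deployment.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
hadoopConf.set("fs.s3a.endpoint", "http://minio:9000")      // placeholder
hadoopConf.set("fs.s3a.path.style.access", "true")
```

Commenting out the two credential lines (as described above) is what triggers the failure in the geotrellisCatalog read.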
@metasim any reason why the geotrellis catalog reader would not honor the config options setting s3 credentials?
@vpipkt @jdenisgiguere I'm surprised it's not getting passed along, because under the covers it's actually pulling from native Spark datasources, which I'd expect to honor that. IOW, how would spark.read.json be any different from spark.read.parquet?
Actually, those calls are just parsing data that's been fetched by other operations. We probably need to look at the GeoTrellis APIs to see whether they pay attention to the Spark properties.
This may also be an issue with the use of different Hadoop library versions in the final application build.
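To make the distinction discussed above concrete: Spark's native datasources (parquet, json, etc.) resolve s3a:// paths through the Hadoop FileSystem layer, whose configuration can also be seeded with spark.hadoop.*-prefixed Spark properties. The sketch below uses standard Spark and Hadoop property names; whether the GeoTrellis code paths consult the same configuration is exactly the open question in this issue. Keys and values are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// "spark.hadoop."-prefixed properties are copied into the Hadoop
// configuration used by Spark's native datasources.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.hadoop.fs.s3a.access.key", "<access-key>")   // placeholder
  .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")   // placeholder
  .getOrCreate()

// Both readers go through the same Hadoop FileSystem layer for s3a://
// paths, so in principle both should see the credentials above:
// val parquetDf = spark.read.parquet("s3a://bucket/data.parquet")
// val jsonDf    = spark.read.json("s3a://bucket/data.json")
```

If GeoTrellis constructs its own S3 client rather than going through the Hadoop FileSystem, it would explain why these properties are not honored by the catalog reader.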
Current situation
When I try to read a GeoTrellis catalog with an s3a:// URI using spark.read.geotrellisCatalog, I get the following error: https://gist.github.com/jdenisgiguere/61161a1bd9636ec91c3b75cbb6a845b9

A workaround is to first read data from the same bucket with the spark.read.parquet method. After this call, spark.read.geotrellisCatalog is able to read the data. See: https://github.com/jdenisgiguere/rasterframes-minio-ZazJXB4U/blob/acaa3b1de2372a642223ff9b48abba9d8e208dd5/read-with-rasterframes0.9/src/main/scala/io/anagraph/zazjxb4u/RfBisReader.scala#L69-L72
Expected situation
A prior invocation of spark.read.parquet should not be required to read a Geotrellis HadoopAttributeStore.
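The workaround described above can be sketched as follows. Bucket and paths are placeholders, and the import shown is an assumption about where the RasterFrames 0.9 geotrellisCatalog reader extension lives; check the linked repository for the exact imports used there.

```scala
import org.apache.spark.sql.SparkSession
// Assumed import providing the geotrellisCatalog DataFrameReader
// extension in RasterFrames 0.9 (verify against the linked repo).
import org.locationtech.rasterframes.datasource.geotrellis._

val spark = SparkSession.builder().getOrCreate()

// Workaround: first touch the bucket with the native parquet reader...
val warmup = spark.read.parquet("s3a://my-bucket/any-existing.parquet")

// ...after which the GeoTrellis catalog read succeeds.
val catalog = spark.read.geotrellisCatalog("s3a://my-bucket/catalog")
```

The expected behavior is that the second call works on its own, without the warm-up parquet read.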