Open jdenisgiguere opened 4 years ago
My only guess here is what version of GeoTrellis the catalog was created with?
Since the error is thrown in the geotrellis.spark.io.hadoop package, that's where I would go looking for changes. It looks like the packages have been reorganized in the GT 3.x series, but I'm not familiar with the backward-compatibility situation for catalogs and layers.
Thank you @vpipkt for your quick answer.
We use Geotrellis 2.3.3, which is the version required for rasterframes 0.8.4 according to project/RFDependenciesPlugin.scala.
I would expect to see S3AttributeStore instead of HadoopAttributeStore for a URI with the prefix s3a://.
Just a hunch here that maybe the geotrellis.spark.io.s3.S3LayerProvider is not on the classpath? Or perhaps the META-INF/services/geotrellis.spark.io.AttributeStoreProvider is not listing geotrellis.spark.io.s3.S3LayerProvider?
@jdenisgiguere do you happen to have a public version of s3a://geoimagery/geotrellis_geoimagery/ we could use to replicate the issue?
I created a git repo with data to reproduce this issue: https://github.com/jdenisgiguere/rasterframes-minio-ZazJXB4U
The repo also contains code to read the Geotrellis Layer with Geotrellis v2.3.3 and a non-working attempt to read the same data with rasterframes 0.8.5. I have an issue with the management of Hadoop versions in the latter.
Thanks in advance for your help.
I pushed a new commit to the proof of concept with rasterframes 0.8.5. This is my latest stack trace: https://gist.github.com/jdenisgiguere/fe3d274d1baf2ba2730c920ff8abd128 .
@vpipkt, you gave me a valuable hint 3 weeks ago, but I did not have enough background to understand it well. So, with the s3a:// protocol, the data is expected to come from a Hadoop data store, whereas s3:// will use the plain AWS Java SDK.
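To make the scheme distinction concrete, here is a minimal Python sketch of a scheme-based dispatch. It is an illustration only and not GeoTrellis's actual provider code: the mapping table and the `attribute_store_for` helper are hypothetical names, and the `s3://` form of the bucket URI is assumed purely for comparison.

```python
from urllib.parse import urlparse

# Illustrative mapping only: URI scheme -> attribute store named in this thread.
STORE_BY_SCHEME = {
    "s3": "S3AttributeStore",       # backed by the plain AWS Java SDK
    "s3a": "HadoopAttributeStore",  # backed by the Hadoop FileSystem API
}


def attribute_store_for(uri: str) -> str:
    """Return the store name this simplified dispatch would pick for `uri`."""
    scheme = urlparse(uri).scheme.lower()
    try:
        return STORE_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"no attribute store registered for scheme {scheme!r}")


# The s3a:// URI is the one from this issue; the s3:// variant is hypothetical.
print(attribute_store_for("s3a://geoimagery/geotrellis_geoimagery/"))
print(attribute_store_for("s3://geoimagery/geotrellis_geoimagery/"))
```

Under this reading, seeing HadoopAttributeStore in a stack trace for an s3a:// URI is the expected behaviour rather than a classpath problem.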
The Geotrellis documentation explains how to configure an S3Provider to use Minio, but I don't know how to do this with rasterframes.
I could also modify my backend to save data in Geotrellis with a HadoopLayerWriter. Since we cannot use Minio as an s3a storage source with the default Hadoop version bundled with Spark 2.4.4 (Hadoop v2.7), there is more to learn before pyrasterframes can be used this way.
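For the Hadoop route, a rough sketch of the Spark settings that usually point the s3a connector at a Minio server is shown below. The `fs.s3a.*` keys are standard Hadoop S3A options; the helper name, endpoint, and credentials are placeholders. As far as I know, `fs.s3a.path.style.access` is only honoured from Hadoop 2.8 onwards, which would be consistent with the Hadoop 2.7 limitation mentioned above, so treat this as an assumption to verify against your Hadoop build.

```python
def minio_s3a_conf(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Build Spark settings that point the Hadoop s3a connector at Minio.

    Hypothetical helper: only the fs.s3a.* option names come from Hadoop.
    """
    uses_tls = endpoint.lower().startswith("https://")
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Minio is normally addressed path-style: http://host:port/bucket/key
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "true" if uses_tls else "false",
    }


# Placeholder endpoint and credentials, not taken from the issue.
conf = minio_s3a_conf("http://localhost:9000", "minio-access-key", "minio-secret-key")
for key in sorted(conf):
    if "secret" not in key:  # avoid echoing the secret key
        print(key, "=", conf[key])

# With pyspark installed, each pair could then be applied as:
#   builder = SparkSession.builder
#   for key, value in conf.items():
#       builder = builder.config(key, value)
```

Keeping the settings in a plain dict makes them easy to inspect or log before a Spark session is created.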
To use Geotrellis S3 backend with Minio, you cannot provide only the Layer URI. You also need to provide the s3Client. https://github.com/locationtech/geotrellis/blob/master/s3/src/main/scala/geotrellis/store/s3/S3AttributeStore.scala#L43
If I understand correctly, we cannot currently provide this parameter when we want to read a Geotrellis layer or a Geotrellis catalog with rasterframes. https://github.com/locationtech/rasterframes/blob/develop/datasource/src/main/scala/org/locationtech/rasterframes/datasource/geotrellis/GeoTrellisRelation.scala#L62-L68
@vpipkt, if you think this is appropriate, this could be tagged as an enhancement, or closed since it is working as expected.
Current situation
I have a geotrellis catalog using the S3 backend. Catalog and data are stored on a minio server. I'm using Geotrellis v2.3.3
When I try to access the catalog with rasterframes v0.8.4, I get the following error messages:
Expected situation
I would expect to be able to read the catalog with this configuration.
Detailed environment