locationtech / geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.
http://geotrellis.io

RasterSourceProvider implementation needs to be more discriminating #3183

Open metasim opened 4 years ago

metasim commented 4 years ago

Not sure how to make this reproducible as there's a classloader ordering thing going on, but I was able to get a GeoTiffRasterSource from a .jp2 URI:

val rs = geotrellis.raster.RasterSource("s3://sentinel-s2-l2a/tiles/22/L/EP/2019/5/31/0/R60m/B08.jp2")
rs.getClass
...
> geotrellis.raster.geotiff.GeoTiffRasterSource

Further inspection of geotrellis.raster.geotiff.GeoTiffRasterSourceProvider shows that it does not check the file extension. Furthermore, GT should make the GDAL provider have higher priority over the others when GDAL is available.
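As a rough illustration of the extension check described above, here is a minimal, self-contained sketch. `SketchProvider`, `ExtensionCheck`, and `TiffOnlyProvider` are hypothetical names invented for this example. They are not the GeoTrellis provider trait or its actual implementation, and the list of accepted extensions is an assumption.

```scala
import java.util.Locale

// Hypothetical stand-in for a provider interface; NOT the GeoTrellis trait.
trait SketchProvider {
  def canProcess(path: String): Boolean
}

object ExtensionCheck {
  // Lower-cased extension of the last path segment, if there is one.
  // Any query string or fragment is ignored.
  def extension(path: String): Option[String] = {
    val noQuery = path.takeWhile(c => c != '?' && c != '#')
    val name = noQuery.substring(noQuery.lastIndexOf('/') + 1)
    val dot = name.lastIndexOf('.')
    if (dot > 0 && dot < name.length - 1)
      Some(name.substring(dot + 1).toLowerCase(Locale.ROOT))
    else None
  }
}

// Accepts only paths whose extension looks like a TIFF (assumed list).
object TiffOnlyProvider extends SketchProvider {
  private val tiffExtensions = Set("tif", "tiff")

  def canProcess(path: String): Boolean =
    ExtensionCheck.extension(path).exists(tiffExtensions.contains)
}
```

With a check like this, the `.jp2` URI above would be rejected by the TIFF-only provider and left for another provider to claim. Note that a path with no extension is rejected too, which may not be what every application wants.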

pomadchin commented 4 years ago

@metasim I am not sure that GDAL should be a default behavior, since there are no obvious benefits in its usage; it only adds complexity in managing GDAL and application memory.

At this point it also depends on how classes were loaded in your particular application. We don't sort providers by priority, but that is certainly possible to do.

There is also the question of what to do when a file has no extension but is a TIFF, and you still want to read it with GeoTiffRasterSource to avoid GDAL.
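To make the two points above concrete, here is a hedged sketch of priority-based selection that does not depend on class-loading order. `PrioritizedProvider`, `ProviderSelection`, and the priority numbers are hypothetical and chosen only for illustration. Nothing here is an existing GeoTrellis API. In this sketch the TIFF reader also claims extensionless paths, which is one possible answer to the no-extension case, not a settled one.

```scala
// Hypothetical provider shape with an explicit priority (higher wins).
final case class PrioritizedProvider(
  name: String,
  priority: Int,
  canProcess: String => Boolean
)

object ProviderSelection {
  // Highest-priority provider that accepts the path, independent of the
  // order in which the providers were discovered.
  def select(
    providers: Seq[PrioritizedProvider],
    path: String
  ): Option[PrioritizedProvider] =
    providers.sortBy(p => -p.priority).find(_.canProcess(path))
}

object ProviderSelectionDemo {
  private def hasExtension(path: String): Boolean = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    name.lastIndexOf('.') > 0
  }

  private def endsWithAny(path: String, exts: String*): Boolean = {
    val lower = path.toLowerCase
    exts.exists(e => lower.endsWith("." + e))
  }

  // Assumed policy: the TIFF reader takes .tif/.tiff and extensionless paths.
  val geoTiff = PrioritizedProvider(
    "geotiff", 10, p => endsWithAny(p, "tif", "tiff") || !hasExtension(p))

  // Assumed policy: the GDAL-backed reader is a catch-all with lower priority.
  val gdal = PrioritizedProvider("gdal", 5, _ => true)
}
```

Swapping the two priority values would give GDAL precedence instead. The point is only that the outcome is decided by an explicit ordering rather than by which provider happened to load first.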

metasim commented 4 years ago

@pomadchin Gotcha. The preference comment was subjective. Also subjectively, we seem to get better performance with GDAL than with the JVM readers. (I know your measurements show otherwise.)

pomadchin commented 4 years ago

@metasim It probably works for your use case, but it didn't work well for very long-running concurrent applications (cc @notthatbreezy; we ran an experiment with RasterFoundry, and he probably has more insight into it) or in heavy Spark jobs (as you may have noticed, it requires configuring both the Spark jobs and the GDAL parameters, and it shows only a small performance gain).