locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

nodata handling on ProjectedRaster toRF #31

Open metasim opened 6 years ago

metasim commented 6 years ago

From @vpipkt on March 7, 2018 18:56

Consider the following:

$ gdalinfo core/src/test/resources/L8-B4-Elkton-VA.tiff  -mm
Driver: GTiff/GeoTIFF
Files: core/src/test/resources/L8-B4-Elkton-VA.tiff
Size is 186, 169
....
Band 1 Block=186x22 Type=UInt16, ColorInterp=Gray
    Computed Min/Max=6396.000,27835.000

Then, in the Spark REPL:

scala> val r = SinglebandGeoTiff("L8-B4-Elkton-VA.tiff").projectedRaster
r: geotrellis.raster.ProjectedRaster[geotrellis.raster.Tile] = ProjectedRaster(Raster(geotrellis.raster.UShortRawArrayTile@36343207,Extent(703986.502389, 4249551.61978, 709549.093643, 4254601.8671)),EPSG:32617)
scala> r.tile.size
res0: Int = 31434
scala> r.toRF(20,20).agg(aggDataCells($"tile"), min(tileMin($"tile"))).show
+--------------------+---------------------------------+
|agg_data_cells(tile)|min(UDF(tile) AS `tileMin(tile)`)|
+--------------------+---------------------------------+
|               36000|                              0.0|
+--------------------+---------------------------------+

Problem: the NoData cells implied in tiles at the edge of the raster are treated as zeros, not as NoData. Both gdalinfo and GeoTrellis report a 186 x 169 raster (31434 cells), so we expect fewer data cells than the 36000 counted in the tile column. The minimum of the tile column is 0 instead of the 6396 we expect from gdalinfo.
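The two counts can be reconciled with plain arithmetic. The sketch below assumes that every tile produced by `toRF(20, 20)` is padded out to the full 20 x 20 size, which the 36000 figure suggests. `TileCounts`, `ceilDiv` and `paddedCells` are hypothetical names for illustration, not RasterFrames API.

```scala
// Hypothetical helper reconciling the cell counts from the REPL session (arithmetic only).
object TileCounts {
  // Ceiling division for positive integers.
  def ceilDiv(a: Int, b: Int): Int = (a + b - 1) / b

  // Total cells when a cols x rows raster is cut into tileCols x tileRows tiles,
  // assuming the last column and row of tiles are padded to full size.
  def paddedCells(cols: Int, rows: Int, tileCols: Int, tileRows: Int): Int =
    ceilDiv(cols, tileCols) * ceilDiv(rows, tileRows) * tileCols * tileRows

  val rasterCells: Int = 186 * 169                     // 31434, same as r.tile.size
  val layoutCells: Int = paddedCells(186, 169, 20, 20) // 10 * 9 * 400 = 36000
  val paddingCells: Int = layoutCells - rasterCells    // 4566 cells that should be NoData
}
```

So 10 x 9 = 90 tiles of 400 cells each give 36000 cells, of which 4566 lie outside the source raster and are currently counted as data with value 0.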

The source GeoTIFF does not define a NoData value, but to put it into an arbitrary layout we may need to define one.
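To illustrate why the fill value matters, here is a minimal plain-Scala model of one padded edge tile, flattened to a vector of cells. It is not the RasterFrames or GeoTrellis API; `EdgeTileModel` and its methods are hypothetical. With no NoData defined, padding cells filled with 0 count as data and drag the minimum down to 0; declaring 0 as NoData recovers the expected count and minimum.

```scala
// Hypothetical model (plain Scala, not RasterFrames/GeoTrellis) of an edge tile
// whose out-of-raster cells are filled with a fill value.
object EdgeTileModel {
  // Pads `data` with `fill` up to `tileSize` cells.
  def pad(data: Vector[Int], tileSize: Int, fill: Int): Vector[Int] =
    data ++ Vector.fill(tileSize - data.size)(fill)

  // Counts cells not equal to the NoData value; every cell counts if none is defined.
  def dataCells(cells: Vector[Int], noData: Option[Int]): Int =
    noData.fold(cells.size)(nd => cells.count(_ != nd))

  // Minimum over non-NoData cells; None if every cell is NoData.
  def dataMin(cells: Vector[Int], noData: Option[Int]): Option[Int] = {
    val valid = noData.fold(cells)(nd => cells.filterNot(_ == nd))
    if (valid.isEmpty) None else Some(valid.min)
  }
}
```

For example, padding `Vector(6396, 27835, 10000)` to 5 cells with 0 gives a count of 5 and a minimum of 0 when no NoData is defined, but a count of 3 and a minimum of 6396 when 0 is declared NoData. The model only works if the fill value never occurs in the real data.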

This may be a hard problem to solve well, as it would involve selecting a safe NoData value for the entire tile column.
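One naive strategy, sketched below under the assumption that all cell values of the column can be collected, is to pick a value from the cell type's range that never occurs in the data, preferring the range endpoints. `SafeNoData.pick` is a hypothetical helper, not an existing API. It also shows why the problem is hard: it needs a full pass over every tile in the column, and a value that is safe for one scene may not be safe for another.

```scala
// Hypothetical helper: choose a NoData candidate in [lo, hi] that no cell uses.
object SafeNoData {
  // Tries lo and hi first (the usual NoData choices), then scans the interior.
  // Returns None when every value in the range already occurs in the data.
  def pick(values: Iterable[Int], lo: Int, hi: Int): Option[Int] = {
    val seen = values.toSet
    (Iterator(lo, hi) ++ Iterator.range(lo + 1, hi)).find(v => !seen.contains(v))
  }
}
```

For the Elkton scene, whose UInt16 values span 6396 to 27835, `SafeNoData.pick(cells, 0, 65535)` would return `Some(0)`.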

Copied from original issue: s22s/raster-frames#60

metasim commented 6 years ago

From @vpipkt on March 7, 2018 20:31

And I should add that we may also need to change CellTypes that are NoNoData to ones that support NoData.
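As a sketch of that conversion at the level of cell type names only: we assume GeoTrellis's naming scheme, where a NoNoData type carries a `raw` suffix (e.g. `uint16raw`), the constant-NoData variant drops it (`uint16`), and a user-defined NoData variant appends `ud<value>` (`uint16ud0`). The mapping and `NoDataCellTypes.withNoData` are hypothetical and meant for integer types; boolean types are left unchanged on the assumption that they have no NoData variant.

```scala
// Hypothetical name-level mapping from NoNoData ("raw") cell types to NoData-bearing ones,
// assuming GeoTrellis-style names such as "uint16raw", "uint16" and "uint16ud<value>".
object NoDataCellTypes {
  // Returns the constant-NoData name, or a user-defined-NoData name if `userNoData` is given.
  // Names that already carry NoData, and boolean types, are returned unchanged.
  def withNoData(name: String, userNoData: Option[Int] = None): String =
    if (!name.endsWith("raw") || name.startsWith("bool")) name
    else {
      val base = name.stripSuffix("raw")
      userNoData.fold(base)(nd => s"${base}ud$nd")
    }
}
```

For example, `withNoData("uint16raw", Some(0))` yields `"uint16ud0"`, which would fit the Elkton tile since 0 does not occur in its data.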