locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

DataFrame `toLayer` extension throws NPE with TileExploder and rf_assemble_tile #533

Open vpipkt opened 3 years ago

vpipkt commented 3 years ago

This was originally reported on Gitter and then discussed further on Stack Overflow. Something appears to generate a NullPointerException deep in the GeoTrellis (GT) internals in the following situation:

  1. transform a RasterFrameLayer in an ML pipeline with `TileExploder`, then
  2. aggregate with `rf_assemble_tile`, then
  3. call `.toLayer(tlm)`, then
  4. call `.toRaster(...)` <-- throws the NPE

Experimenting with the RasterFrameLayer produced at step 3, I called toTileLayerRDD and found that the RDD count exceeded the RasterFrameLayer count. I suspect toRaster and other methods assume the RasterFrameLayer is "complete", meaning that it contains every SpatialKey in the layout, but toLayer does not provide that guarantee.
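To make the "complete" idea concrete, here is a minimal sketch in plain Python, not RasterFrames or GeoTrellis code. It assumes a layout is just a grid of `layout_cols` by `layout_rows` and that a spatial key is a `(col, row)` tuple; the function name `missing_spatial_keys` and the sample keys are hypothetical and chosen only for illustration.

```python
from itertools import product


def missing_spatial_keys(present_keys, layout_cols, layout_rows):
    """Return the (col, row) keys the layout expects but the layer lacks."""
    # Every cell of the layout grid is an expected key (assumption: dense layout).
    expected = set(product(range(layout_cols), range(layout_rows)))
    return sorted(expected - set(present_keys))


# Hypothetical 3x2 layout in which two tiles never made it into the layer.
present = [(0, 0), (1, 0), (0, 1), (2, 1)]
gaps = missing_spatial_keys(present, layout_cols=3, layout_rows=2)
print(gaps)
```

A non-empty result would be consistent with the suspicion that the layer is sparse relative to its layout, although it would not by itself confirm that this is what triggers the NPE.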

In addition to the workaround in the SO answer, I verified that the following case does not trigger the bug: `RasterFrameLayer.toDF().toLayer(tlm).toRaster(...)`. So the problem seems specific to the combination of TileExploder and rf_assemble_tile. I suspect it arises because some Tiles in the layout consist entirely of nodata, so no records for them are present in the DataFrame at step 2.
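The suspected mechanism can be simulated without Spark. The following is a rough sketch in plain Python under stated assumptions: tiles are flat lists of cells, nodata is modeled as `None`, and nodata cells are dropped somewhere between the explode and assemble steps (the issue does not establish where, or whether, this happens). The helpers `explode` and `assemble` are hypothetical stand-ins, not the RasterFrames `TileExploder` or `rf_assemble_tile` implementations.

```python
from collections import defaultdict

NODATA = None  # assumption: nodata is modeled as None


def explode(tiles):
    """Yield one (key, cell_index, value) record per cell of every tile."""
    for key, cells in tiles.items():
        for idx, value in enumerate(cells):
            yield key, idx, value


def assemble(records, tile_size):
    """Group cell records by key and rebuild a tile for each key seen."""
    grouped = defaultdict(lambda: [NODATA] * tile_size)
    for key, idx, value in records:
        grouped[key][idx] = value
    return dict(grouped)


# Hypothetical layer with three tiles; tile (1, 0) is entirely nodata.
tiles = {
    (0, 0): [1, 2, 3, 4],
    (1, 0): [NODATA, NODATA, NODATA, NODATA],
    (0, 1): [5, NODATA, 7, 8],
}

# Assumed filtering step: nodata cells are dropped before aggregation.
kept = [rec for rec in explode(tiles) if rec[2] is not NODATA]
rebuilt = assemble(kept, tile_size=4)

print(sorted(rebuilt))  # keys that survive the round trip
```

In this toy model the all-nodata tile contributes no records, so its key never reappears after aggregation, while the partially filled tile is rebuilt with its gap intact. That matches the shape of the hypothesis above but says nothing definitive about the real pipeline.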
