locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

masking functions should warn or convert if raw cell type #409

Closed vpipkt closed 4 years ago

vpipkt commented 4 years ago

Masking seems "broken" when the data tile that the mask is applied to has no nodata value defined (a raw cell type).

Queries for nodata cell counts are then inconsistent with the contents of the mask tile.
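
To make the symptom concrete, here is a toy model in plain Python. It is not RasterFrames code: `ToyTile`, `toy_mask`, and `no_data_count` are hypothetical names, and the choice to write an ordinary fill value into a raw tile is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTile:
    """Hypothetical stand-in for a tile; not a RasterFrames type."""
    cells: List[int]
    nodata: Optional[int]  # None models a raw cell type (no nodata value)


def no_data_count(tile: ToyTile) -> int:
    # With no nodata value defined, no cell can ever be reported as nodata.
    if tile.nodata is None:
        return 0
    return sum(1 for c in tile.cells if c == tile.nodata)


def toy_mask(data: ToyTile, mask: ToyTile, fill: int = 0) -> ToyTile:
    # Assumption: a raw data tile receives an ordinary fill value where masked.
    marker = data.nodata if data.nodata is not None else fill
    cells = [marker if m == mask.nodata else c
             for c, m in zip(data.cells, mask.cells)]
    return ToyTile(cells, data.nodata)


mask = ToyTile([1, -1, 1, -1], nodata=-1)        # two nodata cells
with_nd = ToyTile([10, 20, 30, 40], nodata=-1)   # nodata defined
raw = ToyTile([10, 20, 30, 40], nodata=None)     # raw cell type

print(no_data_count(toy_mask(with_nd, mask)))  # 2, consistent with the mask
print(no_data_count(toy_mask(raw, mask)))      # 0, although two cells changed
```

In this model the raw tile ends up with altered cell values, but nothing downstream can tell that they were masked.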

The masking functions could:

  1. throw an assertion error and fail outright if the cell type of the data tile has no nodata;
  2. warn the user in the logs but continue with the current logic;
  3. implicitly convert the raw cell type to one with the default nodata value; OR
  4. rely on a user config to decide among the above options
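
The four options could be sketched roughly as a policy switch. This is only an illustration in plain Python: `RawPolicy` and `resolve_nodata` are hypothetical names, not part of the RasterFrames API, and the `default_nodata` value is a placeholder.

```python
import warnings
from enum import Enum
from typing import Optional


class RawPolicy(Enum):  # hypothetical; option 4 would read this from user config
    FAIL = 1     # option 1
    WARN = 2     # option 2
    CONVERT = 3  # option 3


def resolve_nodata(nodata: Optional[int], policy: RawPolicy,
                   default_nodata: int = -32768) -> Optional[int]:
    """Return the nodata value masking should use for a data tile.

    `nodata` is None when the tile has a raw cell type.
    `default_nodata` is a placeholder, not a value taken from the library.
    """
    if nodata is not None:
        return nodata  # nothing to decide, the tile already has nodata
    if policy is RawPolicy.FAIL:
        raise AssertionError("data tile has a raw cell type; no nodata defined")
    if policy is RawPolicy.WARN:
        warnings.warn("masking a tile with a raw cell type")
        return None  # continue with the current logic
    return default_nodata  # CONVERT: valid cells equal to this become nodata
```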

Failing has the downside that the check may not be evaluated until other parts of the Spark job graph have run, resulting in wasted computation and frustration.

Warnings may not be seen or noticed by the user, especially by users of the Python API.

Converting the celltype may mark valid data as nodata without the user understanding quite why, impacting the result of analytics.

Even with a user config there is the issue of setting a reasonable default.

vpipkt commented 4 years ago

I am going to implement option 1 above. It seems severe, but we have wasted a lot of time internally getting masking "right" for this reason.

The potential downside in a realistic workflow seems small. There will probably be a decent amount of iteration at the masking stage of any analysis to make sure things are correct. The assertion will surface those errors at that point, presumably in small collect actions.

There is some reassurance against the downside of losing a lot of time if the assertion is thrown on a lazy representation of the tile, before the tile contents are fetched, which seems reasonable.
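
A minimal sketch of that idea in plain Python: check the cell type metadata first, and only then trigger the expensive read. `LazyTile` and `checked_cells` are hypothetical names, and detecting a raw cell type by a `raw` suffix on its name is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LazyTile:
    """Hypothetical lazy tile: metadata is known, cells load on demand."""
    cell_type: str                 # e.g. "uint8" or "uint8raw"
    load: Callable[[], List[int]]  # expensive fetch of the tile contents


def checked_cells(tile: LazyTile) -> List[int]:
    # Check metadata first, so a raw tile fails before any read happens.
    if tile.cell_type.endswith("raw"):
        raise AssertionError(
            f"cell type {tile.cell_type} has no nodata defined")
    return tile.load()


fetches = []  # records every time the tile contents are actually read


def fetch() -> List[int]:
    fetches.append("read")
    return [1, 2, 3]


try:
    checked_cells(LazyTile("uint8raw", fetch))
except AssertionError as err:
    print("failed early:", err)

print(len(fetches))  # 0, the contents were never fetched
```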