locationtech-labs / geopyspark

GeoTrellis for PySpark

support min/max in aggregate_by_cell with different nodata handling #644

Closed jdries closed 6 years ago

jdries commented 6 years ago

The current nodata handling for min and max in aggregate_by_cell is to return nodata if either of the input values is nodata. My current use case is to compute a composite that retains the max NDVI value, on a dataset with quite a lot of nodata due to clouds. This doesn't work with the current operators, because a single nodata observation for a cell makes the result nodata, so I always end up with empty images. Hence it would make sense to also have a min/max operation where combine(data, nodata) returns data.
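To make the two behaviours concrete, here is a minimal sketch in plain NumPy. It is not GeoPySpark code: the arrays `a` and `b` are made-up NDVI tiles, and NaN is assumed to stand in for nodata.

```python
import numpy as np

# Two hypothetical NDVI observations of the same 2x2 grid of cells.
# NaN stands in for nodata (for example, cloud-masked pixels).
a = np.array([[0.2, np.nan],
              [np.nan, 0.7]])
b = np.array([[0.5, 0.4],
              [np.nan, np.nan]])

# Current behaviour: nodata wins, so combine(data, nodata) -> nodata.
strict = np.maximum(a, b)

# Requested behaviour: data wins, so combine(data, nodata) -> data.
# A cell stays nodata only when every observation is nodata.
lenient = np.fmax(a, b)

print(strict)
print(lenient)
```

With the strict rule only the top-left cell survives, while the lenient rule keeps every cell that has at least one valid observation.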

jbouffard commented 6 years ago

@jdries The behavior you're seeing is actually one of the core rules of GeoTrellis: any operation that takes a value and an ND (NoData) will always return ND, with a few exceptions. There are ways you can work around this, though. Assuming you're working with a singleband layer, you can perform a focal MAX operation on the layer and then do a merge. That'll replace any ND of one Tile with the value of another Tile with the same key. Another, more roundabout way would be to change the ND to some other value and then call the aggregate_by_cell method.

If you end up trying one of those methods, please let me know how it goes. If you continue to run into trouble, then I'd be happy to help you work through it!
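The second workaround (swap the ND value for something else before aggregating) can be sketched in plain NumPy as follows. This is only an illustration of the idea, not GeoPySpark API: `max_ignoring_nodata` is a hypothetical helper, and it assumes nodata is a finite sentinel value (here 255) rather than NaN.

```python
import numpy as np

def max_ignoring_nodata(tiles, nodata):
    """Cell-wise max over same-shaped tiles, skipping nodata cells.

    Hypothetical helper; assumes `nodata` is a finite sentinel, not NaN.
    """
    stack = np.stack(tiles).astype(float)
    # Swap nodata for -inf so it can never win a max comparison.
    filled = np.where(stack == nodata, -np.inf, stack)
    result = filled.max(axis=0)
    # Cells that were nodata in every tile are still -inf: restore nodata.
    return np.where(np.isneginf(result), nodata, result)

# Made-up tiles with 255 as nodata; a plain max would return 255 wherever it occurs.
t1 = np.array([[10, 255], [255, 255]])
t2 = np.array([[20, 30], [255, 5]])

composite = max_ignoring_nodata([t1, t2], nodata=255)
print(composite)
```

For a min aggregation the same pattern would use `+inf` as the temporary replacement, so that nodata can never be the smallest value.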

jdries commented 6 years ago

Thanks for the explanation. I think I've also found a potential workaround through Python, so I'll close this issue!