Frac might return also the nodata value inside a polygon

isciences / exactextract

Fast and accurate raster zonal statistics

Apache License 2.0


Frac might return also the nodata value inside a polygon #100

Closed WCMC-vblanque closed 5 months ago

WCMC-vblanque commented 5 months ago

Some of our polygons can have half of their pixels set to nodata, so when we calculate statistics it would be good to know how much of the polygon was covered by valid pixels and how much was ignored because of nodata values.

After reading the documentation, I could not understand how nodata values are treated by exactextract. Maybe it is already possible to include these nodata values in frac.

Thanks for your great work. Happy to help improve this great library.

dbaston commented 5 months ago

I don't think there's a way to get this right now. It might make sense to either add a new stat that returns the count of pixels, including nodata, or add an argument to the existing stats, e.g. count(include_nodata=true) or frac(include_nodata=true).

dbaston commented 5 months ago

I wonder if this could be more simply solved by #91.

sam-bradshaw-wcmc commented 5 months ago

I liked your original suggestion of either including count of nodata pixels or adding arguments to existing stats.

Could you elaborate on how #91 gives you the proportion of nodata pixels within a given area of interest? What would happen if the raster had a non-zero nodata value and 0 was a valid pixel value?

dbaston commented 5 months ago

> Could you elaborate on how https://github.com/isciences/exactextract/issues/91 gives you the proportion of nodata pixels within a given area of interest?

I'm thinking that an argument like frac(default_value=999) could solve the case of "give the fraction of all categories, including nodata" while also enabling usages like sum(default_value=0) ("sum the population, assuming nodata = 0").

I think the include_nodata argument I suggested above runs into a problem with stats like unique. What value do we use to represent the nodata pixels? For floating point types we can use NaN but for integer types we need the user to provide a placeholder value, which is essentially what default_value does.
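To make the placeholder idea concrete, here is a minimal pure-Python sketch of a coverage-weighted frac in which nodata pixels are either skipped or replaced by a user-supplied default. It is not exactextract's implementation; the function name `frac_by_value` and its parameters are hypothetical and only illustrate the behaviour discussed above.

```python
from collections import defaultdict


def frac_by_value(values, coverage, nodata, default_value=None):
    """Coverage-weighted fraction of each distinct pixel value.

    values        : pixel values intersecting the polygon
    coverage      : fraction of each pixel covered by the polygon (0..1)
    nodata        : the raster's nodata value
    default_value : placeholder substituted for nodata pixels;
                    if None, nodata pixels are skipped entirely
    (Hypothetical helper, not part of exactextract.)
    """
    totals = defaultdict(float)
    for value, cov in zip(values, coverage):
        if value == nodata:
            if default_value is None:
                continue  # nodata pixels do not contribute at all
            value = default_value  # treat nodata as its own category
        totals[value] += cov

    covered = sum(totals.values())
    if covered == 0:
        return {}
    return {value: total / covered for value, total in sorted(totals.items())}


# Four pixels: two of class 1, one of class 2, one nodata (-1).
values = [1, 1, 2, -1]
coverage = [1.0, 0.5, 0.5, 1.0]

without_default = frac_by_value(values, coverage, nodata=-1)
with_default = frac_by_value(values, coverage, nodata=-1, default_value=999)
```

With the placeholder, the nodata pixels show up as their own category (here 999), so the fractions describe the whole covered area rather than only the valid part. The same substitution with a default of 0 is what would make a sum treat nodata as zero.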

sam-bradshaw-wcmc commented 5 months ago

I guess something like frac(default_value=999) would also work.

We're currently getting the frac and unique stats and zipping them together to create a dictionary where the keys are the pixel values and the values are the fraction of cells with that pixel value. If we could supply a default/nodata value for frac (with this value then also appearing in the unique stat output), it would be very helpful for knowing what fraction of the pixels had no data.
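The zipping step described above could look roughly like the sketch below. The input dictionary is made-up example data rather than real exactextract output, `fractions_by_value` is a hypothetical helper, and the code assumes that unique and frac are returned in the same order for a given feature.

```python
def fractions_by_value(unique, frac):
    """Pair each distinct pixel value with its fraction of covered cells.

    Assumes `unique` and `frac` are parallel sequences for one feature.
    (Hypothetical helper written for illustration.)
    """
    if len(unique) != len(frac):
        raise ValueError("unique and frac must have the same length")
    return dict(zip(unique, frac))


# Made-up stats for a single feature; 999 stands in for nodata pixels.
feature_stats = {"unique": [1, 2, 999], "frac": [0.5, 0.25, 0.25]}

lookup = fractions_by_value(feature_stats["unique"], feature_stats["frac"])

# Fraction of the covered area that had no data (0.0 if none was found).
nodata_fraction = lookup.get(999, 0.0)
```

The placeholder must be a value that never occurs as a valid pixel value in the raster, otherwise the nodata fraction and a real category would be merged under the same key.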

dbaston commented 5 months ago

> I guess something like frac(default_value=999) would also work.

Support for this has been committed, if you'd like to give it a try.

> We're currently getting the frac and unique stats and zipping them together to create a dictionary where the keys of the dict are the pixel values

This can currently be done with the map_fields output option: https://github.com/isciences/exactextract/blob/bcd76993c7c54f4d498e6b3088e8517fefd99c6a/python/tests/test_exact_extract.py#L982-L1004. That said, I'm not really happy with the syntax and find it doesn't work well for multi-band inputs. Ideas for improvements would be welcome (or maybe it's best left to application code, as you're doing now).

sam-bradshaw-wcmc commented 5 months ago

thanks very much for adding support for this - can't wait to try it out

sam-bradshaw-wcmc commented 5 months ago

the new stats arguments work great - thanks again

WCMC-vblanque commented 5 months ago

Thanks a lot @dbaston for solving this ticket so quickly. It will be very useful for improving the calculation of some metrics in our biodiversity platform 💚