Open perrygeo opened 8 years ago
+1 on this.
would be interested in seeing a performance comparison of your proof-of-concept with a similar implementation in R's raster package (see extract function with weights option. note: R implementation uses the weights to adjust values when calculating mean, not as a selection mechanism)
i went ahead and used your proof-of-concept to replicate the functionality of the R package. initial testing produced identical results and was about 7.5x faster than R.
https://github.com/sgoodm/python-rasterstats/tree/cell-weights
@sgoodm Very interesting. I don't have time to take a detailed look this week but I hope to check this out soon. Thanks for moving this forward!
+1 I've been hacking my own solution for this and a bit surprised that there doesn't seem to be a good standard solution for computing pixel weights efficiently.
Improved original method for calculating percent coverage of cells. Instead of using cell bounding box intersections, it now rasterizes features at a finer resolution than raster source (this scale can be provided as optional input arg) then aggregates back to raster resolution to estimate percent coverage. Quite a bit faster at reasonable scales (default scale value is 10, or 1 order of magnitude finer resolution than raster).
You can use this to get weighted stats by specifying them as additional values in the stats arg (eg, "weighted_mean" - weighted variants exist for mean, count and sum). One caveat of this is when a weighted stat is detected, all_touched=True must be automatically added (warning given if all_touched is not given as true when providing weighted stat). Note: I'm not sure having to specify these as separate stats is the best way to do this for users
In addition to using this for weighted stats, it should be fairly easy to adapt this into another option for cell selection based on percent coverage (original reason for this issue). Would just need to create a second mask using an input val.
https://github.com/sgoodm/python-rasterstats/tree/cell-weights
PR #136 for latest on this
In addition to all_touched vs centroid, there's the possibility of using a partial pixel coverage to e.g. select cells with > X% overlap.
See the discussion at mapbox/rasterio#232 and a proof-of-concept implementation
The core functionality of calculating pixel weights probably belongs in GDAL or rasterio but could also live here I suppose. This lib would just use the result to threshold pixels above a specified percentage.