perrygeo / python-rasterstats

Summary statistics of geospatial raster datasets based on vector geometries.
BSD 3-Clause "New" or "Revised" License
518 stars 115 forks source link

Rasterization pixel overlap strategy #90

Open perrygeo opened 8 years ago

perrygeo commented 8 years ago

In addition to all_touched vs centroid, there's the possibility of using a partial pixel coverage to e.g. select cells with > X% overlap.

See the discussion at mapbox/rasterio#232 and a proof-of-concept implementation

The core functionality of calculating pixel weights probably belongs in GDAL or rasterio but could also live here I suppose. This lib would just use the result to threshold pixels above a specified percentage.

jhamman commented 8 years ago

+1 on this.

sgoodm commented 8 years ago

would be interested in seeing a performance comparison of your proof-of-concept with a similar implementation in R's raster package (see extract function with weights option. note: R implementation uses the weights to adjust values when calculating mean, not as a selection mechanism)

sgoodm commented 8 years ago

i went ahead and used your proof-of-concept to replicate the functionality of the R package. initial testing produced identical results and was about 7.5x faster than R.

https://github.com/sgoodm/python-rasterstats/tree/cell-weights

perrygeo commented 8 years ago

@sgoodm Very interesting. I don't have time to take a detailed look this week but I hope to check this out soon. Thanks for moving this forward!

max-nova commented 8 years ago

+1 I've been hacking my own solution for this and a bit surprised that there doesn't seem to be a good standard solution for computing pixel weights efficiently.

sgoodm commented 7 years ago

Improved original method for calculating percent coverage of cells. Instead of using cell bounding box intersections, it now rasterizes features at a finer resolution than raster source (this scale can be provided as optional input arg) then aggregates back to raster resolution to estimate percent coverage. Quite a bit faster at reasonable scales (default scale value is 10, or 1 order of magnitude finer resolution than raster).

You can use this to get weighted stats by specifying them as additional values in the stats arg (eg, "weighted_mean" - weighted variants exist for mean, count and sum). One caveat of this is when a weighted stat is detected, all_touched=True must be automatically added (warning given if all_touched is not given as true when providing weighted stat). Note: I'm not sure having to specify these as separate stats is the best way to do this for users

In addition to using this for weighted stats, it should be fairly easy to adapt this into another option for cell selection based on percent coverage (original reason for this issue). Would just need to create a second mask using an input val.

https://github.com/sgoodm/python-rasterstats/tree/cell-weights

sgoodm commented 7 years ago

PR #136 for latest on this