natcap / pygeoprocessing

Geoprocessing operations for Python
https://pygeoprocessing.readthedocs.io/en/latest/
Other
80 stars 9 forks source link

Proposal: options for `zonal_statistics` rasterization algorithm #312

Open emlys opened 1 year ago

emlys commented 1 year ago

how it works now

zonal_statistics rasterizes aggregate polygons. GDAL provides two options:

zonal_statistics uses ALL_TOUCHED=False. Some aggregate polygons can end up with no pixels at all (if they don't happen to overlap any pixels' centerpoint). zonal_statistics handles this case like so:

for each polygon with no pixels:
    calculate the polygon's bounding box
    read in the raster data within that bounding box
    calculate stats based on that window of data

proposed changes

Add a kwarg all_touched=False to zonal_statistics. Pass this value to gdal.RasterizeLayer. Remove the special handling of unset polygons.

benefits

examples

image
dcdenu4 commented 1 year ago

James shared a link for the gdal rasterization source: https://github.com/OSGeo/gdal/blob/89e3fc244652771ca4f8abc0a6fe5794e2201b26/alg/gdalrasterize.cpp#L759

dcdenu4 commented 1 year ago

Dave shared some background from what QGIS is doing: https://github.com/qgis/QGIS/blob/d5626d92360efffb4b8085389c8d64072ef65833/src/analysis/vector/qgszonalstatistics.cpp#L266 Talked about in this SO post: https://gis.stackexchange.com/questions/276794/how-does-qgis-zonal-statistics-handle-partially-overlapping-pixels

davemfish commented 1 year ago

https://github.com/perrygeo/python-rasterstats/blob/master/src/rasterstats/main.py

emlys commented 1 year ago

We determined that this needs some more information to make a decision -

davemfish commented 1 year ago

Here's another zonal stats library to keep an eye on. I think python bindings are in the works. https://github.com/isciences/exactextract

Their readme also includes a comparison of other implementations.

davemfish commented 1 month ago

Dave shared some background from what QGIS is doing: https://github.com/qgis/QGIS/blob/d5626d92360efffb4b8085389c8d64072ef65833/src/analysis/vector/qgszonalstatistics.cpp#L266 Talked about in this SO post: https://gis.stackexchange.com/questions/276794/how-does-qgis-zonal-statistics-handle-partially-overlapping-pixels

Here's the interesting bit of how QGIS handles very small polygons relative to the grid size:

https://github.com/qgis/QGIS/blob/d5626d92360efffb4b8085389c8d64072ef65833/src/analysis/vector/qgszonalstatistics.cpp#L266

    if ( featureStats.count <= 1 )
    {
      //the cell resolution is probably larger than the polygon area. We switch to precise pixel - polygon intersection in this case
      statisticsFromPreciseIntersection( featureGeometry, offsetX, offsetY, nCellsX, nCellsY, cellsizeX, cellsizeY,
                                         rasterBBox, featureStats );
    }
dcdenu4 commented 1 month ago

In our discussion today I think the general take aways were:

phargogh commented 1 month ago

Also of interest for looking at different rasterization methods, this paper might be of interest: https://searchworks.stanford.edu/articles/edseee__edseee.9942350. I have not yet read this, so this might not end up being relevant at all.. This paper is for smooth surfaces, which isn't immediately applicable to our regular work.