perrygeo / python-rasterstats

Summary statistics of geospatial raster datasets based on vector geometries.
BSD 3-Clause "New" or "Revised" License
531 stars 116 forks source link

Multiband support #73

Open perrygeo opened 9 years ago

perrygeo commented 9 years ago

zonalstats was originally optimized for large vector feature collections against a single raster band (i.e. thousands of polygons against a DEM raster)

There is another use case which zonalstats doesn't handle so well: many rasters/bands against a single geometry (i.e. the value at a single point for thousands of climate model simulations or a multispectral image). You can loop through and call zonal stats on each raster/band but this incurs the overhead of the rasterio.open each time. A more efficient mechanism might be to stack all the data into a multi-band raster and open it once, slice the 3D array and don't apply the aggregate functions to the 3rd dim - return an array rather than a scalar statistic.

I'll have to do some research to see how much refactoring/code clutter this would cause. Otherwise, it might just be a new zonalstats_multiband function.

perrygeo commented 9 years ago

Work started in multiband branch

Still need:

perrygeo commented 9 years ago

Bummer

array.mean(axis=(-1, -2))  # Works; the mean of the last two axes
masked.mean(axis=(-1, -2))  # TypeError: tuple indices must be integers, not tuple
perrygeo commented 9 years ago

There is an open PR (https://github.com/numpy/numpy/pull/5706) that serves as a proof of concept fix. This would need to be applied to all the functions used by the stats:

min, max, mean, count, std, sum, median, unqiue, percentile

Maybe looping through the bands is the only viable approach for now - even if we could make those changes and get them merged, it will be a while before they make it into mainstream releases.

jonschwenk commented 2 years ago

+1 for reopening this one.

dvolgyes commented 2 years ago

For my use case it would be crucial to have some multiband feature. For instance, I would like to perform stand based statistics on vector values, e.g. pixel brightness (combination of RGB values), pixel hue, etc. These cannot be independently calculated by bands and combined later, the transformation must be before aggregation.

(Returning vector values would be also nice, but this could be worked around, calculating vector, and return selected vector component.)