Closed by martinfleis 2 months ago
Hi, I took a look at the exactextract package available on PyPI to explore the possibility of integrating it into zonal statistics. Before diving into implementation, I'd like to share some initial thoughts:
Regarding datacubes backed by xarray, it's important to note that exactextract supports only DataArray, not Dataset.
The order of dimensions is important (longitude, latitude).
The most important thing is that exactextract supports only 2D or 3D cubes. For higher dimensions, such as 4D or more, alternative handling is necessary. I proposed stacking the additional dimensions into a single dimension, for example going from (long, lat, time, level1, level2) to (long, lat, stack_dim), applying exactextract, then unstacking the result to the original dimensions. This way we avoid iterating over the variables and dimensions.
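A minimal sketch of the stack/unstack idea using plain xarray. The dimension names, sizes, and the name stack_dim are assumptions for illustration, and a simple spatial mean stands in for the exactextract call, which is not shown here:

```python
import numpy as np
import xarray as xr

# Hypothetical 5D cube; dimension names and sizes are illustrative only.
dims = ("lon", "lat", "time", "level1", "level2")
shape = (4, 3, 2, 2, 2)
cube = xr.DataArray(
    np.arange(np.prod(shape), dtype="float64").reshape(shape),
    dims=dims,
    coords={name: np.arange(size) for name, size in zip(dims, shape)},
)

# Collapse every non-spatial dimension into a single one, giving a 3D cube.
extra_dims = [d for d in cube.dims if d not in ("lon", "lat")]
stacked = cube.stack(stack_dim=extra_dims)

# Stand-in for the zonal aggregation (NOT exactextract): reduce over space.
reduced = stacked.mean(dim=("lon", "lat"))

# Restore the original non-spatial dimensions on the aggregated result.
result = reduced.unstack("stack_dim")
```

The aggregation only ever sees a 3D array, and the MultiIndex carried by stack_dim is what lets unstack recover time, level1 and level2 afterwards.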
Regarding datacubes backed by xarray, it's important to note that exactextract supports only DataArray, not Dataset.
Would that mean a loop over DataArrays within the Dataset if we wanted to do it all?
The order of dimensions is important (longitude, latitude).
We shall be able to check for that.
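One way such a check could look, as a rough sketch. The helper name ensure_spatial_first and the default dimension names "lon" and "lat" are hypothetical and not part of any existing API:

```python
import numpy as np
import xarray as xr


def ensure_spatial_first(da, x_dim="lon", y_dim="lat"):
    """Return `da` with (x_dim, y_dim) as its leading dimensions.

    Hypothetical helper; the dimension names are assumptions.
    """
    missing = [d for d in (x_dim, y_dim) if d not in da.dims]
    if missing:
        raise ValueError(f"missing spatial dimension(s): {missing}")
    if da.dims[:2] == (x_dim, y_dim):
        return da  # already in the expected order
    # Ellipsis keeps the remaining dimensions in their current order.
    return da.transpose(x_dim, y_dim, ...)


da = xr.DataArray(np.zeros((2, 3, 4)), dims=("time", "lat", "lon"))
fixed = ensure_spatial_first(da)
```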
I proposed stacking the additional dimensions into a single dimension, [...] then unstacking the result to the original dimensions
That sounds reasonable.
Do you have any sense on performance compared to our existing methods?
Thanks for looking into that!
Would that mean a loop over DataArrays within the Dataset if we wanted to do it all?
No, converting the Dataset into an xarray.DataArray once would be enough.
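For illustration, xarray's Dataset.to_array does this conversion by adding a new dimension that indexes the data variables. The variable names and sizes below are made up, and the sketch assumes all variables share the same dimensions:

```python
import numpy as np
import xarray as xr

# Hypothetical Dataset with two variables on the same grid.
ds = xr.Dataset(
    {
        "temperature": (("lon", "lat"), np.ones((4, 3))),
        "precipitation": (("lon", "lat"), np.zeros((4, 3))),
    },
    coords={"lon": np.arange(4), "lat": np.arange(3)},
)

# One DataArray with the variables along a new leading "variable" dimension.
da = ds.to_array(dim="variable")

# After processing, the same dimension can be split back into a Dataset.
roundtrip = da.to_dataset(dim="variable")
```

Note that to_array places the new dimension first, so the spatial dimensions would need to be moved to the front again before handing the array over.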
Do you have any sense on performance compared to our existing methods?
Not yet. I'll see if we can compare them using a high-dimensional datacube or a large spatio-temporal extent. I might use the last use case from openEO where we struggled with memory issues.
A dev version of exactextract is now on PyPI. We can try to wrap it in our zonal_stats as another method alongside the rasterio-based rasterize and iterate.