xarray-contrib / xvec

Vector data cubes for Xarray
https://xvec.readthedocs.io
MIT License

add zonal_stats based on exactextract #62

Closed martinfleis closed 2 months ago

martinfleis commented 5 months ago

The dev version of exactextract is now on PyPI. We can try to wrap it in our zonal_stats as another method alongside the rasterio-based rasterize and iterate.

masawdah commented 3 months ago

Hi, I took a look at the exactextract package available on PyPI to explore the possibility of integrating it into zonal statistics. Before diving into implementation, I'd like to share some initial thoughts:

  1. Regarding datacubes backed by xarray, it's important to note that exactextract supports only DataArray, not Dataset.

  2. The order of dimensions is important (longitude, latitude).

  3. The most important point is that exactextract supports only 2D or 3D cubes. Higher-dimensional cubes (4D or more) need separate handling. I proposed stacking the additional dimensions into a single dimension, for example turning (long, lat, time, level1, level2) into (long, lat, stack_dim), applying exactextract, and then unstacking the result to the original dimensions. This way we avoid iterating through the variables and dimensions.
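The stack/apply/unstack idea from point 3 can be sketched with plain xarray. This is only an illustration under assumptions: the dimension names and sizes are made up, and a simple spatial mean stands in for the exactextract call, which is not shown here.

```python
import numpy as np
import xarray as xr

# Hypothetical 5D cube; dimension names and sizes are illustrative only.
cube = xr.DataArray(
    np.arange(4 * 3 * 2 * 2 * 2, dtype=float).reshape(4, 3, 2, 2, 2),
    dims=("x", "y", "time", "level1", "level2"),
    coords={
        "x": np.arange(4),
        "y": np.arange(3),
        "time": [0, 1],
        "level1": [10, 20],
        "level2": [100, 200],
    },
)

# Every dimension that is not spatial gets folded into one stacked dimension.
extra_dims = [d for d in cube.dims if d not in ("x", "y")]
stacked = cube.stack(stack_dim=extra_dims)  # dims: (x, y, stack_dim)

# Stand-in for the zonal statistic: a plain mean over the spatial dimensions.
# A real implementation would call exactextract on `stacked` instead.
reduced = stacked.mean(dim=("x", "y"))

# Restore the original non-spatial dimensions from the stacked one.
result = reduced.unstack("stack_dim")  # dims: (time, level1, level2)
```

Because the stacked dimension carries a MultiIndex built from the original coordinates, unstacking recovers the original labels without any manual bookkeeping.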

martinfleis commented 3 months ago

Regarding datacubes backed by xarray, it's important to note that exactextract supports only DataArray, not Dataset.

Would that mean a loop over DataArrays within the Dataset if we wanted to do it all?

The order of dimensions is important (longitude, latitude).

We should be able to check for that.
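One possible shape for such a check, as a rough sketch: the helper name `ensure_spatial_first` and the default dimension names are hypothetical, and the (longitude, latitude) ordering is taken from the comment above rather than verified against exactextract.

```python
import numpy as np
import xarray as xr


def ensure_spatial_first(da, x_dim="x", y_dim="y"):
    """Return `da` with (x_dim, y_dim) as its leading dimensions.

    Hypothetical helper: raises if either spatial dimension is missing.
    """
    missing = [d for d in (x_dim, y_dim) if d not in da.dims]
    if missing:
        raise ValueError(f"missing spatial dimensions: {missing}")
    # The Ellipsis keeps all remaining dimensions in their current order.
    return da.transpose(x_dim, y_dim, ...)


# Example: a cube stored as (time, y, x) is reordered to (x, y, time).
da = xr.DataArray(np.zeros((2, 3, 4)), dims=("time", "y", "x"))
ordered = ensure_spatial_first(da)
```

Transposing returns a view-like reordering rather than failing, so the user would not need to reorder the cube manually before calling the method.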

I proposed stacking the additional dimensions into a single dimension, [...] then unstacking the result to the original dimensions

That sounds reasonable.

Do you have any sense of the performance compared to our existing methods?

Thanks for looking into that!

masawdah commented 3 months ago

Would that mean a loop over DataArrays within the Dataset if we wanted to do it all?

Converting the Dataset into an xarray.DataArray once would be enough.
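One way to read this is xarray's `Dataset.to_array`, which puts the data variables along a new dimension. The sketch below assumes the variables share the same dimensions and a compatible dtype; the variable names are invented for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical Dataset with two variables on the same (x, y) grid.
ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.ones((4, 3))),
        "precipitation": (("x", "y"), np.zeros((4, 3))),
    }
)

# The data variables become entries along a new "variable" dimension.
da = ds.to_array(dim="variable")

# Move the spatial dimensions to the front, as discussed in the thread.
da = da.transpose("x", "y", "variable")

# After processing, the DataArray can be split back into a Dataset.
restored = da.to_dataset(dim="variable")
```

The new dimension then behaves like any other extra dimension, so it could be handled by the same stacking approach as time or level.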

Do you have any sense of the performance compared to our existing methods?

Not yet. I'll see if we can compare them using a high-dimensional datacube or a large spatio-temporal extent. I might use the last use case from openEO where we struggled with memory issues.