xarray-contrib / xvec

Vector data cubes for Xarray
https://xvec.readthedocs.io
MIT License
101 stars 9 forks source link

time-varying geometries #78

Open dcherian opened 2 months ago

dcherian commented 2 months ago

We are seeing lots of questions about handling time-varying geometries:

Would be good to describe this use case in the docs as appropriate.

martinfleis commented 2 months ago

I believe this means that geometry should be values of the DataArray, not coordinates, right? Like stuff described in this paper (code). If you also want reflectance at the same time, you would need a Dataset with one DataArray of reflectance and the other of geometries, with coordinates being time and either ID of a geom or a GeometryIndex but that would need to reflect only one point in time.

While this use case is not directly supported by Xvec, it is certainly within scope. And given shapely is based on numpy.ufuncs there's nothing blocking us dumping geoms to values. We'll just need to build some CRS support and related methods to get the basic functionality. For more I'd need to understand the exact use case and its needs.

cc-ing @loreabad6 in this discussion as this is something she's been working on within the R space. I will speak with @loreabad6 about this in a few weeks to ensure the same support she built in R lands in Xvec but I am not entirely sure how much work it is.

dcherian commented 2 months ago

Technically the indexes can be backed by nD variables, so in theory there's a possible extension. Of course @benbovy is the expert here.

loreabad6 commented 2 months ago

Thanks @martinfleis for the ping, I also sent this at the webinar but this is the working repo for the current implementation in R: https://github.com/loreabad6/post I am currently working on details for the class for tabular formats (since post aims to exchange between nested data frames and array representations) but I hope that before the SDSL I can add a couple of examples on array operations with time-variant geometries as shown in the webinar today (excellent work by the way @e-marshall!)

martinfleis commented 2 months ago

@e-marshall will a recording be available? I didn't make it in the end due to family duties.

martinfleis commented 2 months ago

Technically the indexes can be backed by nD variables, so in theory there's a possible extension

Do you have an example? That sounds like a complicated case to wrap a head around.

dcherian commented 2 months ago

Many here https://github.com/pydata/xarray/discussions/7041 (kd tree index, raster index)

benbovy commented 2 months ago

I mentioned a few possible use cases of nD indexed coordinate variables with a GeometryIndex some time ago here: https://discourse.pangeo.io/t/vector-data-cubes/2904/12. I'd be curious to see other examples!

I guess that one possible way to support it in xvec would be to:

  1. keep storing the indexed geometries as a 1-dimensional pandas.Index wrapped into a lightweight nD-compatible array adapter.
  2. keep track of the original nD shape of the coordinate variable in GeometryIndex.
  3. for data selection, one could use numpy.unravel_index to compute the indices along each dimension of the nD coordinate from the selected flat indices.

Implementing 1 and 2 would need a little bit of work around Xarray's PandasIndex. Something like this might work:

Regarding 3, this is the approach used in Xoak (see here). Xoak doesn't use Xarray indexes yet but the logic remains the same and could probably be implemented in GeometryIndex.

Note that I haven't looked into the other features provided by GeometryIndex (e.g., query) so there may be complications in the n-dimensional case.

Also from what I understand the serialization of nD geometry variables using CF-conventions may be tricky, although this is not an issue specific to indexes.

e-marshall commented 2 months ago

@e-marshall will a recording be available? I didn't make it in the end due to family duties.

@martinfleis ~I believe it will be, but I'm not sure exactly when. I'll add a link to this thread when it is available!~ Here is the link

benbovy commented 1 month ago

Inspiring talk @loreabad6 at the SDSL workshop today! It would be nice to have some of the Post features implemented here as well.

A tentative API:

dataset.xvec.summarise_geometry(
    coord_name=(var_name, summary_dim, summary_func)
)

where summarise_geometry is a convenient method that:

  1. takes an input n-dimensional variable var_name, which must be present in the dataset, contain shapely geometries and have at least the summary_dim dimension
  2. applies the summary_func function (or a pre-defined function if a string is passed) to that input variable, reducing all dimensions except summary_dim
  3. assign the result as a new 1-d coordinate with a GeometryIndex in the output dataset

(I assume reducing over a flexible number of dimensions is possible with shapely 2's universal functions?)

That could be useful to have (maybe alongside n-dimensional coordinate with a GeometryIndex if this proves to be possible)

martinfleis commented 1 month ago

I assume reducing over a flexible number of dimensions is possible with shapely 2's universal functions?

Generally yes, you should be able to specify dimension of an array along which the reduction happens but I don't think this is properly tested so we may hit some weirdness assuming 1d array.

One thing we should figure out is the CRS metadata for geometries as values in the array as that either needs to be linked to the GeometryIndex or solved differently.