Open dcherian opened 3 months ago
I believe this means that geometry should be values of the DataArray, not coordinates, right? Like stuff described in this paper (code). If you also want reflectance at the same time, you would need a Dataset with one DataArray of reflectance and the other of geometries, with coordinates being time and either ID of a geom or a GeometryIndex but that would need to reflect only one point in time.
While this use case is not directly supported by Xvec, it is certainly within scope. And given shapely is based on numpy.ufuncs there's nothing blocking us dumping geoms to values. We'll just need to build some CRS support and related methods to get the basic functionality. For more I'd need to understand the exact use case and its needs.
cc-ing @loreabad6 in this discussion as this is something she's been working on within the R space. I will speak with @loreabad6 about this in a few weeks to ensure the same support she built in R lands in Xvec but I am not entirely sure how much work it is.
Technically the indexes can be backed by nD variables, so in theory there's a possible extension. Of course @benbovy is the expert here.
Thanks @martinfleis for the ping, I also sent this at the webinar but this is the working repo for the current implementation in R: https://github.com/loreabad6/post I am currently working on details for the class for tabular formats (since post aims to exchange between nested data frames and array representations) but I hope that before the SDSL I can add a couple of examples on array operations with time-variant geometries as shown in the webinar today (excellent work by the way @e-marshall!)
@e-marshall will a recording be available? I didn't make it in the end due to family duties.
Technically the indexes can be backed by nD variables, so in theory there's a possible extension
Do you have an example? That sounds like a complicated case to wrap a head around.
Many here https://github.com/pydata/xarray/discussions/7041 (kd tree index, raster index)
I mentioned a few possible use cases of nD indexed coordinate variables with a GeometryIndex
some time ago here: https://discourse.pangeo.io/t/vector-data-cubes/2904/12. I'd be curious to see other examples!
I guess that one possible way to support it in xvec would be to:
pandas.Index
wrapped into a lightweight nD-compatible array adapter.GeometryIndex
.Implementing 1 and 2 would need a little bit of work around Xarray's PandasIndex
. Something like this might work:
GeometryIndex.from_variables()
should pass a flattened coordinate variable to PandasIndex.from_variables()
GeometryIndex.create_variables()
should encapsulate the pandas.Index
into a wrapper almost identical to PandasIndexingAdapter but that keeps track of the coordinate's original nD shape (note: we could support that in Xarray to avoid duplicating wrapper classes, if that makes sense). We also need to return an xarray.Variable
object here since xarray.IndexVariable
is still limited to the 1-dimensional case and needs to be refactored (and I think it is pretty safe now to use a regular Variable
for an indexed coordinate).Regarding 3, this is the approach used in Xoak (see here). Xoak doesn't use Xarray indexes yet but the logic remains the same and could probably be implemented in GeometryIndex
.
Note that I haven't looked into the other features provided by GeometryIndex
(e.g., query) so there may be complications in the n-dimensional case.
Also from what I understand the serialization of nD geometry variables using CF-conventions may be tricky, although this is not an issue specific to indexes.
@e-marshall will a recording be available? I didn't make it in the end due to family duties.
@martinfleis ~I believe it will be, but I'm not sure exactly when. I'll add a link to this thread when it is available!~ Here is the link
Inspiring talk @loreabad6 at the SDSL workshop today! It would be nice to have some of the Post features implemented here as well.
A tentative API:
dataset.xvec.summarise_geometry(
coord_name=(var_name, summary_dim, summary_func)
)
where summarise_geometry
is a convenient method that:
var_name
, which must be present in the dataset
, contain shapely geometries and have at least the summary_dim
dimensionsummary_func
function (or a pre-defined function if a string is passed) to that input variable, reducing all dimensions except summary_dim
GeometryIndex
in the output dataset(I assume reducing over a flexible number of dimensions is possible with shapely 2's universal functions?)
That could be useful to have (maybe alongside n-dimensional coordinate with a GeometryIndex if this proves to be possible)
I assume reducing over a flexible number of dimensions is possible with shapely 2's universal functions?
Generally yes, you should be able to specify dimension of an array along which the reduction happens but I don't think this is properly tested so we may hit some weirdness assuming 1d array.
One thing we should figure out is the CRS metadata for geometries as values in the array as that either needs to be linked to the GeometryIndex or solved differently.
We are seeing lots of questions about handling time-varying geometries:
Would be good to describe this use case in the docs as appropriate.