Open snowman2 opened 3 years ago
Closing for #12
I am wondering if adding support for these R-Tree based selectors would be of interest
Yes this would be nice additions! We could add adapters for those R-Tree indexes either in xoak or in 3rd party packages. Adding it in xoak would be a reasonable option for now (dependencies are optional for all adapters implemented in xoak). I'm not sure how the number of adapters would grow in the future, though.
It would allow for selecting the points with a geometry as input.
This would be great too. That would require some work, though, as we need to extend the API of the index adapters and the xoak
xarray accessors in order to support this. Open questions, I guess, are which "conventions" to use for representing those geometries within the xarray data model, and what are the advantages of using xarray instead of, e.g., geopandas for this use case.
(I'm re-opening this issue as I think it's worth discussing geometry based selection here - the topic #12 is too broad)
I guess, are which "conventions" to use for representing those geometries within the xarray data model, and what are the advantages of using xarray instead of, e.g., geopandas for this use case.
This stems from: https://github.com/corteva/rioxarray/issues/202
Currently in rioxarray
you can clip the data using a list of geometries:
https://corteva.github.io/rioxarray/stable/examples/clip_geom.html
However, this only works for regular grids (i.e. 1 dimensional x and y coordinates that are equally spaced)
Coordinates:
* y (y) float64 ...
* x (x) float64 ...
The idea of this would be to enable clipping with 2 dimensional coordinates (usually latitude and longitude) with irregular grids:
* latitude (latitude, longitude) float64 ...
* longitude (latitude, longitude) float64 ...
With the pygeos.STRTree, you can query for the coordinates that intersect the input geometries and get the indexes back to generate a mask you could use to clip/select data.
>>> import pygeos
>>> tree = pygeos.STRtree(pygeos.creation,points(xds.longitude, xds.latitude))
>>> tree.query_bulk([pygeos.box(2, 2, 4, 4), pygeos.box(5, 5, 6, 6)]).tolist()
In this scenario, there isn't really a need/benefit to use geopandas.
I see, clipping irregular data with a list of geometries is a nice example that xoak
should support directly or indirectly.
xoak
currently exposes a .sel
method that is similar to xarray's .sel
. Currently only point-wise indexing (i.e., using xarray objects as indexers) is possible, but range or rectangular selection (using slices) like in the simple examples given in https://corteva.github.io/rioxarray/stable/examples/clip_geom.html should be supported later. xoak
's xarray accessor API could be extended with extra methods for more advanced but still common operations like clip
with compound or complex geometries (although it might be too specific to fall in xoak
's scope).
In addition, there's ongoing discussion/work in #32 to expose xarray-friendly wrappers for the query methods of the adapted index objects. The challenge here would be to support the variety of the queries supported by each index (e.g., pygeos.STRTree.query_bulk
, sklearn.neighbors.KDTree.query_radius
, etc.) through a common API.
BTW, I started writing bindings for the s2geometry library here that may provide similar capabilities than pygeos.STRTree
for spherical coordinates.
The challenge here would be to support the variety of the queries supported by each index (e.g., pygeos.STRTree.query_bulk, sklearn.neighbors.KDTree.query_radius, etc.) through a common API.
By common API, I mean a small set of query methods, one for each kind of query, rather than a single query method for all kinds of queries. Still, for the same kind of query there might be subtle differences from one index object to another.
I am wondering if adding support for these R-Tree based selectors would be of interest:
It would allow for selecting the points with a geometry as input.