xarray-contrib / xvec

Vector data cubes for Xarray
https://xvec.readthedocs.io
MIT License
91 stars 9 forks source link

creating geometry variables from bounds #15

Open keewis opened 1 year ago

keewis commented 1 year ago

I wonder if it would be in scope for the xvec accessor to have a method that can be used to convert two bounds coordinates to shapely geometry columns?

I'd imagine something like this:

import xarray as xr
import cf_xarray
import xvec

ds = (
    xr.tutorial.open_dataset("air_temperature", chunks={})
    .cf.add_bounds(["lat", "lon"])
    .xvec.bounds_to_shapely(["lat_bounds", "lon_bounds"])
)

That would require xvec to broadcast or stack lat and lon, so we might also require doing that first (but ideally, we'd support 2D geometry variables):

ds = (
    xr.tutorial.open_dataset("air_temperature", chunks={})
    .cf.add_bounds(["lat", "lon"])
    .stack(cells=["lat", "lon"])
    .xvec.bounds_to_shapely(["lat_bounds", "lon_bounds"])
)
benbovy commented 1 year ago

Yes creating Xarray geometry variables from coordinate variables is perfectly in the scope of xvec IMO. I'd try to be as generic as possible though, i.e., keep close of shapely's geometry creation functions.

From my understanding of xarray-cf (and CF conventions), the bounds variables are already compatible with shapely's geometry creation API?

keewis commented 1 year ago

From my understanding of xarray-cf (and CF conventions), the bounds variables are already compatible with shapely's geometry creation API?

At the very least I couldn't figure out how to do the transformation when I posted the issue. Looking at the functions again, something like this might work:

def bounds_to_shapely(ds, coords):
    broadcast_coords, = xr.broadcast(ds[coords].reset_coords(coords))
    x = broadcast_coords[coords[0]]
    y = broadcast_coords[coords[1]]

    core_dims = [dim for dim in x.dims if dim != "bounds"]
    geometry = xr.apply_ufunc(
        shapely.box,
        x.isel(bounds=0),
        y.isel(bounds=0),
        x.isel(bounds=1),
        y.isel(bounds=1),
        input_core_dims=[core_dims, core_dims, core_dims, core_dims],
        output_core_dims=[core_dims],
    )
    return ds.assign_coords(geometry=geometry)

(
    xr.tutorial.open_dataset("air_temperature")
    .cf.add_bounds(["lat", "lon"])
    .pipe(bounds_to_shapely, ["lon_bounds", "lat_bounds"])
)

(though of course I don't have any experience with shapely so I might be missing a better way)

benbovy commented 1 year ago

Hmm, not sure but shapely.box is a convenient function for creating rectangle polygons? If so, it might be easier here to directly use shapely.polygons (possibly via shapely.linearrings) with (N, 2) arrays of coordinates and also providing indices?

keewis commented 1 year ago

From CF-conventions: cell boundaries, it seems that for (n, 2) bounds shapely.box should be sufficient since the cell can only be rectangular.

There's also the option of having (n, m, p) bounds variables which would allow non-rectangular cells, and for those I guess we'd actually need shapely.polygons.

Implementation-wise I think we could either normalize the bounds to (n, m, 4) and use shapely.polygons, or special-case (n, 2) to use shapely.box (not sure which is easier).

dcherian commented 1 year ago

Somewhat related see https://cf-xarray.readthedocs.io/en/latest/geometry.html

cc @aulemahal

dcherian commented 1 month ago

https://github.com/xarray-contrib/cf-xarray/pull/478 solves this I think and just needs some tests to be complete