xarray-contrib / xdggs

Xarray extension for DGGS
https://xdggs.readthedocs.io
Apache License 2.0
80 stars 14 forks source link

Extract grid cells as shapely (or spherely) geometries #10

Closed benbovy closed 1 month ago

benbovy commented 1 year ago

We could add a Dataset.dggs.assign_boundaries() that would create and assign a new coordinate of shapely.Polygon objects (and/or later spherely.Polygon objects when it is ready) of grid cell boundaries from the cell ids indexed coordinate.

This would be a nice complement to Dataset.dggs.assign_latlon_coords() (that we could maybe rename to assign_centers_latlon() for consistency). We could also have a Dataset.dggs.assign_centers() that would create shapely.Point (or spherely.Point) objects.

This would be useful for interoperability with xvec, for plotting, etc.

benbovy commented 1 year ago

assign_centers or assign_centroids?

assign_boundaries or assign_polygons? ("boundaries" may rather mean only the boundaries, not the whole polygon entity)

benbovy commented 1 year ago

Example with H3: https://medium.com/@jesse.b.nestler/how-to-convert-h3-cell-boundaries-to-shapely-polygons-in-python-f7558add2f63 (not sure it uses shapely's 2.0 vectorized geometry creation functions, though).

benbovy commented 1 year ago

After thinking more about it, I would lean towards Dataset.dggs.extract_* methods that would return one or more DataArrays. This is more flexible/composable. For assigning it as new coordinates, we could just do:

ds.assign_coords(
    cell_center=ds.dggs.extract_cell_centroids(),
    cell_bounds=ds.dggs.extract_cell_boundaries(),
)
keewis commented 1 year ago

if we go with that API, you could even remove the extract_ prefix. The only question that would raise is if they should always recompute based on the cell ids, or whether it would make sense to return the appropriate variable if it exists on the object.

As for cell_centroids / cell_centers: do you know if there's a way to go from Point variables to separate lat / lon coordinates and back in xvec?

benbovy commented 1 year ago

you could even remove the extract_ prefix

Agreed.

The only question that would raise is if they should always recompute based on the cell ids, or whether it would make sense to return the appropriate variable if it exists on the object.

I think we should always recompute. It is better letting users figure out whether they need to call ds.dggs.cell_centroids() or get the coordinate with ds.cell_centroids or whatever the actual name of that coordinate is if it exists.

As for cell_centroids / cell_centers: do you know if there's a way to go from Point variables to separate lat / lon coordinates and back in xvec?

I haven't checked that, but I assume that most DGGS backends deal with lat/lon coordinates directly instead of shapely Point, so supporting only the latter would mean a wasteful creation of shapely objects + extraction of coordinates for those who do not need shapely.

keewis commented 1 year ago

oh, I was not suggesting to drop the existing computation of lat / lon values from cell ids, just wondering if given a point variable you'd be able to transform to lat / lon variables.

benbovy commented 1 year ago

Ah yes, not sure if there's a specific function in xvec, but using shapely.get_coordinates(...) might be enough?

keewis commented 1 year ago

I'd imagine we'd be able to extend xvec to support that.

keewis commented 1 year ago

how much should we care about the different terminologies? Like, we've been talking about "cells" and "cell ids", but the OGC working group tends to talk about "zones" and "zonal ids". We could in theory make one an alias (or a thin wrapper) of the other, but we could also just follow H3 and stick with "cells".

In any case, I think having a bunch of cell_* (or zone_*) properties (or methods, should we want to follow the "properties for quantities that are cheap to recompute, methods for more expensive ones") might make sense, for example: ds.dggs.cell_ids, ds.dggs.cell_boundaries, ds.dggs.cell_centers. The advantage of having attributes with a similar prefix would be mostly tab completion.

keewis commented 1 month ago

should have been closed by #30, which puts the polygons into shapely arrays. For using spherely I believe we should open a separate issue.