xarray-contrib / xdggs

Xarray extension for DGGS
https://xdggs.readthedocs.io
Apache License 2.0
80 stars 14 forks source link

Consistent API for converting an xarray object to/from a DGGS xarray object #13

Closed benbovy closed 7 months ago

benbovy commented 1 year ago

One option could be reusing the .dggs accessor with the following methods (non-exhaustive):

# simple creation from existing cells
# this is just an alias of Dataset.set_xindex(coord_name, DGGSIndex, **kwargs)
ds.dggs.from_cell_ids(coord_name, grid_name=None, **kwargs)

# creation from a latitude/longitude rectilinear grid
# needs regridding and/or aggregation
ds.dggs.from_latlon_grid(lat_coord_name, lon_coord_name, grid_name, **kwargs)
keewis commented 1 year ago

not sure about regridding, but I'm not sure if we should really use from_* for this: to me, those sound very much like factory methods. What do think about using a verb, for example ds.dggs.create_index(dggs_coord, **kwargs)?

benbovy commented 1 year ago

I'm not sure if we should really use from_* for this: to me, those sound very much like factory methods.

I actually view those methods very much like "factory" methods! Not in the sense of class factory methods but rather in the sense of creating a Dataset or DataArray that can be handled as a DGGS one from any other Dataset / DataArray that cannot. Maybe conversion methods is a better term, for which the from_ / to_ prefixes work pretty well I think.

We could have a ds.dggs.to_latlon_grid for conversion from DGGS to a rectilinear grid.

We could also add later ds.dggs.from_latlon_ugrid to convert from an unstructured mesh.

create_index sounds a bit too restrictive IMO. It is a good name for creation of a DGGS directly from an existing cell ids coordinate but not for the order cases that involve more operations than creating an index.

keewis commented 1 year ago

What I meant with from_ being factory methods is that they sound like they would construct a new object of a different python type, while in this case they return a modified copy.

I do think we need to be able to properly regrid, but in that case the main task is to actually regrid, not setting the index. What I mean with that is that I'd rather have regrid in the name of the API (which might be a accessor method, or a xesmf-like regridder object), where the final step after the actual regridding is to create the index.

As for ds.dggs.create_index specifically, I do think this kind of restriction makes for a cleaner API (though as always when designing a API I can't be certain until we actually use it).

benbovy commented 1 year ago

What I meant with from_ being factory methods is that they sound like they would construct a new object of a different python type, while in this case they return a modified copy.

Understood! But _from_ and _to_ naming patterns are quite usual for conversion methods, no? Let me list a few options:

# 1. accessor conversion methods (the accessor name serves itself as prefix)
ds.dggs.from_(...)
ds.dggs.to_(...)

# 2. accessor conversion methods
ds.dggs.dggs_from_(...)
ds.dggs.dggs_to_(...)

# 3. top-level conversion functions (the package name serves as prefix)
xdggs.from_(ds, ...)
xdggs.to_(ds, ...)

# 4. top-level conversion functions
xdggs.dggs_from_(ds, ...)
xdggs.dggs_to_(ds, ...)

2 and 4 look too verbose/repetitive to me. Either 1 or 3 are fine by me. Both xdggs and ds.dggs don't return a class so it doesn't really bother me using from_ prefix here for something that is not a factory method.

I get your point regarding the name vs. the main task... But here I think that from the user point of view the task is best described as converting a Dataset or DataArray from one form (raster, mesh, etc.) from/to a DGGS form. Regridding, resampling, index creation, etc. are important but implementation details.

ds.dggs.from_cell_ids is a special case. It is just an alias but I think it is worth it for discoverability. Having a ds.dggs.create_index wouldn't add much value to ds.set_xindex(..., DGGSIndex).