Open benbovy opened 1 year ago
Hello, @keewis is gathering the work we did on HEALPix integration for our IAOCEA project here: https://github.com/IAOCEA/xarray-healpy
We will try to update some example notebooks with real data projection before Monday.
Note that we are not implementing rHEALPix but HEALPix itself, through the healpy package.
Our final objective is that, using the properties of Xarray-DGGS, we can
That looks great @tinaok and @keewis!
Your objectives already look quite specific and "high-level". I wonder if during the sprint it would be best to first discuss
.sel, etc., possibly with a DGGS Xarray index) or if they would require a custom API in an Xarray accessor.
before getting our hands dirty with the code.
(3-4 are more specific to Python/Xarray but 1-2 may be interesting for anyone)
This might better structure the sprint and this would greatly help in having a better idea on whether an xarray-dggs extension (or any other package) makes sense for supporting common tasks across different global grids (healpix, s2, h3, etc.). At least for me as I don't have much experience in using DGGS for practical applications :)
I'm excited to participate in a sprint on this topic!
@benbovy I am happy to share our use case through the example we just added.
I can show how we convert data, and the challenges we have today. With the same notebook, I can show the same model data at 2 different resolutions, which we hope to somehow 'connect' using the DGGS convention.
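One way to picture 'connecting' two resolutions: in HEALPix's nested indexing scheme, each cell at one order splits into four children at the next order, so a cell's ancestor is found by dropping two bits per level. The sketch below is plain Python and does not use healpy or xarray-healpy. The function names (npix, parent_cell, coarsen_mean) are made up for illustration, and it assumes the data are stored in nested order.

```python
def npix(order):
    """Number of HEALPix cells at a given order (nside = 2**order)."""
    nside = 2 ** order
    return 12 * nside * nside


def parent_cell(cell_id, order, parent_order):
    """Nested-scheme id of the ancestor of `cell_id` at a coarser order."""
    if parent_order > order:
        raise ValueError("parent_order must not be finer than order")
    # Each level of refinement adds two bits to the nested cell id.
    return cell_id >> (2 * (order - parent_order))


def coarsen_mean(values, order, parent_order):
    """Average fine-resolution values onto the coarser nested grid."""
    if len(values) != npix(order):
        raise ValueError("expected one value per cell at the given order")
    sums = [0.0] * npix(parent_order)
    counts = [0] * npix(parent_order)
    for cell_id, value in enumerate(values):
        parent = parent_cell(cell_id, order, parent_order)
        sums[parent] += value
        counts[parent] += 1
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical data: one value per cell at order 1 (48 cells),
# averaged onto order 0 (12 cells).
fine = [float(i) for i in range(npix(1))]
coarse = coarsen_mean(fine, order=1, parent_order=0)
```

With real data the same parent/child relation would let the two model resolutions share a coordinate system, but how this should look in an Xarray API is exactly what is open for discussion.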
I'm also very much interested in learning from DGGS specialist @allixender (?) how DGGS is used for routing.
If anyone from the EERIE project or the nextGEMS Cycle 3 ICON projects is around at BIDS23 (https://github.com/eerie-project/EERIE_hackathon_2023/ ? https://github.com/nextGEMS/nextGEMS_Cycle3 ? @koldunovn ? https://easy.gems.dkrz.de/Processing/healpix/healpix_starter.html), I would love to hear their user stories with HEALPix, and also how they will make their data available (DestinE?).
Wow, nice ideas! I haven't heard that anyone I know from EERIE, DestinE or nextGEMS plans to participate in this code sprint. Also, we will have our EERIE Hackathon this week.
Notebooks from nextGEMS and EERIE with examples of how we use unstructured data are available, and ICON data for the last nextGEMS cycle are all in HEALPix. Access to the data is currently restricted if you don't have a DKRZ account, but if there is interest we can provide a subset. In EERIE we are also trying to expose data through xpublish, but it's at an early stage.
Those kinds of projects would be great to see at the nextGEMS Hackathon, which will be held 4-8 March somewhere around Hamburg. Let me know if there is interest and I will put you in contact with nextGEMS people :)
Great to hear from you @koldunovn! The notebook examples will be helpful. I've created an account on DKRZ so I'm now able to ask to join a project there if needed.
Great! If you are interested in HEALPix I would start from this one, and explore the rest of the collection: https://easy.gems.dkrz.de/Processing/healpix/healpix_starter.html
Unstructured (FESOM2) and semi-structured data covered here: https://github.com/nextGEMS/nextGEMS_Cycle3
We are currently also developing EERIE notebooks, but there are a lot of examples using nextGEMS data as well: https://github.com/eerie-project/EERIE_hackathon_2023/
If you are looking for something more concrete, let me know.
We started a shared document on HackMD for the sprint: https://hackmd.io/UBM5L6YNRlG73e3eVo6vOg
Xarray DGGS extension library in development here: https://github.com/benbovy/xdggs
Cross-posting here what I've suggested in the Pangeo Discourse thread.
Xarray-DGGS
I think that a good and reasonable goal for the sprint would be to come up with an xarray-dggs package that would provide an xarray-compatible interface to various DGGS features exposed in 3rd-party Python libraries (e.g., healpy, pys2index, spherely, h3-py, dggrid4py, etc.) through a very basic set of features:
.sel()
I think that DGGS grids have enough in common to expose the functionality for all of them in a common xarray-dggs package, maybe with optional dependencies for each backend (healpy, pys2index, h3-python, DGGRID, etc.).
This proposal builds on top of a few suggestions found in the README of this repository, e.g., H3 or rHEALPIx + Xoak + Xarray, H3 or rHEALPIx + Xoak + Xarray + Xvec?. While both xoak and xvec can be good sources of inspiration for xarray-dggs, those packages have slightly different scopes: Xoak provides generic tree-based indexes (not only geospatial) and Xvec currently works only with shapely (planar geometries). Xoak has a nice API for nearest-neighbors point-wise indexing that leverages Xarray advanced indexing (i.e., using xarray.DataArray objects) but it still has to be refactored so it builds on top of Xarray custom indexes. Xvec is one of the few (the only?) released Xarray extensions that provide an Xarray custom index.
The sub-topics and (open) questions listed below are not exhaustive. Please feel free to suggest in the comments below any important topic or question that is missing.
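To make the "optional dependencies for each backend" idea a bit more concrete, here is a minimal sketch of how a package could detect which DGGS backends are importable without importing them. The BACKEND_MODULES mapping and the function names are hypothetical and not part of any existing package; only the import names (healpy, pys2index, h3) are real.

```python
import importlib.util

# Hypothetical mapping: grid name -> import name of the optional backend.
BACKEND_MODULES = {
    "healpix": "healpy",
    "s2": "pys2index",
    "h3": "h3",
}


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def available_backends():
    """Report, for each grid name, whether its optional backend is installed."""
    return {grid: is_installed(mod) for grid, mod in BACKEND_MODULES.items()}


if __name__ == "__main__":
    for grid, ok in available_backends().items():
        print(f"{grid}: {'available' if ok else 'missing'}")
```

A real package would more likely declare these as extras in its packaging metadata and raise a helpful error when a missing backend is requested; this only illustrates the detection step.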
Data model
An Xarray index must relate to one or more coordinates with arbitrary dimensions. In the case of DGGS, what should be the coordinates and their dimension(s)?
Do we need to have a fixed data model for all DGGS? It can be flexible, i.e., an Xarray Index subclass may support different data models (build options, flexible inputs).
Should we restrict the index and/or coordinates to a fixed level / zoom / resolution of the discrete global grid?
I guess we need some sort of CRS and/or additional metadata for certain kinds of grids (custom parameters)? Some grid parameters could perhaps be hidden as internal attributes of the index?
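As a starting point for discussion, one possible answer to the questions above is a single 1-D coordinate of cell ids at one fixed level, with grid parameters kept as metadata next to it. The sketch below uses plain dataclasses rather than Xarray so that it stays self-contained. GridInfo, CellCoordinate, and the "cells" dimension name are all assumptions made for illustration, not an existing API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GridInfo:
    """Grid parameters that an index could keep as internal attributes."""
    grid_name: str   # e.g. "healpix", "h3", "s2"
    level: int       # one fixed resolution for the whole coordinate
    params: dict = field(default_factory=dict)  # grid-specific extras


@dataclass
class CellCoordinate:
    """A 1-D coordinate of cell ids along a single (hypothetical) dimension."""
    cell_ids: list
    grid: GridInfo
    dim: str = "cells"

    def __post_init__(self):
        if len(set(self.cell_ids)) != len(self.cell_ids):
            raise ValueError("cell ids must be unique")
        # Lookup table: cell id -> integer position along the dimension.
        self._positions = {cid: i for i, cid in enumerate(self.cell_ids)}

    def positions(self, wanted):
        """Translate cell ids into integer positions, as an index would."""
        return [self._positions[cid] for cid in wanted]


grid = GridInfo("healpix", level=1, params={"indexing_scheme": "nested"})
coord = CellCoordinate([4, 5, 6, 7], grid)
selected = coord.positions([6, 4])
```

An Xarray Index subclass could wrap the same information, with the cell ids as the coordinate variable and GridInfo hidden inside the index, but whether a fixed level is the right restriction is still an open question here.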
Data selection API (.sel)
There are a lot of possibilities regarding how to select data on a discrete global grid. What kind of indexer object(s) could we pass to xarray .sel()?
How to detect the kind of indexer? We could look at the type of the indexers (scalar, slice, list, array, custom object), the value type, etc. Note: currently it is not possible to pass custom options to .sel (https://github.com/pydata/xarray/issues/7099).
Assessing the capabilities of the DGGS Python libraries
There are some important requirements for reusing those libraries efficiently with Xarray:
Perhaps not all libraries mentioned above meet those requirements. Which ones should we focus our efforts on? Which kinds of data selection listed above should we focus on, considering a common set of core features available in all libraries?
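Coming back to the question of how to detect the kind of indexer passed to .sel, the idea of looking at the indexer type could be sketched as a simple dispatch function. Everything here is hypothetical: the LatLonPoint class, the classify_indexer name, and the returned labels are placeholders for discussion, not part of Xarray or of any DGGS library.

```python
from numbers import Integral


class LatLonPoint:
    """Hypothetical custom indexer object: a point given by lat/lon in degrees."""

    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon


def classify_indexer(indexer):
    """Guess the kind of selection from the type of the indexer."""
    if isinstance(indexer, Integral):
        return "cell"            # a single cell id
    if isinstance(indexer, slice):
        return "cell_range"      # a contiguous range of cell ids
    if isinstance(indexer, LatLonPoint):
        return "point"           # the cell containing a geographic point
    if isinstance(indexer, (list, tuple)):
        if all(isinstance(item, Integral) for item in indexer):
            return "cells"       # several cell ids
        if all(isinstance(item, LatLonPoint) for item in indexer):
            return "points"      # several geographic points
    raise TypeError(f"unsupported indexer: {indexer!r}")


kinds = [
    classify_indexer(5),
    classify_indexer(slice(0, 4)),
    classify_indexer([1, 2, 3]),
    classify_indexer(LatLonPoint(48.4, -4.5)),
]
```

Type-based dispatch like this cannot tell apart selections that share a type (for example cell ids at different levels), which is one reason the ability to pass custom options to .sel matters.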