jhardenberg / smmregrid

A compact regridder using sparse matrix multiplication
Apache License 2.0

Refactoring of smmregrid to support more complex dataset structures #32

Open oloapinivad opened 1 month ago

oloapinivad commented 1 month ago

There is growing evidence that smmregrid is not flexible enough to withstand the ongoing stress coming from external applications. This issue will serve as a landing point for the development we are planning in order to make it more flexible and sustainable.

The most important limitation that I am aware of is the impossibility of dealing with an xarray Dataset whose variables do not all share exactly the same dimensions. For example, you cannot remap a source which has both oceanic and atmospheric data.

To overcome this limitation, the idea is to make smmregrid work based on multiple gridtypes instead of assuming that all the data is on the same grid. This is not very different from what CDO does.

  1. Introduce a function to identify the gridtypes available in a specific Dataset/DataArray based on shared dimensions, excluding those which are not relevant (the ones that include bnds, for example; please note that these might be required by CDO, so we should learn how to bring them along). Any gridtype can be identified by a tuple of dimensions (no strict naming required). Then we can build a dictionary and exploit all the tools we already have to detect vertical and horizontal dimensions, as well as the variables lying on each specific gridtype. This is planned to be a class, and I have already run some successful tests offline.
  2. Make all the smmregrid class objects based on the gridtype. cdo_generate_weights will then return not a single xarray object but a dictionary with one entry per gridtype. All the objects of the class are expected to work with dictionaries, so that we can point to the required grid every time.
  3. As a final step, the regrid call should check the gridtype of the data that is fed into the method and match it with the ones available in the weights.
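
To make the three steps above more concrete, here is a minimal sketch of the idea, not the actual implementation. It works on a plain mapping of variable names to dimension tuples (for an xarray Dataset this could be built as `{name: da.dims for name, da in ds.data_vars.items()}`). The names `find_gridtypes`, `select_weights`, `EXCLUDED_DIMS` and `NON_SPATIAL_DIMS` are hypothetical and are not part of smmregrid, and the weights are placeholder strings rather than real CDO output.

```python
from collections import defaultdict

# Assumption: dimension names that mark bounds-like variables (hypothetical list)
EXCLUDED_DIMS = {"bnds", "bounds", "nv"}
# Assumption: dimensions that do not belong to the spatial grid (hypothetical list)
NON_SPATIAL_DIMS = {"time"}


def find_gridtypes(var_dims):
    """Group variable names by the tuple of spatial dimensions they share.

    var_dims maps each variable name to its tuple of dimension names.
    Returns a dict: gridtype (tuple of dims) -> list of variable names.
    """
    gridtypes = defaultdict(list)
    for name, dims in var_dims.items():
        # Skip bounds-like variables: they do not define a grid of their own
        if any(d in EXCLUDED_DIMS for d in dims):
            continue
        key = tuple(d for d in dims if d not in NON_SPATIAL_DIMS)
        if key:
            gridtypes[key].append(name)
    return dict(gridtypes)


def select_weights(dims, weights_by_gridtype):
    """Return the weights whose gridtype matches the dimensions of the data."""
    key = tuple(d for d in dims if d not in NON_SPATIAL_DIMS)
    if key not in weights_by_gridtype:
        raise KeyError(f"No weights available for gridtype {key}")
    return weights_by_gridtype[key]


# A source mixing atmospheric (lat/lon) and oceanic (j/i) variables
var_dims = {
    "tas": ("time", "lat", "lon"),
    "ta": ("time", "plev", "lat", "lon"),
    "tos": ("time", "j", "i"),
    "thetao": ("time", "lev", "j", "i"),
    "time_bnds": ("time", "bnds"),
}

# Step 1: identify the gridtypes from the shared dimensions
gridtypes = find_gridtypes(var_dims)

# Step 2: one weights entry per gridtype (placeholder strings, not real weights)
weights = {key: f"weights for {key}" for key in gridtypes}

# Step 3: at regrid time, match the incoming data against the available weights
tos_weights = select_weights(var_dims["tos"], weights)
```

In this sketch the gridtype key keeps the dimension order, so `("lat", "lon")` and `("lon", "lat")` would count as two different gridtypes. Whether that is desirable is an open design choice.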

Overall, this should massively improve the flexibility of the tool as well as the handling of the vertical coordinates, which will now come as an inner property of the gridtype. I also plan to move apply_weights into the Regrid class to minimize the redundancy of the variables.

Tricky issues that I see:

I did a test in my local branch and it seems feasible. I already have a nice GridInspector class that provides most of the grid information. Still, I am very far from having something decent, so I am not opening a PR yet.