There is growing evidence that smmregrid is not flexible enough to withstand the ongoing pressure coming from external applications. This issue will serve as a landing point for the development we are planning in order to make it more flexible and sustainable.
The most important limitation I am aware of is that it cannot deal with an xarray Dataset whose variables do not share exactly the same dimensions. For example, you cannot remap a source that contains both oceanic and atmospheric data.
To overcome this limitation, the idea is to make smmregrid work on multiple gridtypes instead of assuming that all the data is on the same grid. This is not very different from what CDO does.
Introduce a function to identify the gridtypes available in a specific Dataset/DataArray based on shared dimensions, excluding those which are not relevant (the ones that include bnds, for example; note that these might be required by CDO, so we should learn how to carry them along). Any gridtype can be identified by a tuple of dimensions (no strict naming required). We can then build a dictionary and exploit the tools we already have to detect vertical and horizontal dimensions, as well as the variables that lie on each gridtype. This is planned to be a class, and I have already run some successful tests offline.
Make all the smmregrid class objects gridtype-based. cdo_generate_weights will then return not a single xarray object but a dictionary with one entry per gridtype. All the objects of the class are now expected to work with dictionaries, so that we can point to the required grid every time.
As a final step, the regrid call should check the gridtype of the data fed into the method and match it with those available in the weights.
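To make the three steps above concrete, here is a minimal sketch in plain Python. It is not the smmregrid implementation: the names `gridtype_key`, `group_by_gridtype` and `select_weights`, the choice to ignore `time`, and the `bnds`/`bounds` markers are all assumptions made for illustration. It works on a simple mapping of variable names to dimension tuples; with xarray, such a mapping could be built as `{name: da.dims for name, da in ds.data_vars.items()}`.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Tuple

GridKey = Tuple[str, ...]

# Assumptions for this sketch: dimensions that do not define a grid,
# and substrings that mark a bounds dimension.
DEFAULT_IGNORED = ("time",)
BOUNDS_MARKERS = ("bnds", "bounds")


def gridtype_key(dims: Iterable[str],
                 ignored: Iterable[str] = DEFAULT_IGNORED) -> GridKey:
    """Return the tuple of dimensions used here to identify a gridtype."""
    skip = set(ignored)
    return tuple(d for d in dims if d not in skip)


def group_by_gridtype(var_dims: Mapping[str, Tuple[str, ...]],
                      ignored: Iterable[str] = DEFAULT_IGNORED
                      ) -> Dict[GridKey, List[str]]:
    """Group variable names by the gridtype they lie on."""
    groups: Dict[GridKey, List[str]] = defaultdict(list)
    for name, dims in var_dims.items():
        # Exclude bounds variables (e.g. lat_bnds) from the gridtypes.
        if any(marker in d for d in dims for marker in BOUNDS_MARKERS):
            continue
        key = gridtype_key(dims, ignored)
        if key:  # skip scalars and time-only variables
            groups[key].append(name)
    return dict(groups)


def select_weights(dims: Iterable[str],
                   weights: Mapping[GridKey, Any],
                   ignored: Iterable[str] = DEFAULT_IGNORED) -> Any:
    """Pick the weights matching the gridtype of the incoming data."""
    key = gridtype_key(dims, ignored)
    if key not in weights:
        raise KeyError(f"No weights available for gridtype {key}")
    return weights[key]


# A source mixing atmospheric (lat/lon) and oceanic (j/i) variables.
var_dims = {
    "tas": ("time", "lat", "lon"),
    "ta": ("time", "plev", "lat", "lon"),
    "tos": ("time", "j", "i"),
    "thetao": ("time", "lev", "j", "i"),
    "lat_bnds": ("lat", "bnds"),
}
gridtypes = group_by_gridtype(var_dims)

# Placeholder weights: one dictionary entry per gridtype.
weights = {key: {"gridtype": key} for key in gridtypes}
ocean_2d_weights = select_weights(("time", "j", "i"), weights)
```

In this sketch the vertical dimension is part of the key, so 2D and 3D variables on the same horizontal grid end up in different gridtypes. Whether the real class should split them this way, or keep the vertical coordinate as a property of a shared horizontal gridtype, is a design decision left open here.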
Overall, this should massively improve the flexibility of the tool, as well as the handling of vertical coordinates, which will now be an inner property of the gridtype.
I also plan to move apply_weights into the Regrid class to minimize the redundancy of variables.
Tricky issues that I see:
Deal correctly with precomputed weights stored on disk, so that each of them is associated with the right gridtype. This is probably more of an AQUA concern, but we should think about it.
Do not mix up DataArray and Dataset: currently the core regrid functions work on DataArray, but having multiple grids requires proper handling of Dataset.
Find the right place to load the data: we need to access metadata quite early, so even if files are supplied, the loading has to be moved to the init of the class.
More that I do not see now
I ran a test in my local branch and it seems feasible. I already have a nice GridInspector class that provides most of the grid information. Still, I am very far from having something decent, so I am not opening a PR yet.