Find solution to use Components both on structured and unstructured grids

ThibHlln commented 3 years ago

[Related to #12]

At the moment, only structured grids are implemented as SpaceDomain, so in practice their data are stored as 2D arrays. This means that a science Component knows the rank of the arrays it will receive and where the neighbours of each cell are (which is useful, e.g., for stencils or for routing).

Moreover, models like the current implementation of JULES work on vertical columns with no lateral flow (hence no need to know where neighbours are), so 2D grids can easily be vectorised for them. JULES is therefore readily usable on unstructured grids (e.g. the cubed-sphere grid of LFRic). Rather than having the Component convert 2D arrays to vectors (1D arrays) itself, it would be better if the framework could provide the data as vectors directly.
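To illustrate why column-only models are geometry-agnostic, here is a minimal sketch assuming a hypothetical single-column water balance (`column_model` is illustrative, not JULES code). Because each column is updated independently, the same function runs unchanged on a flattened structured grid or on the faces of an unstructured mesh.

```python
import numpy as np


def column_model(soil_moisture, precipitation, evaporation):
    """Hypothetical single-column water balance (illustrative, not JULES).

    Every element is updated independently of the others, so the inputs
    can be 1D vectors of locations from any grid geometry.
    """
    return np.maximum(soil_moisture + precipitation - evaporation, 0.0)


# Structured (Y, X) grid, flattened to a vector of columns by the framework
sm_2d = np.array([[10.0, 20.0], [30.0, 40.0]])
pr_2d = np.full((2, 2), 5.0)
ev_2d = np.full((2, 2), 12.0)
sm_structured = column_model(sm_2d.ravel(), pr_2d.ravel(), ev_2d.ravel())

# Unstructured mesh: values already arrive as a vector of faces
sm_unstructured = column_model(
    np.array([1.0, 8.0, 15.0]), np.array([2.0, 2.0, 2.0]), np.array([5.0, 5.0, 5.0])
)
```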

So this issue suggests that we may want to consider/store every SpaceDomain as a vector, so that any geometry can be supported without the Component having to adjust to it: a Component would always receive a vector alongside information about where its neighbours are.
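A minimal sketch of what "a vector alongside information about where its neighbours are" could look like for a stencil-type Component. It assumes a neighbour table in the spirit of UGRID's `face_face_connectivity`, with 0-based face indices and `-1` as an assumed fill value for unused slots; `neighbour_mean` and the 4-face mesh are hypothetical.

```python
import numpy as np

FILL = -1  # assumed fill value meaning "no neighbour in this slot"


def neighbour_mean(values, face_face_connectivity, fill=FILL):
    """Average of each face's neighbours, whatever the grid geometry.

    values: shape (n_faces,)
    face_face_connectivity: shape (n_faces, max_neighbours), 0-based face
    indices padded with `fill` where a face has fewer neighbours.
    """
    valid = face_face_connectivity != fill
    # Point fill slots at face 0 so indexing is safe, then zero them out
    safe_idx = np.where(valid, face_face_connectivity, 0)
    neighbour_values = np.where(valid, values[safe_idx], 0.0)
    counts = valid.sum(axis=1)
    # Faces with no neighbours get 0.0 rather than a division by zero
    return neighbour_values.sum(axis=1) / np.maximum(counts, 1)


# A small hand-made unstructured mesh of 4 faces
conn = np.array([
    [1, 2, FILL],
    [0, 2, 3],
    [0, 1, FILL],
    [1, FILL, FILL],
])
values = np.array([1.0, 2.0, 3.0, 4.0])
result = neighbour_mean(values, conn)
```

The Component never sees Y or X: it only uses the vector and the table, so it would work the same on a flattened structured grid given a table built for it.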

ThibHlln commented 3 years ago

While structured grids are most likely going to be stored as 2D arrays (e.g. Y, X in the CF conventions), unstructured grids can be stored as vectors, as arrays padded with a fill value, or as ragged arrays (e.g. UGRID conventions). It may therefore be a good idea to use a UGRID-style vector approach internally in the framework for both structured and unstructured grids, for future proofing (i.e. future support for unstructured grids).
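As a first step towards a UGRID-style representation of a structured grid, each cell can become a face with its own centre coordinates. A minimal sketch, assuming a regular grid described by 1D Y and X cell-centre coordinates and a row-major (Y, X) face numbering; the function name is hypothetical.

```python
import numpy as np


def structured_to_faces(y, x):
    """Per-face centre coordinates for a structured grid.

    y, x: 1D cell-centre coordinates (CF-style Y and X dimensions).
    Returns two vectors of length len(y) * len(x), with faces numbered in
    row-major (Y, X) order, i.e. face k = j * len(x) + i.
    """
    yy, xx = np.meshgrid(y, x, indexing="ij")  # both of shape (len(y), len(x))
    return yy.ravel(), xx.ravel()


y = np.array([51.5, 52.5])
x = np.array([-1.5, -0.5, 0.5])
face_y, face_x = structured_to_faces(y, x)
```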

For the variables themselves, a structured grid could easily be converted from a 2D array to a vector with e.g. numpy.ndarray.flatten, and brought back from a vector to a 2D array with e.g. numpy.reshape. Where neighbouring relationships are required when working on the vector, they could be handled as follows:
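The round trip between the 2D array and the vector can be sketched with NumPy as follows; the key constraint is that flattening and reshaping use the same (here row-major) order, which also fixes the mapping between grid positions (j, i) and face indices.

```python
import numpy as np

field_2d = np.arange(12.0).reshape(3, 4)  # (Y, X) = (3, 4)
shape = field_2d.shape

vector = field_2d.flatten()        # always a copy, row-major (order="C")
restored = vector.reshape(shape)   # same order, so the round trip is exact

# Mapping between (j, i) grid positions and face indices in the vector
k = np.ravel_multi_index((1, 2), shape)   # j * nx + i = 1 * 4 + 2
j, i = np.unravel_index(k, shape)
```

`ravel` is an alternative to `flatten` that returns a view when possible; whichever is used, the same order must be passed back to `reshape`.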

- for flow direction: at the moment, a relative {-1, 0, +1} value pair is stored for the Y and X directions in flow_direction; this could be converted to a single value (in either relative or absolute form), which would work exactly the same way for an unstructured grid
- for the list of neighbours: at the moment, this information is not stored, since it is trivial to find neighbours in a structured grid (i.e. +1/-1 on Y and X). It is not trivial on an unstructured grid, however, where the UGRID `face_face_connectivity` is required. The same approach should therefore be used for structured grids: it will be easy for the framework to construct such a variable, and it means that any component will always be expected to work from a `face_face_connectivity` variable rather than from +1/-1 on Y/X, which would otherwise prevent its use on unstructured grids.
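Both conversions can be sketched under a few assumptions: flow_direction is taken as an integer array of shape (ny, nx, 2) holding the (dY, dX) pair, (0, 0) marks an outlet, faces are numbered row-major, and `-1` is the fill value. The neighbour table here lists only the four edge neighbours (N, S, W, E); flow directions may also point diagonally, which the separate absolute downstream index handles. All function names are hypothetical.

```python
import numpy as np

FILL = -1  # assumed fill value: "no neighbour" / "no downstream face"


def structured_face_face_connectivity(ny, nx):
    """face_face_connectivity-like table for an ny x nx grid.

    Columns are the (N, S, W, E) neighbours; faces are numbered row-major.
    """
    j, i = np.unravel_index(np.arange(ny * nx), (ny, nx))
    conn = np.full((ny * nx, 4), FILL, dtype=int)
    for col, (dj, di) in enumerate([(-1, 0), (1, 0), (0, -1), (0, 1)]):
        jj, ii = j + dj, i + di
        inside = (jj >= 0) & (jj < ny) & (ii >= 0) & (ii < nx)
        conn[inside, col] = np.ravel_multi_index((jj[inside], ii[inside]), (ny, nx))
    return conn


def flow_direction_to_index(flow_direction):
    """Convert relative (dY, dX) pairs to absolute downstream face indices.

    flow_direction: integer array of shape (ny, nx, 2) with values in
    {-1, 0, +1}; (0, 0) is assumed to mark an outlet.
    """
    ny, nx, _ = flow_direction.shape
    j, i = np.indices((ny, nx))
    jj, ii = j + flow_direction[..., 0], i + flow_direction[..., 1]
    outlet = (flow_direction[..., 0] == 0) & (flow_direction[..., 1] == 0)
    inside = (jj >= 0) & (jj < ny) & (ii >= 0) & (ii < nx)
    ok = inside & ~outlet
    downstream = np.full((ny, nx), FILL, dtype=int)
    downstream[ok] = np.ravel_multi_index((jj[ok], ii[ok]), (ny, nx))
    return downstream.ravel()


def route_one_step(runoff, downstream):
    """Move each face's runoff to its downstream face (geometry-agnostic)."""
    inflow = np.zeros_like(runoff)
    has_downstream = downstream != FILL
    # np.add.at accumulates correctly when several faces share a target
    np.add.at(inflow, downstream[has_downstream], runoff[has_downstream])
    return inflow


conn = structured_face_face_connectivity(2, 2)
flow = np.array([[[1, 1], [1, 0]],
                 [[0, 1], [0, 0]]])  # everything drains to the (1, 1) cell
downstream = flow_direction_to_index(flow)
inflow = route_one_step(np.array([1.0, 2.0, 3.0, 4.0]), downstream)
```

Only the first two functions know about Y and X; `route_one_step` uses the vector and the downstream index, so it would be unchanged on an unstructured mesh.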

I think that if a Z dimension is required for the inputs of a component, it should be kept separate from Y/X: only Y/X should be vectorised.
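A sketch of keeping Z separate, assuming Y and X are the last two axes (e.g. (Z, Y, X)): only the horizontal axes are merged into a single face axis, and any leading axes are left untouched. Function names are hypothetical.

```python
import numpy as np


def vectorise_horizontal(arr):
    """Collapse the trailing (Y, X) axes into one face axis; keep any others."""
    return arr.reshape(arr.shape[:-2] + (-1,))


def devectorise_horizontal(vec, ny, nx):
    """Inverse of vectorise_horizontal for a structured grid."""
    return vec.reshape(vec.shape[:-1] + (ny, nx))


soil_temperature = np.arange(24.0).reshape(2, 3, 4)  # (Z, Y, X)
vec = vectorise_horizontal(soil_temperature)           # (Z, face) = (2, 12)
restored = devectorise_horizontal(vec, 3, 4)
```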

ThibHlln commented 3 years ago

Another aspect we could consider as part of this issue is that SpaceDomain.land_sea_mask is currently not used to subset the domain to land locations only: the whole domain is sent, and components can access the land_sea_mask if they want to do the subsetting themselves. After all, cm4twc is about the terrestrial water cycle, so excluding sea points before giving the variables to the components should probably become the default approach.
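A minimal sketch of what such default subsetting could involve, assuming a (Y, X) land_sea_mask in which 1/True marks land, and `-1` as the connectivity fill value. Besides selecting land values and scattering results back, any neighbour table has to be renumbered to land-only indices, with sea neighbours dropped. Function names are hypothetical.

```python
import numpy as np

FILL = -1  # assumed fill value for "no neighbour"


def to_land_vector(field_2d, land_sea_mask):
    """Flatten a (Y, X) field and keep land faces only (mask: 1/True = land)."""
    land = land_sea_mask.ravel().astype(bool)
    return field_2d.ravel()[land]


def from_land_vector(land_values, land_sea_mask, fill_value=np.nan):
    """Scatter land-only values back onto the full (Y, X) grid."""
    land = land_sea_mask.ravel().astype(bool)
    out = np.full(land.size, fill_value, dtype=float)
    out[land] = land_values
    return out.reshape(land_sea_mask.shape)


def remap_connectivity(conn, land_sea_mask, fill=FILL):
    """Renumber a full-domain face_face_connectivity to land-only indices.

    Sea neighbours become `fill`, since they are no longer in the vector.
    """
    land = land_sea_mask.ravel().astype(bool)
    old_to_new = np.full(land.size, fill, dtype=int)
    old_to_new[land] = np.arange(land.sum())
    sub = conn[land]
    valid = sub != fill
    return np.where(valid, old_to_new[np.where(valid, sub, 0)], fill)


mask = np.array([[1, 0], [1, 1]])  # top-right cell is sea
field = np.array([[1.0, 2.0], [3.0, 4.0]])
# (N, S, W, E) neighbours of the full 2 x 2 grid, row-major face numbering
conn = np.array([[-1, 2, -1, 1], [-1, 3, 0, -1], [0, -1, -1, 3], [1, -1, 2, -1]])

land_values = to_land_vector(field, mask)
restored = from_land_vector(land_values, mask)
land_conn = remap_connectivity(conn, mask)
```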

unifhy-org / unifhy, issue #29