[i-first] Extend kernel meta-data to capture data model

arporter commented 4 years ago

The UM 'physics' kernels that are being called from LFRic expect a different data layout/model to that of the standard LFRic kernels. The latter expect to operate on arrays stored as (k, l) where l is a column index obtained from a map lookup (because we have an unstructured mesh in the horizontal). In contrast, the physics kernels expect (i, k) where i runs over all points in the horizontal. In order to generate suitable looping structures when calling such kernels, we need to capture information on the data model in the kernel meta-data.

arporter commented 4 years ago

I propose that we add DATA_LAYOUT = xxx to kernel meta-data. The choice of how to specify xxx is a bit more involved. In GOcean our kernels are i,j, in NEMO they are i,j,k and in the LFRic dynamics kernels they are k,l where l means 'indirect'. In the Physics kernels they are i,k. We could of course then just use that notation but I think we want something a bit more obvious/explicit.

arporter commented 4 years ago

x,y, x,y,z, xy, z and z, xy-lookup or z, column would probably be a bit clearer. Since kernel meta-data must be valid Fortran and we don't want to end up polluting the namespace with e.g. a parameter named x, we probably want separate names for each of these cases. We would then have:

DATA_LAYOUT = LAYOUT_X_Y

or

DATA_LAYOUT = LAYOUT_Z_COLUMN

etc.

arporter commented 4 years ago

So far I have:

data_layout_metadata

Where the first row is supposed to describe what we currently have. I now need help/suggestions filling in this table. Probably from @thomasmelvin but maybe MiHo too. I'll email him...

rupertford commented 4 years ago

After discussion with @arporter I would like to suggest the following generic metadata (i.e. metadata for all APIs) ...

X, Y or Z for direct addressed as already proposed. I for indexed/indirect which would be a prefix. LAYOUT_Z_COLUMN would then become LAYOUT_Z_IXY

Any specialisation of the above could be API-specific and done in the future. For example we might want to add the concept of tracers (T), radiation bands (R) etc. in some APIs, or we might even want to rename X as LON, Y as LAT and Z as LEV?

rupertford commented 4 years ago

Another option (which I think I prefer) ...

Separate data layout and direct/indirect addressing e.g.

DATA_LAYOUT=LAYOUT_Z_XY ADDRESSING=DIRECT_Z, INDIRECT_XY

with a potential DEFAULT_ADDRESSING=DIRECT/INDIRECT in the config file allowing us to only specify one type.

arporter commented 4 years ago

I like that suggestion. I'll update the documentation appropriately and see how it goes.

arporter commented 4 years ago

I realised I wasn't being completely clear about the rank of the underlying array either. I've updated and now have:

data_layout_metadata

thomasmelvin commented 4 years ago

I like these suggestions.

Would this method be further extensible? i.e. we could represent data on the cubed sphere (and this is how others do it) as:

data_layout = LAYOUT_X_Y_Z_PANEL data_addressing = (/ DIRECT_X, DIRECT_Y, DIRECT_Z, DIRECT_PANEL /)

also how does this work for 2D fields? would we just have (for lfric)?

data_layout = LAYOUT_XY data_addressing = (/ INDIRECT_XY /)

arporter commented 4 years ago

@thomasmelvin yes and yes!

stevemullerworth commented 4 years ago

Looks OK to me. I had to think a bit to understand what interface of a kernel of each type would have, but I got there in the end.

I'm slightly concerned we haven't yet resolved the issue of kernels that use physics vector fields (such as horizontal and vertical winds).

In the LAYOUT_ZXY, each Z can relate to several columns of dofs within a cell.

In LAYOUT_XY_Z we currently envisage each choice for XY as being only a single cell-centred data point (W3) so it can be directly addressed. But if a kernel has a cell-centred field and a wind field on W2H, the latter has more data points per cell and may need to be indirectly addressed.

If a kernel has a cell-centred field and a wind-field on W2V, then there is sharing of the wind data between two different choices of Z.

I think the above proposal may be OK. But if we need an i-first solution to these sorts of kernels, it could have to be different looping strategies (looping over the data points in the output field which could be cell centred, on W2V faces or on W2H faces) and accessing the input fields (on the different space) with maps.

arporter commented 4 years ago

Thanks @stevemullerworth - I had forgotten that there might be multiple 'columns' of dofs per cell. Before I go ahead and start implementing, can I ask that we get some agreement (maybe from Tom or Ben?) on which sorts of kernels we need to support and whether what we've suggested so far will work?

rupertford commented 4 years ago

Hi @stevemullerworth. At the moment the layout decisions for LFRic are implicit and PSyclone internally implements a particular layout, including support for vector fields etc. With the addition of i-first we now need to distinguish between the different data layouts. So we need to capture 2 or 3 options 1) traditional layout, 2) i-first layout (with latlon collapsed) 3) i-first layout (with latlon separate). I think Andy wants to know whether it is all three we need to support or just two of the three. The metadata description we have come up with is more general than the above so allows different options which is a good thing but I don't think we need to capture vector field information - that could still be implicit (as long as it is consistent) and I presume the cell-centred (CG) options could be captured by the field types and perhaps iteration_space metadata as we already do? Of course the indirect access would be explicit. Am I missing something?

arporter commented 4 years ago

I sent the following as an email and then realised it would be better if it was here:

I've started updating the documentation with details of the new meta-data and that has made me think about specifying the extents of arrays being passed to kernels.

Currently, all our kernels accept FE fields and we have rules for passing in associated quantities that (implicitly) provide the array extents - e.g. nlayers, ncells, ndofs. It's done like this (i.e. implicitly) because I think we PSyclons have been fairly ignorant of the underlying meaning of these things. For instance, in some cases we pass in multiple 'ncell_3d' values despite the fact that that this actually means 'the total number of cells in 3d' and therefore only has one value.

With the move to supporting "i-first" kernels I just want to check that this implicit dimensioning information is still going to be sufficient or whether we need to adjust our rules. Presumably nlayers is still going to be the number of vertical levels and ncells is still going to be the number of xy locations and therefore we don't need to change/update the rules?

More generally, if we think about changing the way our rules are written (something that we've discussed a lot lately) along the lines of "a FE field has LAYOUT_ZZY, DIRECT_Z, INDIRECT_XY" with DIM(Z) == nlayers and DIM(XY) == ncells then we still provide the same information but the reasoning is clearer.

TeranIvy commented 3 years ago

@arporter, does this issue need to stay open? I think the kernel meta-data support was extended in PR #944.

arporter commented 3 years ago

I think this needs to stay open. What was done in #944 was just a stepping stone - ultimately (ideally) we will do the looping in the PSy layer and remove the operates_on = domain option.

stfc / PSyclone

[i-first] Extend kernel meta-data to capture data model #626