nextsimhub / nextsimdg

neXtSIM_DG : next generation sea-ice model with DG
https://nextsim-dg.readthedocs.io/en/latest/?badge=latest
Apache License 2.0

Array ordering AGAIN #275

Closed timspainNERSC closed 1 year ago

timspainNERSC commented 1 year ago

While finally trying to reconcile the dynamics and the thermodynamics, I am now convinced that we need to discuss array ordering.

The multi-indexing in ModelArray follows the C/C++ convention that right-hand indices vary fastest. This arises from the form of a naive n-dimensional array, which would be initialised as

int matrix[2][3] = {{1, 4, 2}, {3, 6, 8}};

The multi-dimensional indexing operator() of the ModelArray class follows this convention, such that array(23, 13) and array(23, 14) are consecutive in memory and would be accessed sequentially when iterating over the raw indices. This means a nested for loop can be written as

for (size_t i = 0; i < nx; ++i)
    for (size_t j = 0; j < ny; ++j)
        array(i, j) = something;

which accesses memory locations sequentially and orders the indices in reading order from left to right.

In the dynamics, on the other hand, the data is stored with the x coordinate varying fastest. This would result in the data points at (x, y) = (13, 23) and (14, 23) being adjacent in memory.
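
To make the difference concrete, here is a minimal sketch of the two flattening formulas (illustrative functions only, taken from neither ModelArray nor the dynamics):

#include <cstddef>

// C/ModelArray convention: the right-hand index varies fastest,
// so (i, j) and (i, j + 1) are adjacent in memory.
inline size_t indexLastFastest(size_t i, size_t j, size_t ny)
{
    return i * ny + j;
}

// Dynamics convention: the x (left-hand) index varies fastest,
// so (x, y) and (x + 1, y) are adjacent in memory.
inline size_t indexFirstFastest(size_t x, size_t y, size_t nx)
{
    return y * nx + x;
}

With the examples above, indexLastFastest(23, 13, ny) and indexLastFastest(23, 14, ny) differ by exactly one, as do indexFirstFastest(13, 23, nx) and indexFirstFastest(14, 23, nx).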

There are three immediately obvious solutions.

1) Swap the coordinates in the dynamics

By swapping x and y in the dynamics, the two are consistent with each other and with the C convention for indexing multi-dimensional arrays, and nested for loops stay in reading order. This is probably a non-starter, as so much of the dynamics code is programmed under the x-varies-fastest paradigm.

2) Swap the coordinates in ModelArray

Again, this would bring the dynamics and ModelArray into consistency. ModelArray would then more closely resemble a Fortran multi-dimensional array, where the indices in nested for (DO) loops appear in reverse reading order

DO i = 1, SIZE(a, 2)
  DO j = 1, SIZE(a, 1)
    a(j, i) = something
  END DO
END DO

for (size_t i = 0; i < nx; ++i)
    for (size_t j = 0; j < ny; ++j)
        array(j, i) = something;

3) Deal with it

Accept that the two conventions are different and work with them as they are. This is a potential source of bugs for any code that interacts with both the dynamics and with ModelArray.

timspainNERSC commented 1 year ago

4) I have got everything wrong and the two are in fact perfectly compatible already.

timspainNERSC commented 1 year ago

References for the ordering of arrays in C and Fortran (which I can never keep straight in my head): W3Schools for C; Fortran-lang.org for Fortran.

draenog commented 1 year ago

In the current situation, does this mean that memory is traversed in non-consecutive order in some places, or that the land mask is interpreted differently in the dynamics and in the parts of the code that use ModelArray?

In general I think it would be better to have a consistent convention between ModelArray and dynamics. Otherwise we would need to be very careful for example how to interpret land masks. We should also try to encapsulate machinery for traversing arrays in case we need to change the order in the future to something more complicated like tiling.
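
One possible shape for that encapsulation, as a sketch only (forEachElement and its signature are hypothetical, not anything in the current code):

#include <cstddef>

// All knowledge of the traversal order lives here: switching from
// last-fastest to first-fastest (or, later, to tiling) changes this
// one function rather than every loop in the model.
template <typename Array2D, typename Func>
void forEachElement(Array2D& array, size_t nx, size_t ny, Func&& f)
{
    // Current (C-order) traversal: the last index varies fastest.
    for (size_t i = 0; i < nx; ++i)
        for (size_t j = 0; j < ny; ++j)
            f(array(i, j), i, j);
}

Callers would then never commit to a traversal order, so a later change of storage scheme stays local to this one helper.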

einola commented 1 year ago

In general I think it would be better to have a consistent convention between ModelArray and dynamics. Otherwise we would need to be very careful for example how to interpret land masks. We should also try to encapsulate machinery for traversing arrays in case we need to change the order in the future to something more complicated like tiling.

These points are key for me: Consistency across the board, and encapsulation, so as not to confuse (naive) users. Both should decrease substantially the risk of PEBCAK-type of programming errors in the future (I'm not least concerned about future-Einar here).

An additional point is any communication with the coupler and other models written in Fortran. Do we want to avoid transpose operations (and similar), or does that not matter for speed? This confused me very much when dealing with the coupling in Lagrangian neXtSIM!

a-smith-github commented 1 year ago

It's been a while since I've considered this, but as I recall (and without inspecting the code), it would be better, particularly for larger arrays, to move everything to a single convention (by default the C/row-major convention, unless there is a specific need to go column-major). Not only for debugging/maintenance reasons - it would make things consistent, and there is perhaps an expectation that we do things the C way - but also because of how the memory is stored and accessed. For ordinary multi-dimensional arrays (as noted), we get better performance using the C convention because we access the data in the order it is physically stored in memory. For non-contiguous reads and writes, caching becomes a major problem.

Kacper may have more recent knowledge of the issue. There might be a way of specifically requesting that the data is stored transposed so we can overcome this issue --- does 'matrix' do this? Using everything as std::vector (or similar) might otherwise limit the impact, but only if we create it in the way we intend to read from and write to it --- we can't mix and match conventions; we'd require the dynamics to follow the same convention as core.

IMO a single convention is usually the right choice, and for C/C++ I'd always recommend row-major unless there is a specific use case. If the underlying data structures are capable of it, you could go row-major throughout, but let's not mix conventions.

timspainNERSC commented 1 year ago

It’s been a while since I’ve considered this but as I recall it…

This is all true. The key question is how that sequential access looks in the code. Raw C-style multidimensional arrays are not being used anywhere, so how the ModelArray class converts the arguments of operator() to a 1-d index is entirely up to us. Currently, it matches C order and, as described in the issue, a C-style nested for loop accesses the data in the optimal manner. We could, however, decide to match dimensions with the dynamics and change how ModelArray interprets array(i, j, k) to match the dynamics order. The few nested for loops would have to be re-written, but overElements() would continue to iterate over the elements of arrays in sequential memory order.
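
As a sketch of what that re-interpretation could look like (firstFastestIndex is illustrative, not the current ModelArray code):

#include <cstddef>
#include <vector>

// First-fastest interpretation of array(i, j, k):
// flat = i + n0 * (j + n1 * (k + ...)), so the first argument is the
// fastest varying, matching the dynamics ordering.
inline size_t firstFastestIndex(const std::vector<size_t>& dims,
                                const std::vector<size_t>& idx)
{
    size_t flat = 0;
    for (size_t d = idx.size(); d-- > 0;)
        flat = flat * dims[d] + idx[d];
    return flat;
}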

timspainNERSC commented 1 year ago

In re netCDF:

The netCDF C User's guide describes the storage of data in a netCDF variable as being row-major (last index fastest). From Appendix B: File format specifications:

vardata = [values ...]      // All data for a non-record variable, as a
                            // block of values of the same type as the
                            // variable, in row-major order (last
                            // dimension varying fastest).

Similarly, the Fortran 90 User's guide specifically makes mention that the order of dimensions in the data description is reversed in the corresponding Fortran array:

In Fortran-90, the dimensions are reversed from the CDL declaration with the first dimension varying fastest

With first-fastest indexing, ModelArray (and dgVector) will need to behave like Fortran, reversing the order of dimensions with respect to the data definition in the netCDF file.
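
A minimal sketch of that reversal, assuming a 2-d variable declared in CDL as var(ny, nx) (none of this is the actual I/O code):

#include <cstddef>
#include <vector>

int main()
{
    // Dimension lengths in CDL declaration order, slowest first: var(ny, nx)
    std::vector<size_t> cdlDims = { 512, 1024 };

    // The first-fastest view of the same variable: reverse the dimension
    // list only; the data block itself is read verbatim.
    std::vector<size_t> modelDims(cdlDims.rbegin(), cdlDims.rend()); // (nx, ny)

    size_t nx = modelDims[0];
    std::vector<double> buffer(nx * modelDims[1]);

    // Element (x, y) under first-fastest indexing is buffer[y * nx + x],
    // which is exactly where the file's row-major (y, x) layout put it:
    // no transpose is needed, only the reversed dimension list.
    buffer[7 * nx + 3] = 1.0; // (x, y) = (3, 7)
}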

einola commented 1 year ago

So, netCDF uses row-major, like C, right? This is unsurprising since the netCDF library is written in C. But we rarely talk directly to the netCDF library - most of the time, we'll be doing I/O through XIOS. It's only restart files that we write and read ourselves.

einola commented 1 year ago

From the Lagrangian neXtSIM, I found that I created a grid for OASIS like this:

    double lat;
    double lon;
    int i = 0;
    double Y = ymin;
    // x changes in cols, y in rows
    for (int nrows = 0; nrows < M_nrows; nrows++)
    {
        double X = xmin;
        for (int ncols = 0; ncols < M_ncols; ncols++)
        {
            int status = inverse_mapx(map, X, Y, &lat, &lon);
            M_grid.gridLAT[i] = lat;
            M_grid.gridLON[i] = lon;
            X += mooring_spacing;
            i++;
        }
        Y += mooring_spacing;
    }
    M_ymax = Y - mooring_spacing;

Where inverse_mapx projects from the x/y coordinates of a polar-stereographic grid to lat/lon. This should be column-major, right??? And I had to do a transpose in the Fortran/C interface for OASIS to get it to work (I probably should have constructed it row-major to begin with, but I was confusing myself at this point). I also see that I transposed the data before sending it to the netCDF output stream (via putVar).

To conclude: Talking to OASIS or other Fortran-based things, we should use column-major indexing, like the dynamics. Talking to netCDF, we should use row-major, but that doesn't matter because we primarily write restart files, and it doesn't matter if they are transposed. Ergo: option 2 is the appropriate one. Do we agree?

timspainNERSC commented 1 year ago

So, netCDF uses row-major, like C, right? This is unsurprising since the netCDF library is written in C. But we rarely talk directly to the netCDF library - most of the time, we'll be doing I/O through XIOS. It's only restart files that we write and read ourselves.

Yes, but if we want to get data in and out for the demo to the SASIP meeting, then we aren't going to use XIOS (unless things progress very quickly). So we (I) have to get the netCDF interaction working both for the initial restart files and for the forcing files.

einola commented 1 year ago

Yes, but in this case, we don't care about the performance hit of a transpose operation. You can even transpose the data in the Python pre-processing script.

timspainNERSC commented 1 year ago

To conclude: Talking to OASIS or other Fortran-based things, we should use column-major indexing, like the dynamics. Talking to netCDF, we should use row-major, but that doesn't matter because we primarily write restart files, and it doesn't matter if they are transposed. Ergo: option 2 is the appropriate one. Do we agree?

Talking to netCDF is just a matter of reversing the order of the dimensions describing each array; the interface between netCDF and Fortran arrays has been tested many times, after all. It should be as simple as reversing some vectors of dimensions on read and write. I'm currently trying to fix the many bugs in this 'simple' operation.

Option 2 is my vote.

einola commented 1 year ago

Option 2 it is!

But something is confusing about these 'simple' things - the whole column-major vs row-major just doesn't fit into my brain. It takes much more effort to think about than it should!

timspainNERSC commented 1 year ago

A note on dgVector

The Eigen::Matrix underlying dgVector is ordered row-major (last index fastest). This ensures that the DG components of a given point are grouped together in memory. However, as discussed above, the geographical data is ordered with the x coordinate varying fastest.

This means that the ordering of dimensions for a DG array in the restart or diagnostic netCDF files should be (y, x, DGcomp): the DG component varies fastest, then the geographic x position, then the geographic y position. It is therefore not as simple as reversing the indices from the previous code, as the DG component dimension must remain the fastest varying.
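
In index arithmetic (an illustration of the implied layout, not the Eigen internals), the flat offset for the (y, x, DGcomp) ordering would be:

#include <cstddef>

// (y, x, DGcomp) ordering: the DG component varies fastest, then x, then y.
// The nComp components of one point are contiguous, and a step in x moves
// the offset by nComp.
inline size_t dgOffset(size_t y, size_t x, size_t comp, size_t nx, size_t nComp)
{
    return (y * nx + x) * nComp + comp;
}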

timspainNERSC commented 1 year ago

Ice temperature z levels.

Before, I was treating the z-level as an additional spatial dimension. With last-fastest (row-major) indexing, the z-levels were adjacent in memory. With first-fastest (column-major) indexing, the z-dimension becomes the slowest-varying spatial dimension, which seems sub-optimal for the ice thermodynamics calculations. The ice levels could instead be treated as components, in the same way as the DG components, which are the fastest-varying dimension in the Eigen data structure when they are present. The disadvantage is that if we need to store the DG components when advecting the ice temperature, then both the ice levels and the DG components would be competing for the component dimension.

For the initial work I will keep the z-dimension as a spatial dimension.
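
For comparison, here are illustrative offset formulas for the two candidate layouts, assuming x-fastest spatial indexing; only the position of z differs:

#include <cstddef>

// z kept as a spatial dimension: with x-fastest spatial indexing the
// z-levels of one column end up nx * ny elements apart.
inline size_t zAsSpatial(size_t z, size_t y, size_t x, size_t nx, size_t ny)
{
    return (z * ny + y) * nx + x;
}

// z treated as a component dimension (like the DG components): the
// z-levels of one column are adjacent, which suits the per-column
// thermodynamics solve.
inline size_t zAsComponent(size_t z, size_t y, size_t x, size_t nx, size_t nz)
{
    return (y * nx + x) * nz + z;
}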

timspainNERSC commented 1 year ago

Merged into develop.