Have SZ, nx_loc, n_groups etc. accessible "globally"

Nanoseb commented 5 months ago

At the moment all the variables that define block sizes are defined in various places and sometimes duplicated over multiple structures. For example, backend_t has nx_loc, ny_loc, ny_loc, but so has backend_t%xdirps%n etc. SZ is accessed in m_omp_common or m_cuda_common depending on the backend, meaning we can't access it outside the backend functions (unless being passed as argument from there etc.).

It would be good to have a single consistent way of accessing all of these variable. It will also reduce the number of arguments needed to be passed around.

semi-h commented 5 months ago

I agree, there is a bit duplication in some variables and its better to clean them up sooner than later. Lets try to come up some sensible notation here and then implementing it will be easy.

I think the most important point is having a clear distinction between padded dimensions and actual domain dimensions. This will help for a smooth transition from periodic to non-periodic BCs. Currently in both backends we use nx/y/z_loc as if they're padded dimensions, especially in reorders, but their name doesn't suggest this at all. Maybe we can drop _loc because we almost always work with the local dimensions, and rarely with global. We can highlight in the name when a size is global and just not bother for local dimensions I think. So maybe we can replace nx/y/z_loc with dims_padded(3)?

For the actual local domain size, I think dirps%n is the best place to store it. tds_solve is passed the dirps and the domain size is already taken from the dirps%n I suppose.

Also, dirps%n_blocks is probably the best for accessing the number of pencil groups in the domain. Its name is the worst though, back then there was only CUDA stuff around and because the CUDA block dimension is the number of pencil groups, this influenced me to give it a wrong name. So maybe we can rename n_blocks to n_groups, and get rid of the globs%n_groups_x/y/z as it is completely a duplicate and has very few uses that can easily be replaced. n_groups may not be the best though, so please share if you have an idea. n_pencilettes could be a fun one :laughing:, as pencils were these chunky groups of lines in a 2D domain decomposition setting and the new SZ many lines grouped together are small compared to that.

SZ is a different story. Its annoying it comes from two different sources based on the backend, but having a single source can be a problem for performance when running a heterogeneous simulation. We don't support this yet, however if we give up now and collapse SZs into one then we will stop passing it as an argument, and then it'll be a pain to bring this seperation back into the codebase. I think this affects very few places in the code though, so would be good if can keep it as is.

Nanoseb commented 5 months ago

I agree with getting rid of _loc as we are almost always dealing with local domain indeed.

Happy with n_pencilettes :) but n_groups would also work for me

About SZ, I wasn't suggesting we have a single value. It should be based on the architecture/backend. But have a single way of accessing it would be good. At the moment we need to run either use m_omp_common, only: SZ or use m_cuda_common, only: SZ. And while most of the time it is ok because only under backend things, there are some moments when we will need to get SZ outside that and I'd rather not use ifdef cuda... unless we really have to. I had to do something like that for the ordering because that module shouldn't live under any backend for example.

Just a thought but, nx/ny/nz, n_groups, and SZ define dimensions of the fields. Could they be stored in field_t instead of dirps and be adapted depending on the dir. For example, if a field is in the y direction, field%n => ny, field%n_padded => ny_padded field%n_groups => n_groups_y, field%SZ => SZ etc. All of these variables would be defined in the allocator and setup automatically when issuing get_block(DIR_Y).

That way any loop working on a particular field will always work with the same loop ranges (field%n_groups, field%n and field%SZ) and these values will always be 'at hand' without passing dirps around when it isn't needed.

Nanoseb commented 1 month ago

solved by #95

xcompact3d / x3d2

Have SZ, nx_loc, n_groups etc. accessible "globally" #77