Closed Nanoseb closed 1 month ago
I agree, there is a bit duplication in some variables and its better to clean them up sooner than later. Lets try to come up some sensible notation here and then implementing it will be easy.
I think the most important point is having a clear distinction between padded dimensions and actual domain dimensions. This will help for a smooth transition from periodic to non-periodic BCs. Currently in both backends we use nx/y/z_loc
as if they're padded dimensions, especially in reorders, but their name doesn't suggest this at all. Maybe we can drop _loc
because we almost always work with the local dimensions, and rarely with global. We can highlight in the name when a size is global and just not bother for local dimensions I think. So maybe we can replace nx/y/z_loc
with dims_padded(3)
?
For the actual local domain size, I think dirps%n
is the best place to store it. tds_solve
is passed the dirps
and the domain size is already taken from the dirps%n
I suppose.
Also, dirps%n_blocks
is probably the best for accessing the number of pencil groups in the domain. Its name is the worst though, back then there was only CUDA stuff around and because the CUDA block
dimension is the number of pencil groups, this influenced me to give it a wrong name. So maybe we can rename n_blocks
to n_groups
, and get rid of the globs%n_groups_x/y/z
as it is completely a duplicate and has very few uses that can easily be replaced. n_groups
may not be the best though, so please share if you have an idea. n_pencilettes
could be a fun one :laughing:, as pencils
were these chunky groups of lines in a 2D domain decomposition setting and the new SZ many lines grouped together are small compared to that.
SZ
is a different story. Its annoying it comes from two different sources based on the backend, but having a single source can be a problem for performance when running a heterogeneous simulation. We don't support this yet, however if we give up now and collapse SZ
s into one then we will stop passing it as an argument, and then it'll be a pain to bring this seperation back into the codebase. I think this affects very few places in the code though, so would be good if can keep it as is.
I agree with getting rid of _loc
as we are almost always dealing with local domain indeed.
Happy with n_pencilettes
:) but n_groups
would also work for me
About SZ
, I wasn't suggesting we have a single value. It should be based on the architecture/backend. But have a single way of accessing it would be good. At the moment we need to run either use m_omp_common, only: SZ
or use m_cuda_common, only: SZ
. And while most of the time it is ok because only under backend things, there are some moments when we will need to get SZ
outside that and I'd rather not use ifdef cuda...
unless we really have to. I had to do something like that for the ordering because that module shouldn't live under any backend for example.
Just a thought but, nx/ny/nz, n_groups, and SZ define dimensions of the fields. Could they be stored in field_t
instead of dirps
and be adapted depending on the dir
.
For example, if a field is in the y
direction, field%n => ny
, field%n_padded => ny_padded
field%n_groups => n_groups_y
, field%SZ => SZ
etc. All of these variables would be defined in the allocator and setup automatically when issuing get_block(DIR_Y)
.
That way any loop working on a particular field will always work with the same loop ranges (field%n_groups, field%n and field%SZ) and these values will always be 'at hand' without passing dirps
around when it isn't needed.
solved by #95
At the moment all the variables that define block sizes are defined in various places and sometimes duplicated over multiple structures. For example,
backend_t
hasnx_loc
,ny_loc
,ny_loc
, but so hasbackend_t%xdirps%n
etc.SZ
is accessed inm_omp_common
orm_cuda_common
depending on the backend, meaning we can't access it outside the backend functions (unless being passed as argument from there etc.).It would be good to have a single consistent way of accessing all of these variable. It will also reduce the number of arguments needed to be passed around.