Closed semi-h closed 8 months ago
Refactored all the transpose functions into one as we agreed, but kept the cuda kernels as they are due to performance concerns.
Also, renamed transpose and trans into reorder. The name tranpose stuck from the original Xcompact3D implementation, but the operation we're carrying out is not any more a transpose I believe. It is always local and just reorders data into x, y, or z directions so that we can use our 1D tridiagonal solver. What do you think about this? I added all the renaming stuff in a single commit and easily convert back if people disagree.
Refactored all the transpose functions into one as we agreed, but kept the cuda kernels as they are due to performance concerns.
Also, renamed transpose and trans into reorder. The name tranpose stuck from the original Xcompact3D implementation, but the operation we're carrying out is not any more a transpose I believe. It is always local and just reorders data into x, y, or z directions so that we can use our 1D tridiagonal solver. What do you think about this? I added all the renaming stuff in a single commit and easily convert back if people disagree.
reorder seems sensible to me, alternatives (only if you're unhappy with it) could be "orient" or "rotate", but I see no need to change from reorder.
Added a simple test, please let me know if there is anything you want to add. I plan to merge tomorrow.
Looking good to me. I think in the future we could move checkperf
into a separate test module because it is used in other tests too. But this is outside the scope of this PR.
Can someone also comment here the reasoning behind refactoring the individual reorder functions (e.g. trans_x2y
) into the switch-style reorder(u_y, u, RDR_X2Y)
?
To me, the calling code remains semantically identical so there's no reduction in complexity there, and there has to be a separate kernel for every direction anyway, so we haven't reduced complexity on the backend. I feel like I'm missing the advantage of introducing this new switch statement.
Can someone also comment here the reasoning behind refactoring the individual reorder functions (e.g.
trans_x2y
) into the switch-stylereorder(u_y, u, RDR_X2Y)
?
To me reorder(... RDR_X2Y)
feels cleaner because you have a single subroutine to do the work, you don't have to setup 6 different ones, all having the same set of input and output definitions that is error prone. When reading the code it felt simpler that way to be more concise.
Can someone also comment here the reasoning behind refactoring the individual reorder functions (e.g.
trans_x2y
) into the switch-stylereorder(u_y, u, RDR_X2Y)
?
It helped us to remove some of the backend procedures and we ended up having fewer lines. Also, we need separate kernels in the CUDA backend but in the OpenMP backend we can actually have a single subrotuine doing all the reorderings based on the switch parameter.
We need to reorder field data as we switch between x, y, and z orientations.
First commit adds transposing operation from x to y.
Select type stuff we have to do here is a bit annoying, and it is mainly due to the extention of the base field type in CUDA backend to support device arrays in Fortran. Any suggestion here is welcome @JamieJQuinn.