Add transposer kernels for reordering field data.

xcompact3d / x3d2

https://xcompact3d.github.io/x3d2

BSD 3-Clause "New" or "Revised" License

3 stars 4 forks source link

Add transposer kernels for reordering field data. #29

Closed semi-h closed 8 months ago

semi-h commented 9 months ago

We need to reorder field data as we switch between x, y, and z orientations.

First commit adds transposing operation from x to y.

Select type stuff we have to do here is a bit annoying, and it is mainly due to the extention of the base field type in CUDA backend to support device arrays in Fortran. Any suggestion here is welcome @JamieJQuinn.

semi-h commented 9 months ago

Refactored all the transpose functions into one as we agreed, but kept the cuda kernels as they are due to performance concerns.

Also, renamed transpose and trans into reorder. The name tranpose stuck from the original Xcompact3D implementation, but the operation we're carrying out is not any more a transpose I believe. It is always local and just reorders data into x, y, or z directions so that we can use our 1D tridiagonal solver. What do you think about this? I added all the renaming stuff in a single commit and easily convert back if people disagree.

pbartholomew08 commented 9 months ago

Refactored all the transpose functions into one as we agreed, but kept the cuda kernels as they are due to performance concerns.

Also, renamed transpose and trans into reorder. The name tranpose stuck from the original Xcompact3D implementation, but the operation we're carrying out is not any more a transpose I believe. It is always local and just reorders data into x, y, or z directions so that we can use our 1D tridiagonal solver. What do you think about this? I added all the renaming stuff in a single commit and easily convert back if people disagree.

reorder seems sensible to me, alternatives (only if you're unhappy with it) could be "orient" or "rotate", but I see no need to change from reorder.

semi-h commented 9 months ago

Added a simple test, please let me know if there is anything you want to add. I plan to merge tomorrow.

Nanoseb commented 9 months ago

Looking good to me. I think in the future we could move checkperf into a separate test module because it is used in other tests too. But this is outside the scope of this PR.

JamieJQuinn commented 8 months ago

Can someone also comment here the reasoning behind refactoring the individual reorder functions (e.g. trans_x2y) into the switch-style reorder(u_y, u, RDR_X2Y)?

To me, the calling code remains semantically identical so there's no reduction in complexity there, and there has to be a separate kernel for every direction anyway, so we haven't reduced complexity on the backend. I feel like I'm missing the advantage of introducing this new switch statement.

Nanoseb commented 8 months ago

Can someone also comment here the reasoning behind refactoring the individual reorder functions (e.g. trans_x2y) into the switch-style reorder(u_y, u, RDR_X2Y)?

To me reorder(... RDR_X2Y) feels cleaner because you have a single subroutine to do the work, you don't have to setup 6 different ones, all having the same set of input and output definitions that is error prone. When reading the code it felt simpler that way to be more concise.

semi-h commented 8 months ago

Can someone also comment here the reasoning behind refactoring the individual reorder functions (e.g. trans_x2y) into the switch-style reorder(u_y, u, RDR_X2Y)?

It helped us to remove some of the backend procedures and we ended up having fewer lines. Also, we need separate kernels in the CUDA backend but in the OpenMP backend we can actually have a single subrotuine doing all the reorderings based on the switch parameter.