This PR adds several versions of the code using C++ standard parallel algorithms.
In the `c` directory there are 4 new versions, one for each combination of {Serial, MPI} and {`std::for_each`, `std::for_each_n`}, each based on its respective C baseline version. Both `for_each` and `for_each_n` are idiomatic C++, and providing both shows how they differ in the way the code is written and in performance. In some cases we have observed small performance differences between the two algorithms.
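To illustrate how the two call shapes differ, here is a minimal sketch rather than code taken from this PR. The helper names (`make_indices`, `scale_for_each`, `scale_for_each_n`) and the scaling kernel are hypothetical, and the execution policy is left out so the snippet builds without a parallel backend.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: builds the index range [0, n) that both loops walk.
inline std::vector<std::size_t> make_indices(std::size_t n) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    return idx;
}

// std::for_each takes a [first, last) iterator pair.
inline void scale_for_each(std::vector<double>& data, double factor) {
    const auto idx = make_indices(data.size());
    double* p = data.data();
    // A parallel build would pass an execution policy (for example
    // std::execution::par_unseq) as the first argument.
    std::for_each(idx.begin(), idx.end(),
                  [=](std::size_t i) { p[i] *= factor; });
}

// std::for_each_n takes a start iterator plus an element count.
inline void scale_for_each_n(std::vector<double>& data, double factor) {
    const auto idx = make_indices(data.size());
    double* p = data.data();
    std::for_each_n(idx.begin(), idx.size(),
                    [=](std::size_t i) { p[i] *= factor; });
}
```

Both functions apply the same lambda to the same indices; only the way the range is described changes.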
In the `cpp` directory you will find 4 more versions covering the same combinations. These differ from the `c` versions in that they use C++23 `mdspan` in place of raw pointers (and in place of YAKL Arrays). Compared to the `c` versions, the only differences should be in the function prototypes (which take mdspans rather than raw pointers) and in the access to those variables, which no longer requires calculating offsets. The nvc++ compiler currently provides `mdspan` in the experimental namespace, but this will likely change in the future.
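For context, here is a sketch of the kind of offset arithmetic the raw-pointer versions carry. The function name, the row-major layout, and the parameter list are assumptions made for illustration, not code from this PR. The `mdspan` form appears only as a comment, so the snippet does not depend on `<mdspan>` being available.

```cpp
#include <cstddef>

// Hypothetical raw-pointer kernel: adds `delta` to one cell of a row-major
// nz-by-nx field. The caller must pass nx so the offset can be computed.
inline void add_at_raw(double* field, std::size_t nx,
                       std::size_t k, std::size_t i, double delta) {
    const std::size_t offset = k * nx + i;  // manual row-major offset
    field[offset] += delta;
    // With an mdspan parameter carrying its own extents, the same access
    // would read field[k, i] (C++23) and the nx argument would go away.
}
```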
One other change to note is the use of the `idx2d` and `idx3d` constexpr functions, which extract the 2D and 3D loop indices from the 1D execution space. Once `cartesian_product` becomes widely available, those functions will no longer be necessary.
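As a rough illustration of the index extraction, the following sketch shows one way such helpers could look. The signatures, argument order, and row-major convention are assumptions; the definitions in this PR may differ.

```cpp
#include <array>

// Assumed form: split a flat index into {row, col} for a row-major grid
// with nx columns.
constexpr std::array<int, 2> idx2d(int idx, int nx) {
    return {idx / nx, idx % nx};
}

// Assumed form: split a flat index into {slab, row, col} for a row-major
// grid with ny rows and nx columns per slab.
constexpr std::array<int, 3> idx3d(int idx, int ny, int nx) {
    return {idx / (ny * nx), (idx / nx) % ny, idx % nx};
}

// Being constexpr, the helpers can be checked at compile time.
static_assert(idx2d(7, 4)[0] == 1 && idx2d(7, 4)[1] == 3,
              "flat index 7 with nx = 4 maps to (1, 3)");
static_assert(idx3d(23, 3, 4)[0] == 1 && idx3d(23, 3, 4)[1] == 2 &&
              idx3d(23, 3, 4)[2] == 3,
              "flat index 23 with ny = 3, nx = 4 maps to (1, 2, 3)");
```

Inside a `for_each` or `for_each_n` lambda that receives the flat index, a call such as `idx2d(i, nx)` would recover the two loop indices.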