P1684: mdarray iteration over elements needs a parallel execution policy

In P1684R2, mdarray needs to iterate over elements in its conversion constructors (from mdarray or mdspan). (I'm not counting whatever iteration over elements the mdarray's container already does, e.g., on construction with a nonzero size, or on destruction.)

This iteration currently uses no execution policy. That is a problem if the mdarray stores elements in a memory space that needs a matching execution policy for correctness or performance (e.g., GPU memory, or NUMA allocations with a particular distribution). This is analogous to why Kokkos::View has both a memory space (for allocations) and an execution space (for iteration).
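As a rough illustration of what a policy-aware conversion constructor could look like, here is a minimal sketch. Everything in it is a hypothetical stand-in: `toy_mdarray`, `serial_policy`, `partitioned_policy`, and `for_each_index` are not part of P1684R2 or the Standard, and the "partitioned" policy still runs serially on the host. It only shows the element copy being routed through a policy instance supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy types (assumptions for illustration only).
struct serial_policy {};

// A policy whose *instance* carries state: how many workers to split over.
struct partitioned_policy {
  std::size_t num_workers;
};

// Visit indices 0..n-1 in order.
template <class F>
void for_each_index(const serial_policy&, std::size_t n, F f) {
  for (std::size_t i = 0; i < n; ++i) f(i);
}

// Visit indices worker by worker (round-robin partition). This runs
// serially here; a real policy would dispatch each partition elsewhere.
template <class F>
void for_each_index(const partitioned_policy& p, std::size_t n, F f) {
  assert(p.num_workers > 0);
  for (std::size_t w = 0; w < p.num_workers; ++w)
    for (std::size_t i = w; i < n; i += p.num_workers) f(i);
}

// Toy rank-2 array; a stand-in for mdarray, not its real interface.
template <class T>
struct toy_mdarray {
  std::vector<T> data;
  std::size_t rows, cols;

  toy_mdarray(std::size_t r, std::size_t c) : data(r * c), rows(r), cols(c) {}

  // Converting constructor: the element-wise copy goes through `policy`.
  // (The container's own value-initialization of `data` is the iteration
  // this issue explicitly does not count.)
  template <class Policy, class U>
  toy_mdarray(const Policy& policy, const toy_mdarray<U>& other)
      : data(other.data.size()), rows(other.rows), cols(other.cols) {
    for_each_index(policy, data.size(), [&](std::size_t i) {
      data[i] = static_cast<T>(other.data[i]);
    });
  }
};
```

With these stand-ins, `toy_mdarray<double> b(partitioned_policy{4}, a);` converts an existing `toy_mdarray<int> a` using the given policy instance rather than an implicit serial loop.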

There are potentially two different execution policies: the preferred policy of the mdarray being constructed, and the input's preferred policy. (An input mdspan doesn't define a way to get its preferred execution policy, though its accessor could define that implicitly.) Standard C++ has no notion of execution policy compatibility (i.e., of a policy being unable to access some memory space), so we can just pick the policy at hand, from the input. (I'm presuming that the instance of the policy matters, which is a bit of a generalization from the current execution policies in the Standard.)

One way to fix this would be to have a customization point function that takes an mdarray or mdspan, and returns its preferred execution policy instance. One complication is that ranges::to doesn't currently take an execution policy (none of the ranges algorithms do). This would hinder constructing the new mdarray's container from the input (using ranges facilities like views::iota and views::cartesian_product to iterate over the input).
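A minimal sketch of such a customization point, under stated assumptions: the name `preferred_execution_policy`, the `execution_policy()` member hook, and the types `serial_policy`, `device_policy`, `host_array`, and `device_array` are all hypothetical and appear in neither P1684R2 nor the Standard. The function returns a policy *instance*, falling back to a default serial policy when the input does not opt in.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical policy types (assumptions for illustration only).
struct serial_policy {};
struct device_policy {
  int device_id;  // per-instance state: which device to run on
};

namespace detail {
// Detects an opt-in member hook `x.execution_policy()`.
template <class T, class = void>
struct has_policy_hook : std::false_type {};

template <class T>
struct has_policy_hook<
    T, std::void_t<decltype(std::declval<const T&>().execution_policy())>>
    : std::true_type {};
}  // namespace detail

// Hypothetical customization point: returns the input's preferred policy
// instance, or a default serial policy if the input provides no hook.
template <class T>
auto preferred_execution_policy(const T& x) {
  if constexpr (detail::has_policy_hook<T>::value) {
    return x.execution_policy();
  } else {
    return serial_policy{};
  }
}

// Stand-in for an input with no preference (e.g., plain host memory).
struct host_array {};

// Stand-in for an input whose elements live on a specific device.
struct device_array {
  int device_id;
  explicit device_array(int id) : device_id(id) { assert(id >= 0); }
  device_policy execution_policy() const { return device_policy{device_id}; }
};
```

A conversion constructor could then call `preferred_execution_policy(input)` and pass the result to whatever performs the element copy. For an mdspan input, the hook might plausibly live on the accessor instead; that variant is not shown here.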

kokkos/mdarray, issue #13