Closed jlperla closed 9 years ago
Not off base at all - very important. I have started a simplified prototype in src/ that follows the Kokkos::View implementation. The implementation mixes implicit and explicit dimensions the same way we do it in Kokkos. In Kokkos performance testing we have seen significant performance improvements with explicit dimensions. It would be good to have the same data to show for the prototype. It also helps that explicit dimensions do not consume registers, especially for GPU kernels.
Took a peek at the code. Great technique; I will close this, as the reference will be description enough.
I just wanted to do a sanity check that the implementation of `operator()` is able to handle fixed extents for optimization.

Stating the obvious simplest example:
Of course, when finding the offsets/strides in `dynamic_test`, there isn't much that can be done. But in the static example one of the strides is constexpr and equal to 2, which gives plenty of opportunity for optimization.
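To make the contrast concrete, here is a minimal sketch of the two cases for a row-major 2-D access. It is not the example originally posted in this issue: the signature given to `dynamic_test` and the `static_test` counterpart are assumptions made purely for illustration, and neither is part of the prototype or of Kokkos.

```cpp
#include <cstddef>

// Hypothetical stand-ins, not the prototype's API.
// Row-major (layout right) 2-D access: offset = i * extent1 + j.

// Dynamic case: extent1 is only known at run time, so it has to be
// passed around (occupying a register) and the multiply stays generic.
constexpr int dynamic_test(const int* data, std::size_t extent1,
                           std::size_t i, std::size_t j) {
    return data[i * extent1 + j];
}

// Static case, analogous to view<int[][2]>: the stride is a compile-time
// constant, so the compiler is free to turn i * 2 into a shift or fold it
// into the addressing mode, and no stride needs to be stored.
template <std::size_t Extent1>
constexpr int static_test(const int* data, std::size_t i, std::size_t j) {
    return data[i * Extent1 + j];
}
```

Both functions compute the same address; the difference is only in what the compiler knows when it generates the code for the multiply.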
I think it would be useful in the document to give a simple example saying how `operator()` could implement this generically. It may just come down to clever organization with constexpr.

If there is no tricky way to do it generically, is it going to require a combinatorial number of overloads of `operator()` to use the constexpr version where available and a dynamic lookup of strides otherwise (i.e. a concept that checks whether each index is static or not)? The number of overloads required may also interact with storage ordering. I think that `view<int[][2], view_layout_right>` and `view<int[][2], view_layout_left>` have different optimization possibilities for the constexpr stride of 2.

Sorry if I missed the steps or description in the current setup. Feel free to completely ignore this if I am off base or missing something simple.
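One way the combinatorial overloads might be avoided is to decide per rank, at compile time, whether to read a constant or a stored value, and to build the offset from that single accessor. The following C++17 sketch works under that assumption. All names in it (`dyn`, `extents`, `offset_right`, `offset_left`) are hypothetical and are not taken from the prototype in src/ or from Kokkos.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sentinel: marks an extent known only at run time.
constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Hypothetical extents holder. For brevity every rank has a runtime slot;
// slots that belong to static extents are simply never read.
template <std::size_t... Es>
struct extents {
    static constexpr std::size_t rank = sizeof...(Es);
    static constexpr std::array<std::size_t, rank> static_ext{{Es...}};
    std::array<std::size_t, rank> runtime{};

    // Returns the compile-time constant when there is one,
    // otherwise the value stored at run time.
    template <std::size_t R>
    constexpr std::size_t extent() const {
        constexpr std::size_t s = static_ext[R];
        if constexpr (s != dyn) return s;
        else return runtime[R];
    }
};

// Layout right (row-major): offset = (i0 * E1 + i1) * E2 + i2 ...
template <std::size_t R = 0, std::size_t... Es>
constexpr std::size_t offset_right(const extents<Es...>& e,
                                   const std::array<std::size_t, sizeof...(Es)>& idx,
                                   std::size_t acc = 0) {
    if constexpr (R == sizeof...(Es)) return acc;
    else return offset_right<R + 1>(e, idx,
                                    acc * e.template extent<R>() + idx[R]);
}

// Layout left (column-major): offset = i0 + E0 * (i1 + E1 * (i2 ...))
template <std::size_t R = 0, std::size_t... Es>
constexpr std::size_t offset_left(const extents<Es...>& e,
                                  const std::array<std::size_t, sizeof...(Es)>& idx) {
    if constexpr (R == sizeof...(Es)) return 0;
    else return idx[R] + e.template extent<R>() * offset_left<R + 1>(e, idx);
}
```

With `extents<dyn, 2>`, the layout-right offset reduces to `i0 * 2 + i1`, so only the constant takes part in the arithmetic. The layout-left offset is `i0 + E0 * i1`, where `E0` is the dynamic extent, so the static 2 does not help the stride computation there. This is one reading of why the two layouts offer different optimization possibilities.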