kokkos / array_ref

Polymorphic multidimensional array view

`operator()` implementation with (some) static extents #5

Closed by jlperla 9 years ago

jlperla commented 9 years ago

I just wanted to do a sanity check that the implementation of operator() can take advantage of fixed (static) extents for optimization.

Stating the obvious simplest example:

view<int[][]> dynamic_test = ...; // Set both extents to 2 and fill in the 2x2 data.
view<int[2][2]> static_test = ...; // Fill in the 2x2 data.
dynamic_test(1,1); // Compiler has to look up both strides in memory, etc.
static_test(1,1); // Compiler could generate code using both constexpr strides.

Of course, when finding the offsets/strides in dynamic_test, there isn't much that can be done. But in the static example one of the strides is a constexpr 2, which gives plenty of opportunity for optimization.

I think it would be useful for the document to give a simple example showing how operator() could implement this generically. It may just come down to clever organization with constexpr.

If there is no clever way to do it generically, is it going to require a combinatorial number of overloads of operator(), so that the constexpr version is used where available and a dynamic lookup of strides otherwise (i.e. a concept that checks whether each extent is static or not)? The number of overloads required may also interact with storage ordering. I think that view<int[][2], view_layout_right> and view<int[][2], view_layout_left> have different optimization possibilities for the constexpr extent of 2.
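To make the question concrete, here is a minimal C++17 sketch of one way a single variadic operator() could cover every mix of static and dynamic extents without per-combination overloads. It is not the prototype's code or the Kokkos implementation; the names `dyn`, `dim`, `mapping`, `layout_right` and `layout_left` are all hypothetical, and `0` is assumed to mark a run-time extent. Each extent is wrapped in a small type whose `value()` is either a compile-time constant or a stored member, so the same offset expression serves both cases.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical tag: an extent of 0 means "known only at run time".
constexpr std::size_t dyn = 0;

// Static extent: the value lives in the type, no data member.
template <std::size_t N>
struct dim {
    constexpr dim() = default;
    constexpr dim(std::size_t) {}  // run-time value ignored
    constexpr std::size_t value() const { return N; }
};

// Dynamic extent: the value is stored.
template <>
struct dim<dyn> {
    std::size_t n = 0;
    constexpr dim() = default;
    constexpr dim(std::size_t v) : n(v) {}
    constexpr std::size_t value() const { return n; }
};

struct layout_right {};  // last index is contiguous
struct layout_left {};   // first index is contiguous

template <class Layout, std::size_t... E>
class mapping {
    static_assert(sizeof...(E) > 0, "sketch assumes rank >= 1");
    std::tuple<dim<E>...> dims_;

    template <std::size_t... K, class... I>
    constexpr std::size_t offset(std::index_sequence<K...>, I... i) const {
        std::size_t off = 0;
        if constexpr (std::is_same_v<Layout, layout_right>) {
            // Horner form: ((i0 * E1) + i1) * E2 + i2 ...
            ((off = off * std::get<K>(dims_).value() + i), ...);
        } else {
            // Any other tag is treated as layout_left: i0 + E0 * (i1 + E1 * ...)
            std::size_t stride = 1;
            ((off += i * stride, stride *= std::get<K>(dims_).value()), ...);
        }
        return off;
    }

public:
    constexpr mapping() = default;
    constexpr explicit mapping(dim<E>... d) : dims_(d...) {}

    // One operator() for every static/dynamic combination.
    template <class... I>
    constexpr std::size_t operator()(I... i) const {
        static_assert(sizeof...(I) == sizeof...(E), "one index per dimension");
        return offset(std::index_sequence_for<I...>{},
                      static_cast<std::size_t>(i)...);
    }
};

// Analogue of view<int[2][2]>: the offset is a constant expression.
static_assert(mapping<layout_right, 2, 2>{}(1, 1) == 3);

// Analogue of view<int[][2]> with a run-time leading extent of 3.
constexpr mapping<layout_right, dyn, 2> right_map(3, 2);
constexpr mapping<layout_left, dyn, 2> left_map(3, 2);
static_assert(right_map(0, 1) == 1);  // stride of the first index is the constexpr 2
static_assert(left_map(0, 1) == 3);   // stride of the second index is the run-time 3
```

In this sketch the layout difference raised above shows up directly: with `layout_right` the static 2 is the multiplier for the first index, while with `layout_left` the only multiplier is the dynamic leading extent, so the static 2 never enters the offset computation.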

Sorry if I missed the steps or description in the current setup. Feel free to completely ignore this if I am off base or missing something simple.

hcedwar commented 9 years ago

Not off base at all - very important. I have started a simplified prototype in src/ that follows the Kokkos::View implementation. The implementation mixes implicit and explicit dimensions the same way we do it in Kokkos. In Kokkos performance testing we have seen significant performance improvements with explicit dimensions. It would be good to have the same data to show for the prototype. It also helps that explicit dimensions do not consume registers, especially for GPU kernels.
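The point about explicit dimensions not consuming registers can be illustrated with a small rank-2 sketch. This is an assumption-laden illustration, not the code in src/ or in Kokkos::View: the names `extent` and `extents2` are hypothetical, `0` is again assumed to mean a run-time extent, and the index parameter `K` exists only to keep the two base types distinct. A static extent contributes no data member, so an object with only static extents has nothing to load or keep live.

```cpp
#include <cstddef>

// Static extent: empty type, the value is a compile-time constant.
template <std::size_t K, std::size_t N>
struct extent {
    constexpr extent(std::size_t = N) {}  // run-time value ignored
    constexpr std::size_t get() const { return N; }
};

// Dynamic extent (N == 0, hypothetical convention): the value is stored.
template <std::size_t K>
struct extent<K, 0> {
    std::size_t n;
    constexpr extent(std::size_t v = 0) : n(v) {}
    constexpr std::size_t get() const { return n; }
};

// Rank-2 extents built by inheritance, so empty bases need not take space.
template <std::size_t E0, std::size_t E1>
struct extents2 : extent<0, E0>, extent<1, E1> {
    constexpr extents2(std::size_t n0 = E0, std::size_t n1 = E1)
        : extent<0, E0>(n0), extent<1, E1>(n1) {}

    constexpr std::size_t size() const {
        return extent<0, E0>::get() * extent<1, E1>::get();
    }
};

// All-static extents carry less state than all-dynamic ones.
static_assert(sizeof(extents2<2, 2>) < sizeof(extents2<0, 0>),
              "static extents should not add stored members");
static_assert(extents2<2, 2>{}.size() == 4, "fully static product");
static_assert(extents2<0, 2>(5).size() == 10, "mixed static/dynamic product");
```

Exact object sizes depend on the compiler's empty base optimization, so the sketch only checks the relative size rather than a specific number of bytes.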

jlperla commented 9 years ago

Took a peek at the code. Great technique. I will close this, as a reference to the code will be description enough.