kokkos / mdspan

Reference implementation of mdspan targeting C++23
Other
415 stars 69 forks source link

submdspan unable to yield static extents #166

Closed johan-overbye closed 2 years ago

johan-overbye commented 2 years ago

First of all, thank you so much for this amazing proposal. Really hope it makes it into the standard!

One thing that's putting a slight damper on its usefulness, is that submdspan yields a dynamic extent even if a static extent is possible. In fact it looks like it can't be implemented to yield static extents, as the extents are determined by function arguments. Also, mdarray support is sorely missed.

Here's a suggestion for an alternate subset function:

mdarray<float, extents<size_t, 4, 4>> matrix;
int i;
auto row = submdspan<collapse_dimension, 4>(matrix, i, 0);
// decltype(row) is mdspan<float, extents<size_t, 4>>

The idea is that the desired extents of the subset are passed as template parameters, while the offsets are passed as function arguments. collapse_dimension can be of a dummy type, and indicate that that extent should be "collapsed" or removed from the resulting subset, similarly to passing a number instead of a pair to the current submdspan.

mhoemmen commented 2 years ago

@johan-overbye wrote:

First of all, thank you so much for this amazing proposal. Really hope it makes it into the standard!

Thanks for the kind words! WG21 voted it into C++ on Monday, so it is officially in the Standard (at least the current Standard draft).

... is that submdspan yields a dynamic extent even if a static extent is possible.

PR https://github.com/kokkos/mdspan/pull/159 , which merged yesterday, fixes this. You can see examples in the tests here. To ensure a compile-time extent corresponding to a given slice specifier:

Note that submdspan was removed from the proposal at LWG review time, before the plenary vote. Thus, we still have the opportunity to improve submdspan. This new feature will be added to the separate submdspan proposal in progress.

Thank you for your feedback and for trying out mdspan! I'll close the issue as a duplicate of the issue resolved by PR #159, but please feel free to reopen if this new feature does not meet your needs.

johan-overbye commented 2 years ago

@mhoemmen Wow, congrats on the vote! That's great news. 🥳

Thanks for the heads-up on this new feature, however it doesn't quite seem to solve the use case I described unless I'm missing something.

In the use case above, the idea was that one or more extents of the subset are compile-time determined, but the corresponding offsets are run-time determined. This doesn't seem to be covered by the addition you describe, as it seems like the second element of the tuple/pair represents the past-the-end index rather than the extent of the subset, based on testing an earlier version of submdspan.

I've implemented and used two different libraries that resemble limited versions of mdspan/mdarray, and I've found that this use case crops up a lot, e.g. in linear algebra.

johan-overbye commented 2 years ago

Wait a second, I see now that this new submdspan does support the particular example I highlighted above (grabbing a matrix row at a run-time-determined row index), because the row dimension is "collapsed". Very cool!

However I'd still argue the that enabling a run-time offset and a compile-time extent would be very useful. @mhoemmen

mhoemmen commented 2 years ago

Just to clarify for other readers: your example using current submdspan would look like this.

mdspan<float, extents<size_t, 4, 4>> matrix{ /* ... */ };

// row is layout_right and has extents<size_t, 4>
auto row = submdspan(matrix, 1, full_extent);

// row2 is layout_right and has extents<size_t, 4>
auto row2 = submdspan(matrix, integral_constant<size_t, 1>{}, full_extent);

// rows is layout_right and has extents<size_t, dynamic_extent, 4>
auto rows = submdspan(matrix, tuple{1, 3}, full_extent);

// rows2 is layout_right and has extents<size_t, 2, 4>
auto rows2 = submdspan(matrix, tuple{integral_constant<size_t, 1>{}, integral_constant<size_t, 3>{}}, full_extent);

// col is layout_stride and has extents<size_t, 4>.
auto col = submdspan(matrix, full_extent, 1);

// col2 is layout_stride and has extents<size_t, 4>.
// The strides can all be stored as compile-time constants.
auto col2 = submdspan(matrix, full_extent, integral_constant<size_t, 1>{});

// cols is layout_stride and has extents<size_t, 4, dynamic_extent>.
auto cols = submdspan(matrix, full_extent, tuple{1, 3});

// cols2 is layout_stride and has extents<size_t, 4, 2>.
// The strides can all be stored as compile-time constants.
auto cols2 = submdspan(matrix, full_extent, tuple{integral_constant<size_t, 1>{}, integral_constant<size_t, 3>{}});

@johan-overbye wrote:

However I'd still argue the that enabling a run-time offset and a compile-time extent would be very useful.

I'm not quite sure I understand what you mean. Could you give an example please?

If you pass in a single integer as a slice specifier, the corresponding extent of the input goes away; it does not appear in the extents of the return type of submdspan. It doesn't matter whether the integer is a run-time value or integral_constant.

The only way to change the extents of the return type of submdspan is to pass in a pair or tuple of two integers as a slice specifier. In that case, if you pass in a pair or tuple of integral_constant, the corresponding extent of the returned mdspan will be a compile-time extent (not dynamic_extent). If you pass in a pair or tuple of integers (like int or size_t, you'll get dynamic_extent. C++ doesn't have a way for a function to change its return type based on the value of a function parameter. This is because you can't declare a function parameter constexpr. (That's why we added this integral_constant feature.)

If you have a pair of constexpr integer values, you can put them into the extents type by hand, via assignment. Here is an example.

mdspan<float, extents<size_t, 4, 4>> matrix{ /* ... */ };

constexpr int b = 1;
constexpr int e = 3;

// rows is layout_right and has extents<size_t, dynamic_extent, 4>
auto rows = submdspan(matrix, tuple{b, e}, full_extent);

// Assigning run-time to compile-time extents is legal
// as long as the extents are equal.  Doing this generically
// requires knowing the correct layout and accessor. 
mdspan<float, extents<size_t, e - b>> rows3 = rows;
johan-overbye commented 2 years ago

@mhoemmen, thanks again for your time. These are super helpful examples.

Let's say I want a constexpr-sized 2x2 submatrix of mdspan<float, extents<size_t, 4, 4>> matrix{ /* ... */ };, starting at an arbitrary non-constexpr offset int i, j;. Meaning that matrix[i, j] corresponds to submatrix[0, 0].

Ideally submatrix's extents should be extents<size_t, 2, 2>, but I can't achieve that (directly) with the current submdspan as far as I can tell.

If you have a pair of constexpr integer values, you can put them into the extents type by hand, via assignment. Here is an example.

Great tip! This workaround definitely does the trick, but now that I've had my dream array/span system fall into my lap as if by magic, I've gotten greedy. 😄

Maybe it'd be easier to cater for what I'm suggesting if a slice were specified as start and extent rather than start and past-end.

mhoemmen commented 2 years ago

@johan-overbye wrote:

Let's say I want a constexpr-sized 2x2 submatrix of mdspan<float, extents<size_t, 4, 4>> matrix{ /* ... */ };, starting at an arbitrary non-constexpr offset int i, j;. Meaning that matrix[i, j] corresponds to submatrix[0, 0].

Thank you for explaining! You're right that this is not a feature of the current proposal. If you had constexpr i and j, you could do the following,

template<size_t Value>
using IC = std::integral_constant<size_T, Value>; // abbreviation

mdspan<float, extents<size_t, 4, 4>> matrix{ /* ... */ };
constexpr auto rows = std::tuple{IC<i>{}, IC<i+2>{}};
constexpr auto cols = std::tuple{IC<j>{}, IC<j+2>{}}; 
submatrix = submdspan(matrix, rows, cols);
static_assert(std::is_same_v<decltype(submatrix)::extents_type, extents<size_t, 2, 2>>);

but you don't have constexpr i and j, so this wouldn't work.

Maybe it'd be easier to cater for what I'm suggesting if a slice were specified as start and extent rather than start and past-end.

This is good thinking, but the current design is consistent with Fortran and Python, and is unlikely to change at this point. We could, however, extend submdspan in the future to accept different slice specifier types.

For very small matrices, if you want best performance, you might not want to create subviews of the matrix at all (which is what submdspan does). This is because mdspan always stores a pointer, and creating an mdspan with non-constexpr offsets requires pointer arithmetic. That's 64 bits of storage, and 64-bit arithmetic, compared with whatever tiny value type (e.g., int8_t) you were storing in the mdarray's std::array container. I wrote a bit about this in P1674.

Another option would be to do batched computations, so that you can amortize the pointer storage and arithmetic overhead over all the matrices in the batch.

Thanks again for your suggestions and explanations!

johan-overbye commented 2 years ago

This is good thinking, but the current design is consistent with Fortran and Python, and is unlikely to change at this point. We could, however, extend submdspan in the future to accept different slice specifier types.

Gotcha. I'd love such a feature personally, but the current submdspan does already cover about 90% of my use cases I think.

For very small matrices, if you want best performance, you might not want to create subviews of the matrix at all

Thanks for the heads-up! But it's not a big concern in my case as most of my uses for mdspan and mdarray are about optimising for conciseness and malleability rather than performance. If performance is crucial, SIMD or GPGPU will often be the way to go anyway.

Thanks a lot Mark, already super excited about this proposal. Swapped out my homebrew multi-dimensional array class for mdspan and mdarray in my maths library for a hobby project, and I'm loving it.

Just a couple of observations:

mhoemmen commented 2 years ago

@johan-overbye wrote:

Thanks a lot Mark, already super excited about this proposal. Swapped out my homebrew multi-dimensional array class for mdspan and mdarray in my maths library for a hobby project, and I'm loving it.

Thanks so much! : - ) We really appreciate your interest and early adoption!

Please note that the reference implementations of both mdspan and mdarray live in the https://github.com/kokkos/mdspan repository now. The old kokkos/mdarray repository is no longer supported. Also, both implementations have been changed recently to match the latest revisions of the proposals. I would recommend updating your versions if you haven't already. Thanks!

I can't think of a situation where I'd want std::vector over std::array as mdarray's container class when its extents are all static.

Thank you for your feedback! We've had a few LEWG discussions about the default container type. It's possible that this will change. There are a couple reasons why users might want to use vector even with all compile-time extents.

  1. Users might want the mdarray to use a custom allocator.
  2. The value type might have expensive move operations. (vector move is O(1); array move is O(size()).)

mdarrays with all static extents seem to require an extra 4 bytes on top of the container. (Latest MSVC for x86-64 with /std:c++latest.)

Implementations might be able to fix that with [[no_unique_address]]. The reference implementation of mdarray doesn't currently use that optimization:

https://github.com/kokkos/mdspan/blob/9ccf361e13dcc3bff2791fda327c2b10e081cb12/include/experimental/__p1684_bits/mdarray.hpp#L454

but the reference implementation of mdspan does:

I'll file an implementation issue; thank you for reporting this!

mdarray lacks a copy assignment operator. Intentional?

The latest revision (R3) of the proposal has one. You can find the latest released revision using the "wg21.link" link service: https://wg21.link/p1684.

johan-overbye commented 2 years ago

I would recommend updating your versions if you haven't already. Thanks!

Yep, all updated, thanks!

There are a couple reasons why users might want to use vector even with all compile-time extents.

Those make sense, thanks.

The reference implementation of mdarray doesn't currently use that optimization

OK, no problem! Just thought I'd flag it.

The latest revision (R3) of the proposal has one.

Brilliant, thank you.

Can't wait to see what's next. Thanks again for your time, and good luck!