Open SimeonEhrig opened 1 year ago
Hi @SimeonEhrig ! Bryce also has presentations that talk about the two C++23 ranges features cartesian_product
and iota
. You can use those to construct a range with the desired iteration order, and then iterate over the range with an algorithm like ranges::for_each
.
You could also imagine specializing for the case of iterating over an mdspan's domain (which is a Cartesian product of the extents). It's an exercise (not an easy one!) in parameter pack algebra.
I developed a solution:
#include <experimental/mdspan>
#include <iostream>
namespace stdex = std::experimental;
/**
* @brief Construct submdspan of mdspan. The submdspan has one rank less than the mdspan. The left dimension is fixed
* to a specific index. The rest of the dimension contains the full range.
*
* @tparam TRank Dimension of the new submdspan (needs to be mdspan::rank()-1).
*/
template<int TRank>
struct Construct_Submdspan;
template<>
struct Construct_Submdspan<1>
{
template<typename TSpan, typename... Types>
constexpr auto construct(TSpan span, std::size_t const fixed_index_pos, Types... args)
{
return stdex::submdspan(span, fixed_index_pos, args...);
}
};
template<int TRank>
struct Construct_Submdspan
{
/**
* @brief Returns the submdspan of a mdspan, with one dimension less.
*
* @tparam TSpan Type of the span
* @tparam Types needs to std::experimental::full_extent_t
* @param span mdspan from which the submdspan is created
* @param fixed_index_pos Index postion of the fixed dimension
* @param args needs to std::experimental::full_extent
* @return constexpr auto returns a stdex::submdspan
*/
template<typename TSpan, typename... Types>
constexpr auto construct(TSpan span, std::size_t const fixed_index_pos, Types... args)
{
return Construct_Submdspan<TRank - 1>{}.construct(span, fixed_index_pos, stdex::full_extent, args...);
}
};
/**
* @brief Returns a submdspan of mdspan. The submdspan has one rank less than the mdspan. The left dimension is fixed
* to a specific index. The rest of the dimension contains the full range.
*
* @tparam TMDSpan
* @param span mdspan from which the submdspan is created
* @param fixed_index_pos Index postion of the fixed dimension
* @return constexpr auto returns a stdex::submdspan
*/
template<typename TMDSpan>
constexpr auto submdspan_remove_dim(TMDSpan span, std::size_t const fixed_index_pos)
{
constexpr auto rank = TMDSpan::rank();
return Construct_Submdspan<rank - 1>{}.construct(span, fixed_index_pos, stdex::full_extent);
}
/**
* @brief Iterates over all elements of an n dimension mdspan. The iteration order is from the right to the left
* dimension.
*
* @tparam TDim Rank of the mdspan
*/
template<int TDim>
struct Iterate_mdspan;
template<>
struct Iterate_mdspan<1>
{
template<typename TSpan, typename TFunc>
void operator()(TSpan span, TFunc& functor)
{
for(auto i = 0; i < span.extent(0); ++i)
{
span(i) = functor(span(i));
}
}
};
template<int TDim>
struct Iterate_mdspan
{
/**
* @brief Iterate over all elements of an mdspan and apply the functor on it.
*
* @tparam TSpan type of the mdspan
* @tparam TFunc type of the functor
* @param span The mdspan
* @param functor The functor
*/
template<typename TSpan, typename TFunc>
void operator()(TSpan span, TFunc& functor)
{
for(auto i = 0; i < span.extent(0); ++i)
{
auto submdspan = submdspan_remove_dim(span, i);
Iterate_mdspan<TSpan::rank() - 1>{}(submdspan, functor);
}
}
};
/**
* @brief Do the same like std::iota with a n-dimensional mdspan. The iteration order is from the right to the left
* dimension.
*
* @tparam TSpan type of the mdspan
* @tparam TData type of the functor
* @param span The mdspan
* @param index value of the first element
*/
template<typename TSpan, typename TData>
void iota_span(TSpan span, TData index)
{
static_assert(TSpan::rank() > 0);
auto functor = [&index](TData input) { return index++; };
Iterate_mdspan<TSpan::rank()>{}(span, functor);
}
int main()
{
std::array<int, 12> d;
// stdex::mdspan m{d.data(), stdex::extents{12}};
// stdex::mdspan m{d.data(), stdex::extents{2, 6}};
// stdex::mdspan m{d.data(), stdex::extents{2, 4, 2}};
stdex::mdspan m{d.data(), stdex::extents{2, 2, 1, 4}};
iota_span(m, 42);
for(auto const& v : d)
{
std::cout << v << " ";
}
std::cout << std::endl;
return 0;
}
I take some times, until I understand all required rules of variadic templates and variadic parameter packs and it takes also some time to combine everything but now I have a working solution. cppinsights helped a lot to developed it. Now, I need to check, if the compiler can optimize it.
Excellent work! C++23 (cartesian_product
and iota
range views) or the ranges-v3 library can make this a lot easier to write. Here is an example of C++20 code for iterating over the domain of an mdspan: https://godbolt.org/z/fh41E1q66
#include <array>
#include <iostream>
#include <ranges>
#include <range/v3/view/cartesian_product.hpp>
#include <range/v3/view/indices.hpp>
#include <tuple>
#include <https://raw.githubusercontent.com/kokkos/mdspan/single-header/mdspan.hpp>
namespace stdex = std::experimental;
// You can't use std::apply on templated nonmember functions,
// but you can use it on generic lambdas. See the example here:
//
// https://en.cppreference.com/w/cpp/utility/apply
auto output_one_tuple_2 = []<class ... InputTypes>(InputTypes&& ... input) {
// lambdas with explicit template parameters require C++20.
auto print_all = [&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
auto print_one = [&] (std::size_t index, auto&& in) {
std::cout << in;
if(index + 1 < sizeof...(Indices)) {
std::cout << ", ";
}
};
(print_one(Indices, input), ...);
};
std::cout << '(';
print_all(std::make_index_sequence<sizeof...(InputTypes)>());
std::cout << ")\n";
};
template<class Callable, class IndexType, std::size_t ... Extents>
void for_each_in_extents(Callable&& f, stdex::extents<IndexType, Extents...> e) {
[&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
auto v = ranges::views::cartesian_product(ranges::views::indices(0, e.extent(Indices))...);
for(const auto& k : v) {
std::apply(std::forward<Callable>(f), k);
}
}(std::make_index_sequence<sizeof...(Extents)>());
}
int main() {
stdex::extents<int, 3, 4, 5> e;
for_each_in_extents(output_one_tuple_2, e);
return 0;
}
Here's an improvement of that example, that shows how to iterate over the domain of an mdspan without needing ranges: https://godbolt.org/z/nabce8WW6
#include <https://raw.githubusercontent.com/kokkos/mdspan/single-header/mdspan.hpp>
#include <array>
#include <iostream>
#include <tuple>
namespace stdex = std::experimental;
// You can't use std::apply on templated nonmember functions,
// but you can use it on generic lambdas. See the example here:
//
// https://en.cppreference.com/w/cpp/utility/apply
auto print_pack = []<class ... InputTypes>(InputTypes&& ... input) {
auto print_all = [&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
auto print_one = [&] (std::size_t index, auto&& in) {
std::cout << in;
if(index + 1 < sizeof...(Indices)) {
std::cout << ", ";
}
};
(print_one(Indices, input), ...);
};
std::cout << '(';
print_all(std::make_index_sequence<sizeof...(InputTypes)>());
std::cout << ")\n";
};
// C++20 lets you write lambdas templated on <std::size_t ... Indices>.
// Calling those lambdas with the result of std::make_index_sequence
// is a good way to "iterate" over a parameter pack.
// If you don't have C++20, you can replace the lambda
// with a separate, named helper function.
// Returns a new extents object representing all but the leftmost extent of e.
template<class IndexType, std::size_t ... Extents>
auto right_extents( stdex::extents<IndexType, Extents...> e )
{
static_assert(sizeof...(Extents) != 0);
return [&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
return stdex::extents<IndexType, e.static_extent(Indices + 1)...>{
e.extent(Indices + 1)...
};
}( std::make_index_sequence<sizeof...(Extents) - 1>() );
}
// right_extents can be implemented by overloading for
// extents<IndexType, LeftExtent, RightExtents...>.
// That approach doesn't work for left_extents.
// Returns a new extents object representing all but the rightmost extent of e.
template<class IndexType, std::size_t ... Extents>
auto left_extents( stdex::extents<IndexType, Extents...> e )
{
static_assert(sizeof...(Extents) != 0);
return [&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
return stdex::extents<IndexType, e.static_extent(Indices)...>{
e.extent(Indices)...
};
}( std::make_index_sequence<sizeof...(Extents) - 1>() );
}
template<class Callable, class IndexType, std::size_t ... Extents>
void for_each_in_extents_row_major(Callable&& callable, stdex::extents<IndexType, Extents...> ext)
{
if constexpr(ext.rank() == 0) {
return;
} else if constexpr(ext.rank() == 1) {
if constexpr(ext.static_extent(0) == stdex::dynamic_extent) {
for(IndexType index = 0; index < ext.extent(0); ++index) {
std::forward<Callable>(callable)(index);
}
} else {
// TODO unroll this loop at compile time
// (the compiler might do that for us anyway).
for(IndexType index = 0; index < ext.static_extent(0); ++index) {
std::forward<Callable>(callable)(index);
}
}
} else {
auto right_ext = right_extents(ext);
if constexpr(ext.static_extent(0) == stdex::dynamic_extent) {
for(IndexType left_index = 0; left_index < ext.extent(0); ++left_index) {
auto next = [&] (auto... right_indices) {
std::forward<Callable>(callable)(left_index, right_indices...);
};
for_each_in_extents_row_major( next, right_ext );
}
} else {
// TODO unroll this loop at compile time
// (the compiler might do that for us anyway).
for(IndexType left_index = 0; left_index < ext.static_extent(0); ++left_index) {
auto next = [&] (auto... right_indices) {
std::forward<Callable>(callable)(left_index, right_indices...);
};
for_each_in_extents_row_major( next, right_ext );
}
}
}
}
// Overloading on stdex::extents<IndexType, LeftExtents..., RightExtent>
// works fine for the row major case, but not for the column major case.
template<class Callable, class IndexType, std::size_t ... Extents>
void for_each_in_extents_col_major(Callable&& callable, stdex::extents<IndexType, Extents...> ext)
{
if constexpr(ext.rank() == 0) {
return;
} else if constexpr (ext.rank() == 1) {
if constexpr(ext.static_extent(ext.rank() - 1) == stdex::dynamic_extent) {
for(IndexType index = 0; index < ext.extent(ext.rank() - 1); ++index) {
std::forward<Callable>(callable)(index);
}
} else {
for(IndexType index = 0; index < ext.static_extent(ext.rank() - 1); ++index) {
std::forward<Callable>(callable)(index);
}
}
} else {
auto left_ext = left_extents(ext);
if constexpr(ext.static_extent(ext.rank() - 1) == stdex::dynamic_extent) {
for(IndexType right_index = 0; right_index < ext.extent(ext.rank() - 1); ++right_index) {
auto next = [&] (auto... left_indices) {
std::forward<Callable>(callable)(left_indices..., right_index);
};
for_each_in_extents_col_major( next, left_ext );
}
} else {
for(IndexType right_index = 0; right_index < ext.static_extent(ext.rank() - 1); ++right_index) {
auto next = [&] (auto... left_indices) {
std::forward<Callable>(callable)(left_indices..., right_index);
};
for_each_in_extents_col_major( next, left_ext );
}
}
}
}
int main() {
stdex::extents<int, 3, 4, 5> e;
std::cout << "Row major:\n";
for_each_in_extents_row_major(print_pack, e);
std::cout << "\nColumn major:\n";
for_each_in_extents_col_major(print_pack, e);
return 0;
}
Since I had to implement the same functionality in LLAMA, here is my version:
// ... includes etc.
template<typename IndexType, std::size_t ... Extents, typename Func,
std::size_t FirstExtInd, std::size_t... RestExtInds, typename... Is>
void forEachIndex(const stdex::extents<IndexType, Extents...>& ext, Func&& func,
std::index_sequence<FirstExtInd, RestExtInds...>, Is... is) {
for(IndexType i = 0; i < ext.extent(FirstExtInd); i++)
if constexpr(sizeof...(RestExtInds) > 0)
forEachIndex(ext, std::forward<Func>(func), std::index_sequence<RestExtInds...>{}, is..., i);
else
std::forward<Func>(func)(is..., i);
}
template<class IndexType, std::size_t ... Extents, typename Func>
void forEachIndex(stdex::extents<IndexType, Extents...> ext, Func&& func) {
// for column-major, you could reverse the index_sequence
forEachIndex(ext, std::forward<Func>(func), std::make_index_sequence<sizeof...(Extents)>{});
}
int main() {
stdex::extents<int, 3, stdex::dynamic_extent, 5> e{4};
forEachIndex(e, print_pack);
return 0;
}
Godbolt: https://godbolt.org/z/6ae585YM8
The trick here is to use an index_sequence
to determine in which order the dimensions of the extents
should be iterated. That idea btw. came from alpaka's NdLoop
. The extents
object is also never modified, so you can safe yourself the utility function to pop one extent left or right.
@SimeonEhrig here is your example of iota_span
using the above snippet:
template<typename TSpan, typename TData>
void iota_span(TSpan span, TData index)
{
forEachIndex(span.extents(), [&](auto... is){
span(is...) = index++;
});
}
Godbolt: https://godbolt.org/z/5hncdK1fj
@mhoemmen and @bernhardmgruber thanks for the solutions. Looks much better, than my solution. I think, I will take @bernhardmgruber solution and extend it with the ideas of @mhoemmen solution (I cannot directly overtake it, because we cannot use C++20 because of the nvcc compiler).
@SimeonEhrig I think I can do better to make my code more concise! I've been working on it when I need a break from other things. The only strictly C++20 things in my example are lambdas with named template parameters, but you've figured out how to work around.
You might also want to consider Kokkos' MDRange, an existing optimized parallel multidimensional index for-each that only requires C++14 and works well with nvcc and other compilers.
@bernhardmgruber @SimeonEhrig Here's a refinement of my above example: https://godbolt.org/z/o9ecK7coj -- please see in particular for_each_in_extents_4
.
Thanks for the solution. Looks like my idea in a heavy optimized version :sweat_smile: I'm also impressed, how you can use templated lambdas in C++20 for template meta programming. I read about templated lambdas but I never thought about to use it for meta programming.
I'm also impressed, how you can use templated lambdas in C++20 for template meta programming. I read about templated lambdas but I never thought about to use it for meta programming.
TMP is the use for lambdas with explicit template heads until we get P1061, possibly in C++26.
Edit: given that std::extents
can be destructured, which it can be, but not in the way you would expect:
stdex::extents<int, 3, stdex::dynamic_extent, 5> e{4};
const auto [a, b, c] = e; // compile error
const auto [b] = e; // ok, but b is now some kind of internal `storage_t`.
@mhoemmen should this be fixed? That is, should std::extents
destructure to all extents (with static extends being demoted to runtime values)? I would find this useful. Definitely more useful than what currently happens :)
This structured binding thing will not work no matter what, certainly wouldn't produce runtime values. This just makes internal members visible.
Oh I see you could do it for custom types. I guess someone could propose a paper to do this ...
@crtrott Right, one could overload the various tuple_*
customization points so that get
works as one might expect.
@bernhardmgruber It's a neat idea to make this work for extents
. Would you be interested in writing the paper? It would have to target C++26, as the C++23 acceptance date for new features is long past.
@crtrott and @mhoemmen Exactly, we would need specializations of std::tuple_size
and std::tuple_element
as well as either a member std::extents::get
or a free std::get(extents)
.
I am interested in writing a small paper. It's a little personal wish of mine for a long time anyway :) I will try to float a draft on the mailing list in a couple of weeks.
@bernhardmgruber That's excellent! We would be happy to review early drafts of the paper before you send it to the mailing list. Thanks!
Hello, thanks for the nice mdspan
!
How bad is the following code? I expect much less index computation but ...
template <class T, class IndexType, size_t E0, size_t... E, class L, class A, class Fun>
void forEach(std::mdspan<T, std::extents<IndexType, E0, E...>, L, A> x, Fun&& f)
{
auto full = [](size_t) { return std::full_extent; };
for (IndexType i = 0; i < x.extent(0); i++)
if constexpr (x.rank() > 1)
forEach(stdex::submdspan(x, i, full(E)...), f);
else
std::forward<Fun>(f)(x[i]);
}
Hi @yurielnf , my personal experience is, that constructing recursive std::submdspan
is very expensive. You should avoid it. Instead you should direct built the memory access recursive: forEach(mdspan[calculate_extend_index<I>(index)...], f);
Thanks @SimeonEhrig. This implementation works for strided mdspan
. Any idea how to generalize it?
template <size_t pos, class T, class E, class L, class A, class Fun>
void forEach_impl(std::mdspan<T, E, L, A> const& x, Fun&& f, size_t offset) {
if constexpr (pos == x.rank())
std::forward<Fun>(f)(x.accessor().access(x.data_handle(), offset));
else
for (size_t i = 0u; i < x.extent(pos); i++)
forEach_impl<pos + 1>(x, std::forward<Fun>(f), offset + i * x.stride(pos));
}
template <class T, class E, class L, class A, class Fun>
void forEach(std::mdspan<T, E, L, A> const& x, Fun&& f) {
static_assert(x.is_always_strided(), "valid only for strided mdspan");
forEach_impl<0>(x, std::forward<Fun>(f), 0);
}
Take this solution from this issue, pass a std::mdspan
instead the extend and modify the std::forward
.
https://github.com/kokkos/mdspan/issues/202#issuecomment-1294207397
@yurielnf Please feel free to use this PR as a guide: https://github.com/kokkos/mdspan/pull/247
Excellent! It is a beautiful code!
In the case of strided mdspan
, would the recursion on your lambda next
be as efficient as the recursion on a size_t offset
above (after unrolling the loop for compile-time extents)?
@yurielnf wrote:
Excellent! It is a beautiful code!
Thanks so much! : - D
would the recursion on your lambda next be as efficient as the recursion on a
size_t offset
above (after unrolling the loop for compile-time extents)?
My implementation is generic; it works for any layout (if reading, or for any unique layout if writing). If you want iteration to be efficient, the best way is to specialize for the layout.
For exhaustive layouts, just skip the layout mapping and iterate directly over [0, required_span_size()
).
For strided layouts where either the leftmost or rightmost extent has stride 1 (see P2642), you might get faster code by treating the stride-1 extent as a special case.
Another generic approach is to use the inverse of a canonical exhaustive layout. For example, take the layout_left::mapping
with the same extents and iterate over the valid offsets [0, required_span_size()
) of that mapping. In the iteration, invert the current offset to find the canonical rank()
-dimensional coordinate, then use that coordinate in the original mdspan's mapping. This approach requires divmod (divisions and modulus) operations, so it's probably not as fast for strided layouts as what you're doing.
Thanks for your interest in mdspan! : - )
Hi, I'm working on vikunja, a platform-independent primitives library (e.g.
vikunja::transform
,vikunja::reduce
) for different accelerators based on alpaka. I want to usemdspan
to support N dimension memory. In the presentation of Bryce Aldelstein Lelbach (youtube link) there is the idea to build a recursive function to create n nested loops, which the compiler can optimize.I implemented a prototype with
submdspan
for an iota function for amdspan
:My problem is, that I need to specialize each dimension by hand. At the moment I stopped at dim 3. Does anybody have an idea to write a generic function to iterate over all elements of a
mdspan
?Maybe it is possible to set the n
stdex::full_extent
arguments in the functionstdex::submdspan(span, i, stdex::full_extent, ...)
depending on the template parameterTDim
. I'm not sure, if this is possible with some variadic or type trait.