kokkos / mdspan

Reference implementation of mdspan targeting C++23
Other
417 stars 69 forks source link

Add layout_padded example #180

Closed mhoemmen closed 1 year ago

mhoemmen commented 2 years ago

I added a layout_padded example. This is a strided layout that ensures a given overalignment of the stride-1 extent (either the leftmost, for column-major mode, or the rightmost, for row-major mode).

I've tested this on a few different compilers, as one can see here: https://godbolt.org/z/3Kn833dj8

mhoemmen commented 2 years ago

@crtrott wrote:

You are storing two extents objects here, even though only a single dimension is padded. I think we can simplify this.

An excellent point -- I had originally thought that we wanted to pad all but a single dimension; that would have justified the current design more in terms of code reuse. Padding just a single dimension suggests a different way to compute strides.

Also I don't particular like the storage order part. For example: why not simply take another layout policy as an argument, if we anyway store an inner layout? ... That is assuming that simply introducing layout_left_padded and layout_right_padded is not the right thing to do in the first place.

We had a nice chat about this offline -- thanks!

mhoemmen commented 2 years ago

Here is a summary of suggested revisions from this morning's offline discussion with Christian.

layout_padded and the "BLAS general layouts" in P1673 have the same mathematical properties.

  1. They all add padding -- extra, unviewed, but valid elements -- to one extent. This effectively increases one stride -- like layout_stride, but with one of the extents fixed to unit stride at compile time.
  2. The difference between these layouts and layout_stride is that the padding elements are "valid," that is, accessible. This means we can use them in optimizations -- e.g., to make copies contiguous. (This implies, in turn, that the only valid "nested layouts" for layout_padded are layout_left or layout_right. It doesn't make sense to template on a general "inner layout.")
  3. For the rank-2 case, all these layouts are invariant under any subview of contiguous rows and or columns. For the rank > 2 case, this still holds if we store (rank - 1) strides.

Here are the only differences between the current layout_padded design and P1673's BLAS general layouts.

  1. P1673 currently limits its BLAS general layouts to have rank 2. However, generalization is possible (see (3) above). 2 layout_padded expresses padding via "overalignment factor." This is really just an interface convenience for use with aligned_accessor to make an aligned mdspan. Once layout_padded computes the padded extents, it no longer needs the overalignment factor. (We could create an alias and/or function to express a mapping from overalignment factor to padded layout.)

This suggests that we only need two new layouts to express both layout_padded and P1673's BLAS general layouts: layout_left_padded, and layout_right_padded. We don't use an enum to distinguish the two cases, for the same reason that layout_left and layout_right are separate types and not one type with an enum non-type template parameter.

As a size optimization, we only need to store the padded extents and the one input extent that was padded (and thus differs from the corresponding padded extent). We could use extents<index_type, InputExtent> (or its underlying "partially static array" representation) to represent the one input extent as either a compile-time or a run-time value.

It would make sense for submdspan of a layout_(left,right) mdspan, with the appropriate slices, to produce a layout_(left,right)_padded mdspan.

mhoemmen commented 2 years ago

I've converted this to a draft, so I can finish implementing the new layout_{left,right}_padded design that Christian and I discussed a few days ago. This will make it possible to unify the P1673 "BLAS general layouts" with padded layouts.

mhoemmen commented 2 years ago

@crtrott @dalg24 @nliber This PR is ready for re-review; thanks!

mhoemmen commented 1 year ago

I've handed off this work to a collaborator for now.

mhoemmen commented 1 year ago

This PR is superseded by PR #237. Thanks all!