This PR implements the laswp function, which is needed for the getrf PR.
The following files are added:
KokkosBatched_Laswp_Serial_Impl.hpp: Internal interfaces with implementation details
KokkosBatched_Laswp.hpp: APIs
Test_Batched_SerialLaswp.hpp: Unit tests for laswp
Detailed description
This performs a series of row interchanges on a general rectangular matrix.
A: (batch_count, m, n) or (batch_count, m)
On entry, the M-by-N matrix or the length M vector. The row interchanges will be applied to the matrix of column dimension N. On exit, the permuted matrix or vector.
IPIV: (batch_count, m)
The pivot indices; for 0 <= i < m, row i of the matrix was interchanged with row IPIV(i).
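The row-interchange semantics above can be sketched in plain C++ for a single batch entry. This is a minimal illustration, not the PR's implementation: `reference_laswp_forward` is a hypothetical helper, and it assumes zero-based pivot indices, a row-major flat array, and that the interchanges are applied in forward order (i = 0, 1, ..., m-1).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the PR: applies forward row interchanges
// to a row-major m-by-n matrix stored in a flat vector.
// For i = 0, ..., m-1 (in that order), row i is swapped with row ipiv[i].
inline void reference_laswp_forward(const std::vector<int>& ipiv,
                                    std::vector<double>& a,
                                    std::size_t m, std::size_t n) {
  assert(ipiv.size() == m && a.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    const std::size_t p = static_cast<std::size_t>(ipiv[i]);
    assert(p < m);          // pivot must name a valid row
    if (p == i) continue;   // nothing to swap
    for (std::size_t j = 0; j < n; ++j) {
      std::swap(a[i * n + j], a[p * n + j]);
    }
  }
}
```

Because the swaps are applied one after another, the result is in general not a simple gather of rows by `ipiv`: a row moved by an early swap can be moved again by a later one.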
The batch can be parallelized as shown below. Because the parallelization is over the batch direction, this is efficient only when
A is given in LayoutLeft on GPUs and in LayoutRight on CPUs.
// One work item per batch entry; batch_count is the extent of the leading dimension.
Kokkos::parallel_for("laswp",
    Kokkos::RangePolicy<execution_space>(0, batch_count),
    KOKKOS_LAMBDA(const int k) {
      // Pivot indices and matrix belonging to batch entry k
      auto sub_ipiv = Kokkos::subview(m_ipiv, k, Kokkos::ALL());
      auto sub_a = Kokkos::subview(m_a, k, Kokkos::ALL(), Kokkos::ALL());
      KokkosBatched::SerialLaswp<typename ParamTagType::direct>::invoke(sub_ipiv, sub_a);
    });
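The layout remark can be made concrete by looking at where element (k, i, j) of a (batch_count, m, n) array lands in memory. The sketch below assumes unpadded, contiguous storage; the two function names are hypothetical and only illustrate the index arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// LayoutRight: the rightmost index j has stride 1, so each batch entry's
// m-by-n matrix occupies one contiguous block.
inline std::size_t offset_layout_right(std::size_t k, std::size_t i, std::size_t j,
                                       std::size_t batch_count, std::size_t m,
                                       std::size_t n) {
  assert(k < batch_count && i < m && j < n);
  return (k * m + i) * n + j;
}

// LayoutLeft: the leftmost index k has stride 1, so the same (i, j) element
// of consecutive batch entries is adjacent in memory.
inline std::size_t offset_layout_left(std::size_t k, std::size_t i, std::size_t j,
                                      std::size_t batch_count, std::size_t m,
                                      std::size_t n) {
  assert(k < batch_count && i < m && j < n);
  return k + batch_count * (i + m * j);
}
```

With one GPU thread per batch index k, LayoutLeft lets neighbouring threads touch neighbouring addresses, whereas on CPUs LayoutRight keeps each thread's matrix contiguous in memory.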
Tests
Generate a random matrix A and random pivot indices ipiv. Apply Laswp to A, and separately build a reference Ref by permuting a copy of A according to ipiv. Then confirm A == Ref.
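The idea of this test can be sketched without Kokkos for a single matrix. This is not the PR's unit test: `laswp_matches_reference` is a hypothetical function, the swap loop stands in for the routine under test, and forward order with zero-based pivots is assumed. The reference is built differently from the swaps, by tracking which original row ends up at each position and then gathering.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-alone check: returns true if in-place row swaps agree
// with a reference obtained by gathering rows of the original matrix.
inline bool laswp_matches_reference(unsigned seed, std::size_t m, std::size_t n) {
  assert(m > 0 && n > 0);
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> value(0.0, 1.0);
  std::uniform_int_distribution<int> pivot(0, static_cast<int>(m) - 1);

  // Random row-major matrix and random pivots in [0, m)
  std::vector<double> a(m * n);
  std::vector<int> ipiv(m);
  for (auto& x : a) x = value(gen);
  for (auto& p : ipiv) p = pivot(gen);
  const std::vector<double> a0 = a;  // untouched copy for the reference

  // Stand-in for the routine under test: swap row i with row ipiv[i]
  for (std::size_t i = 0; i < m; ++i) {
    const std::size_t p = static_cast<std::size_t>(ipiv[i]);
    for (std::size_t j = 0; j < n; ++j) std::swap(a[i * n + j], a[p * n + j]);
  }

  // Reference: perm[r] is the original row that ends up at position r
  std::vector<std::size_t> perm(m);
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  for (std::size_t i = 0; i < m; ++i) {
    std::swap(perm[i], perm[static_cast<std::size_t>(ipiv[i])]);
  }
  std::vector<double> ref(m * n);
  for (std::size_t r = 0; r < m; ++r) {
    for (std::size_t j = 0; j < n; ++j) ref[r * n + j] = a0[perm[r] * n + j];
  }

  // Only copies are involved, so exact equality is expected
  return a == ref;
}
```

Exact comparison is appropriate here because laswp only moves values and performs no floating-point arithmetic.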