kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

Implement batched serial laswp #2395

Closed yasahi-hpc closed 1 month ago

yasahi-hpc commented 1 month ago

This PR implements the laswp function, which is needed for the getrf PR.

The following files are added:

  1. KokkosBatched_Laswp_Serial_Impl.hpp: Internal interfaces with implementation details
  2. KokkosBatched_Laswp.hpp: APIs
  3. Test_Batched_SerialLaswp.hpp: Unit tests for the above

Detailed description

laswp performs a series of row interchanges on a general rectangular matrix A, as specified by the pivot indices in ipiv.
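To make the row-interchange semantics concrete, here is a minimal plain C++ sketch that does not use Kokkos. It follows the LAPACK-style convention of swapping row i with row ipiv[i] in sequence. The names `Matrix`, `laswp_forward`, and `laswp_backward` are hypothetical and are not part of this PR, and the 0-based pivot indices are an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Row-major dense matrix, used only for this illustration.
using Matrix = std::vector<std::vector<double>>;

// Forward pass: for i = 0, 1, ..., k-1 swap row i with row ipiv[i].
// Pivot indices are assumed to be 0-based here.
inline void laswp_forward(const std::vector<int>& ipiv, Matrix& a) {
  for (std::size_t i = 0; i < ipiv.size(); ++i) {
    const std::size_t p = static_cast<std::size_t>(ipiv[i]);
    assert(i < a.size() && p < a.size());
    if (p != i) std::swap(a[i], a[p]);
  }
}

// Backward pass: the same interchanges applied in reverse order,
// which undoes the forward pass.
inline void laswp_backward(const std::vector<int>& ipiv, Matrix& a) {
  for (std::size_t i = ipiv.size(); i-- > 0;) {
    const std::size_t p = static_cast<std::size_t>(ipiv[i]);
    assert(i < a.size() && p < a.size());
    if (p != i) std::swap(a[i], a[p]);
  }
}
```

Because the swaps are applied one after another, ipiv is not a permutation vector: the same index may appear several times, and the final row order depends on the order in which the swaps are applied.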

Parallelization is done over the batch dimension, as shown below. This is efficient only when A is given in LayoutLeft on GPUs and in LayoutRight on CPUs.

Kokkos::parallel_for("laswp",
    Kokkos::RangePolicy<execution_space>(0, n),
    KOKKOS_LAMBDA(const int k) {
        // Each iteration handles one batch entry: its pivot vector and its matrix.
        auto sub_ipiv = Kokkos::subview(m_ipiv, k, Kokkos::ALL());
        auto sub_a = Kokkos::subview(m_a, k, Kokkos::ALL(), Kokkos::ALL());
        KokkosBatched::SerialLaswp<typename ParamTagType::direct>::invoke(sub_ipiv, sub_a);
    });

Tests

  1. Fill A and ipiv with random values. Apply Laswp to A, and separately build a reference Ref by permuting the rows of A according to ipiv. Then confirm A == Ref.
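The idea behind this test can be sketched in plain C++ without Kokkos. This is an illustration under stated assumptions, not the code in Test_Batched_SerialLaswp.hpp: the helper names `apply_swaps`, `permuted_reference`, and `random_laswp_check` are hypothetical, pivots are assumed to be 0-based, and only the forward direction is covered. The reference is built by composing the swaps on an index vector and then gathering rows from the original matrix, so it does not share the in-place swap logic it is checking.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Stand-in for the routine under test: in-place forward row interchanges.
inline void apply_swaps(const std::vector<int>& ipiv, Matrix& a) {
  for (std::size_t i = 0; i < ipiv.size(); ++i) {
    const std::size_t p = static_cast<std::size_t>(ipiv[i]);
    assert(i < a.size() && p < a.size());
    std::swap(a[i], a[p]);
  }
}

// Reference: track which original row ends up at each position,
// then gather the rows from the untouched input.
inline Matrix permuted_reference(const std::vector<int>& ipiv, const Matrix& a) {
  std::vector<std::size_t> perm(a.size());
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  for (std::size_t i = 0; i < ipiv.size(); ++i) {
    std::swap(perm[i], perm[static_cast<std::size_t>(ipiv[i])]);
  }
  Matrix ref(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) ref[i] = a[perm[i]];
  return ref;
}

// Random A and ipiv, then compare the in-place result with the reference.
// Requires rows > 0.
inline bool random_laswp_check(unsigned seed, std::size_t rows, std::size_t cols) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> val(0.0, 1.0);
  std::uniform_int_distribution<int> piv(0, static_cast<int>(rows) - 1);

  Matrix a(rows, std::vector<double>(cols));
  for (auto& row : a)
    for (auto& x : row) x = val(gen);

  std::vector<int> ipiv(rows);
  for (auto& p : ipiv) p = piv(gen);

  const Matrix ref = permuted_reference(ipiv, a);
  apply_swaps(ipiv, a);
  // Rows are only moved, never recomputed, so exact equality is appropriate.
  return a == ref;
}
```

Since laswp only moves entries, the comparison can be exact rather than tolerance-based.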