kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels
Other
311 stars 98 forks source link

implement batched serial iamax #2399

Closed yasahi-hpc closed 3 weeks ago

yasahi-hpc commented 3 weeks ago

This PR implements iamax function, which is needed for getrf PR.

Following files are added:

  1. KokkosBatched_Iamax_Serial_Impl.hpp: Internal interfaces with implementation details
  2. KokkosBatched_Iamax.hpp: APIs
  3. Test_Batched_SerialIamax.hpp: Unit tests for that

Detailed description

This returns the index of the first element having maximum absolute value.

Parallelization would be made in the following manner. This is efficient only when A is given in LayoutLeft for GPUs and LayoutRight for CPUs (parallelized over batch direction).

Kokkos::parallel_for('iamax', 
    Kokkos::RangePolicy<execution_space> policy(0, n),
    [=](const int k) {
        auto sub_x = Kokkos::subview(m_x, k, Kokkos::ALL());
        auto iamax = KokkosBatched::SerialIamax::invoke(sub_x);
        m_r(k)     = iamax;
    });

Tests

  1. Analytical tests for length 3 vectors
    A0: [1, 2, 0] -> 1
    A1: [-5, 4, 3] -> 0
    A2: [0, 0, 0] -> 0
    A3: [0, -1, -1] -> 1
  2. Make a random vector from random X. Compare the return value with the index of the first element having maximum absolute value.
yasahi-hpc commented 3 weeks ago

Looks fine, the only small question would be: "Are we worried about views containing more than 2B elements with an int return type?"

Assuming that this is a KokkosBatched functionality, return value should be smaller than 2B elements for a practical use case. However, if you are thinking of security issue like https://github.com/kokkos/kokkos-kernels/pull/2397, it makes sense to use int64_t. If you prefer, I can fix it accordingly.

lucbv commented 3 weeks ago

I am not really worried was just thinking about it. One thing we could do is check the type that the view is using to store indices and use that. Then the problem becomes a Kokkos Core problem :p

yasahi-hpc commented 3 weeks ago

Sure. I will rely on the index_type of a View.