Open cwsmith opened 2 months ago
Thanks for reporting, we will have a look at this
Hi @lucbv . Thanks for looking into this.
It seems like the problem is related to handling rectangular matrices. If the MatrixViewType is defined to be 10x10:
typedef Kokkos::View<double[10][10]> MatrixViewType;
there are no errors under valgrind.
Digging into the code a bit, but without a full understanding of it, I see a loop over matrix rows that successively removes one row and one column to form the 3x3 partitioned matrix A_part3x3 via the call A_part3x3.partWithABR(A_part2x2, 1, 1);
In the original case of the 16x10 matrix, running valgrind with the gdbserver I see that the first invalid read occurs in SerialLeftHouseholderInternal::invoke(...) when m_atl=10 in SerialQR_Internal::invoke(...). This seems to make sense as all 10 columns have been removed.
The test case in the repo for QR appears to only run with square matrices:
test without column pivoting: https://github.com/kokkos/kokkos-kernels/blob/2c4dd7e2bd3f19344ab6aa6e851d8adf1ffd3897/batched/dense/unit_test/Test_Batched_TeamVectorQR.hpp#L111
test 'WithColumnPivoting': https://github.com/kokkos/kokkos-kernels/blob/2c4dd7e2bd3f19344ab6aa6e851d8adf1ffd3897/batched/dense/unit_test/Test_Batched_TeamVectorQR_WithColumnPivoting.hpp#L121
Okay, thanks for digging into this. I will run the code under valgrind / gdb as well and hopefully can reproduce the problem and report my observations. The algorithm does use a partitioning of the matrix to perform some operations, but it should still work for rectangular matrices. Once I find something promising I will let you know about it : )
The PR above, #2342, has a fix for the rectangular matrices and introduces more tests for the Serial QR feature. The tests are not fully implemented yet but the fix seems to be okay if you want to give it a try.
This is great. Thank you @lucbv.
Using the PR branch (https://github.com/kokkos/kokkos-kernels/pull/2342/commits/9121f0a5bc697cfa55a68998e037a895a460677c) I ran the reproducer under valgrind again and the SerialQR errors are gone.
Running the expanded version of the reproducer:
https://github.com/SCOREC/meshFields/blob/20a68919b338003ff8792ce7d2cc6c5df3f13613/test/testQR.cpp
under valgrind reports an invalid read in the call to SerialTrsv. Some additional details on the first invalid read and the valgrind log are below.
Running under gdb reports the following values of variables at the point of the first reported invalid read.
Reading symbols from /opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-12.3.0/libiconv-1.17-oylnknwv5m57zkfjde7op6ne3pqdkhxg/lib/libiconv.so.2...
0x0000000000427b8f in KokkosBatched::SerialTrsvInternalUpper<KokkosBlas::Algo::Level2::Unblocked>::invoke<double, double> (use_unit_diag=false, m=16, alpha=1, A=0x8991f00, as0=1, as1=16, b=0x8993940, bs0=1)
at /space/cwsmith/meshFields/buildKokkosKernelsSerial/install/include/KokkosBatched_Trsv_Serial_Internal.hpp:161
161 if (!use_unit_diag) *beta1 = *beta1 / A[p * as0 + p * as1];
(ins)(gdb) where
#0 0x0000000000427b8f in KokkosBatched::SerialTrsvInternalUpper<KokkosBlas::Algo::Level2::Unblocked>::invoke<double, double> (use_unit_diag=false, m=16, alpha=1, A=0x8991f00, as0=1, as1=16, b=0x8993940, bs0=1)
at /space/cwsmith/meshFields/buildKokkosKernelsSerial/install/include/KokkosBatched_Trsv_Serial_Internal.hpp:161
#1 0x0000000000423408 in KokkosBatched::SerialTrsv<KokkosBatched::Uplo::Upper, KokkosBlas::Trans::NoTranspose, KokkosBatched::Diag::NonUnit, KokkosBlas::Algo::Level2::Unblocked>::invoke<double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > > (alpha=1, A=..., b=...)
at /space/cwsmith/meshFields/buildKokkosKernelsSerial/install/include/KokkosBatched_Trsv_Serial_Impl.hpp:191
#2 0x000000000041a7c8 in operator() (__closure=0x1ffefef0d0)
at /space/cwsmith/meshFields/meshFields/test/testQR.cpp:127
#3 0x000000000041bf36 in Kokkos::Impl::ParallelFor<testSolveQR()::<lambda(int)>, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::exec<void>(void) const (this=0x1ffefef0d0)
at /space/cwsmith/meshFields/buildKokkosSerial/install/include/Serial/Kokkos_Serial_Parallel_Range.hpp:37
#4 0x000000000041bb86 in Kokkos::Impl::ParallelFor<testSolveQR()::<lambda(int)>, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::execute(void) const (this=0x1ffefef0d0)
at /space/cwsmith/meshFields/buildKokkosSerial/install/include/Serial/Kokkos_Serial_Parallel_Range.hpp:56
#5 0x000000000041b71e in Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial>, testSolveQR()::<lambda(int)> >(const std::string &, const Kokkos::RangePolicy<Kokkos::Serial> &, const struct {...} &) (str=..., policy=..., functor=...)
at /space/cwsmith/meshFields/buildKokkosSerial/install/include/Kokkos_Parallel.hpp:146
#6 0x000000000041b4c3 in Kokkos::parallel_for<testSolveQR()::<lambda(int)> >(const std::string &, size_t, const struct {...} &) (str=..., work_count=1, functor=...)
at /space/cwsmith/meshFields/buildKokkosSerial/install/include/Kokkos_Parallel.hpp:167
#7 0x000000000041afbc in testSolveQR () at /space/cwsmith/meshFields/meshFields/test/testQR.cpp:113
#8 0x000000000041b2ed in main (argc=1, argv=0x1ffefef608) at /space/cwsmith/meshFields/meshFields/test/testQR.cpp:140
(ins)(gdb) p p
$1 = 15
(ins)(gdb) p m
$2 = 16
Given this loop from p=m-1 down to 0:
https://github.com/kokkos/kokkos-kernels/blob/b2210058826672c8de838541a36f7b946ecbb79a/batched/dense/impl/KokkosBatched_Trsv_Serial_Internal.hpp#L152
and the use of p twice to compute the index into A here (where the invalid read occurs):
https://github.com/kokkos/kokkos-kernels/blob/b2210058826672c8de838541a36f7b946ecbb79a/batched/dense/impl/KokkosBatched_Trsv_Serial_Internal.hpp#L161
it looks like the assumption that A is square is made here as well.
==265289== Memcheck, a memory error detector
==265289== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==265289== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==265289== Command: ./QRTests
==265289==
==265289== Invalid read of size 8
==265289== at 0x427B8F: int KokkosBatched::SerialTrsvInternalUpper<KokkosBlas::Algo::Level2::Unblocked>::invoke<double, double>(bool, int, double, double const*, int, int, double*, int) (KokkosBatched_Trsv_Serial_Internal.hpp:161)
==265289== by 0x423407: int KokkosBatched::SerialTrsv<KokkosBatched::Uplo::Upper, KokkosBlas::Trans::NoTranspose, KokkosBatched::Diag::NonUnit, KokkosBlas::Algo::Level2::Unblocked>::invoke<double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > >(double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&) (KokkosBatched_Trsv_Serial_Impl.hpp:191)
==265289== by 0x41A7C7: testSolveQR()::{lambda(int)#1}::operator()(int) const (testQR.cpp:127)
==265289== by 0x41BF35: std::enable_if<is_void_v<void>, void>::type Kokkos::Impl::ParallelFor<testSolveQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::exec<void>() const (Kokkos_Serial_Parallel_Range.hpp:37)
==265289== by 0x41BB85: Kokkos::Impl::ParallelFor<testSolveQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::execute() const (Kokkos_Serial_Parallel_Range.hpp:56)
==265289== by 0x41B71D: void Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial>, testSolveQR()::{lambda(int)#1}, void>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Serial> const&, testSolveQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:146)
==265289== by 0x41B4C2: void Kokkos::parallel_for<testSolveQR()::{lambda(int)#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, testSolveQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:167)
==265289== by 0x41AFBB: testSolveQR() (testQR.cpp:113)
==265289== by 0x41B2EC: main (testQR.cpp:140)
==265289== Address 0x89926f8 is 16 bytes after a block of size 40 free'd
==265289== at 0x484893D: operator delete(void*, unsigned long) (vg_replace_malloc.c:1101)
==265289== by 0x423D0C: Kokkos::Impl::HostSharedPtr<Kokkos::Impl::SerialInternal>::cleanup() (Kokkos_HostSharedPtr.hpp:120)
==265289== by 0x41DDA1: Kokkos::Impl::HostSharedPtr<Kokkos::Impl::SerialInternal>::~HostSharedPtr() (Kokkos_HostSharedPtr.hpp:92)
==265289== by 0x41CC4F: Kokkos::Serial::~Serial() (Kokkos_Serial.hpp:95)
==265289== by 0x423E87: Kokkos::Impl::ViewCtorProp<void, Kokkos::Serial>::~ViewCtorProp() (Kokkos_ViewCtor.hpp:122)
==265289== by 0x423EA7: Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::HostSpace, Kokkos::Serial>::~ViewCtorProp() (Kokkos_ViewCtor.hpp:182)
==265289== by 0x425342: Kokkos::View<double [10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::View<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, std::enable_if<!Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::has_pointer, Kokkos::LayoutLeft>::type const&) (Kokkos_ViewLegacy.hpp:1048)
==265289== by 0x41F9BC: Kokkos::View<double [10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::View<char [3]>(char const (&) [3], std::enable_if<Kokkos::Impl::is_view_label<char [3]>::value, unsigned long const>::type, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) (Kokkos_ViewLegacy.hpp:1158)
==265289== by 0x41AC8F: testSolveQR() (testQR.cpp:89)
==265289== by 0x41B2EC: main (testQR.cpp:140)
==265289== Block was alloc'd at
==265289== at 0x4844F95: operator new(unsigned long) (vg_replace_malloc.c:483)
==265289== by 0x5C94902: HostSharedPtr<Kokkos::Serial::Serial()::<lambda(Kokkos::Impl::SerialInternal*)> > (Kokkos_HostSharedPtr.hpp:47)
==265289== by 0x5C94902: Kokkos::Serial::Serial() (Kokkos_Serial.cpp:160)
==265289== by 0x4251A9: Kokkos::View<double [10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::View<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, std::enable_if<!Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::has_pointer, Kokkos::LayoutLeft>::type const&) (Kokkos_ViewLegacy.hpp:1002)
==265289== by 0x41F9BC: Kokkos::View<double [10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::View<char [3]>(char const (&) [3], std::enable_if<Kokkos::Impl::is_view_label<char [3]>::value, unsigned long const>::type, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) (Kokkos_ViewLegacy.hpp:1158)
==265289== by 0x41AC8F: testSolveQR() (testQR.cpp:89)
==265289== by 0x41B2EC: main (testQR.cpp:140)
==265289==
==265289== Invalid read of size 8
==265289== at 0x427BDC: int KokkosBatched::SerialTrsvInternalUpper<KokkosBlas::Algo::Level2::Unblocked>::invoke<double, double>(bool, int, double, double const*, int, int, double*, int) (KokkosBatched_Trsv_Serial_Internal.hpp:163)
==265289== by 0x423407: int KokkosBatched::SerialTrsv<KokkosBatched::Uplo::Upper, KokkosBlas::Trans::NoTranspose, KokkosBatched::Diag::NonUnit, KokkosBlas::Algo::Level2::Unblocked>::invoke<double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > >(double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&) (KokkosBatched_Trsv_Serial_Impl.hpp:191)
==265289== by 0x41A7C7: testSolveQR()::{lambda(int)#1}::operator()(int) const (testQR.cpp:127)
==265289== by 0x41BF35: std::enable_if<is_void_v<void>, void>::type Kokkos::Impl::ParallelFor<testSolveQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::exec<void>() const (Kokkos_Serial_Parallel_Range.hpp:37)
==265289== by 0x41BB85: Kokkos::Impl::ParallelFor<testSolveQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::execute() const (Kokkos_Serial_Parallel_Range.hpp:56)
==265289== by 0x41B71D: void Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial>, testSolveQR()::{lambda(int)#1}, void>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Serial> const&, testSolveQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:146)
==265289== by 0x41B4C2: void Kokkos::parallel_for<testSolveQR()::{lambda(int)#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, testSolveQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:167)
==265289== by 0x41AFBB: testSolveQR() (testQR.cpp:113)
==265289== by 0x41B2EC: main (testQR.cpp:140)
==265289== Address 0x8992680 is 6 bytes after a block of size 74 free'd
==265289== at 0x484893D: operator delete(void*, unsigned long) (vg_replace_malloc.c:1101)
==265289== by 0x5C94C5D: deallocate (new_allocator.h:158)
==265289== by 0x5C94C5D: deallocate (alloc_traits.h:496)
==265289== by 0x5C94C5D: _M_destroy (basic_string.h:300)
==265289== by 0x5C94C5D: _M_dispose (basic_string.h:294)
==265289== by 0x5C94C5D: ~basic_string (basic_string.h:803)
==265289== by 0x5C94C5D: profile_fence_event<Kokkos::Serial, Kokkos::Serial::impl_static_fence(const std::string&)::<lambda()> > (Kokkos_Profiling.hpp:219)
==265289== by 0x5C94C5D: impl_static_fence (Kokkos_Serial.hpp:147)
==265289== by 0x5C94C5D: Kokkos::Impl::ExecSpaceDerived<Kokkos::Serial>::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Kokkos_ExecSpaceManager.hpp:131)
==265289== by 0x5C85684: Kokkos::Impl::ExecSpaceManager::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Kokkos_Core.cpp:243)
==265289== by 0x41E8BB: void Kokkos::deep_copy<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>(Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks> const&, std::enable_if<((is_void_v<Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::specialize>)&&(is_void_v<Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>::specialize>))&&((((unsigned int)Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::rank)!=(0))||(((unsigned int)Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>::rank)!=(0))), void>::type*) (Kokkos_CopyViews.hpp:1709)
==265289== by 0x41AC57: testSolveQR() (testQR.cpp:87)
==265289== by 0x41B2EC: main (testQR.cpp:140)
==265289== Block was alloc'd at
==265289== at 0x4844F95: operator new(unsigned long) (vg_replace_malloc.c:483)
==265289== by 0x5C943FB: void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.0] (basic_string.tcc:225)
==265289== by 0x5C94C36: basic_string (basic_string.h:552)
==265289== by 0x5C94C36: profile_fence_event<Kokkos::Serial, Kokkos::Serial::impl_static_fence(const std::string&)::<lambda()> > (Kokkos_Profiling.hpp:219)
==265289== by 0x5C94C36: impl_static_fence (Kokkos_Serial.hpp:147)
==265289== by 0x5C94C36: Kokkos::Impl::ExecSpaceDerived<Kokkos::Serial>::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Kokkos_ExecSpaceManager.hpp:131)
==265289== by 0x5C85684: Kokkos::Impl::ExecSpaceManager::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Kokkos_Core.cpp:243)
==265289== by 0x41E8BB: void Kokkos::deep_copy<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>(Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks> const&, std::enable_if<((is_void_v<Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::specialize>)&&(is_void_v<Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>::specialize>))&&((((unsigned int)Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::rank)!=(0))||(((unsigned int)Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>::rank)!=(0))), void>::type*) (Kokkos_CopyViews.hpp:1709)
==265289== by 0x41AC57: testSolveQR() (testQR.cpp:87)
==265289== by 0x41B2EC: main (testQR.cpp:140)
==265289==
==265289==
==265289== HEAP SUMMARY:
==265289== in use at exit: 176 bytes in 1 blocks
==265289== total heap usage: 230 allocs, 229 frees, 115,919 bytes allocated
==265289==
==265289== LEAK SUMMARY:
==265289== definitely lost: 0 bytes in 0 blocks
==265289== indirectly lost: 0 bytes in 0 blocks
==265289== possibly lost: 0 bytes in 0 blocks
==265289== still reachable: 176 bytes in 1 blocks
==265289== suppressed: 0 bytes in 0 blocks
==265289== Rerun with --leak-check=full to see details of leaked memory
==265289==
==265289== For lists of detected and suppressed errors, rerun with: -s
==265289== ERROR SUMMARY: 80 errors from 2 contexts (suppressed: 0 from 0)
Okay, I will try to wrap up the PR and get that tested and merged, then I can move on to trsv. Hopefully it's not more complicated than the QR fix, but writing proper tests is what takes time!
So I have not looked at it in detail, but my guess is that we are assuming the triangular matrix to be stored in a square matrix of size mxm. Since yours is coming from the QR factorization of a rectangular matrix, we need to fix the code so that it works for an mxn input matrix; basically we will ignore the non-square part of the input. Should not be too bad hopefully. I will create a PR once I have confirmed that this is the issue and we have a fix for you...
Hello,
Calling SerialQR on a single matrix defined as Kokkos::View<double[16][10]> and running with the Kokkos Serial backend results in valgrind invalid read and write errors (pasted below). The reproducer is pasted below.
Interestingly, when using the CUDA backend in an expanded version of the reproducer (which includes a result comparison after applying the QR factorization via ApplyQ and Trsv) there are no obvious issues.
Note, I'm still figuring out how the QR interface works, hence the single matrix input to QR.
Am I doing anything obviously wrong here? Any help is appreciated.
reproducer
kokkos and kokkos-kernels build
I'm building kokkos (develop @ c2a342b26) and kokkos-kernels (develop @ f26fbca1b) with the following cmake commands using GCC 12.3.0 on a RHEL9 system.
valgrind errors