kokkos / kokkos

Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
https://kokkos.org
Other
1.98k stars 434 forks source link

Tests should run with `--gtest_shuffle` to trigger hidden bugs (test dependencies or worse) #7435

Open romintomasetti opened 1 week ago

romintomasetti commented 1 week ago

It seems that Kokkos tests aren't being shuffled when they run.

I think they should.

Attached is an example that illustrates the need for shuffling. If the test first runs first, the test second succeeds. If you swap the execution order, i.e. the test case second runs before the test case first, it fails as expected :bomb:

Sample code with 2 tests that fail if `second` runs before `first`. ```c++ class NoShuffleIsBad : public ::testing::Test { public: static constexpr size_t size = 2<<12; using memory_space = Kokkos::CudaSpace; using execution_space = Kokkos::Cuda; using view_t = Kokkos::View; using result_t = typename view_t::non_const_value_type; using sum_t = Kokkos::Sum; void SetUp() override { exec = Kokkos::Experimental::partition_space(execution_space{}, 1).at(0); typename view_t::non_const_type tmp(Kokkos::view_alloc("data", Kokkos::WithoutInitializing, exec)); Kokkos::parallel_for( "IOTA for the data", Kokkos::RangePolicy(exec, 0, size), KOKKOS_LAMBDA(const int index) { tmp(index) = index; } ); this->data = std::move(tmp); exec.fence("Ensure that the setup is finished before running the test."); } protected: execution_space exec; view_t data; }; template struct MySum { ViewType data; MySum(ViewType data_) : data(std::move(data_)) {} template KOKKOS_FUNCTION void operator()(const int index, R& current) const { current += data(index); } }; //! This test will set some unified scratch values - no graph (on AMPERE86 at least). TEST_F(NoShuffleIsBad, first) { result_t reduced_sum = 0; Kokkos::parallel_reduce( Kokkos::RangePolicy(exec, 0, size), MySum(data), sum_t(typename sum_t::result_view_type(&reduced_sum)) ); exec.fence(); ASSERT_EQ(reduced_sum, size * (size - 1) / 2); } //! This test should fail, but the first test saves the game. Enable shuffling and see it fail if it ran first. TEST_F(NoShuffleIsBad, second) { result_t reduced_sum = 0; auto graph = Kokkos::Experimental::create_graph(exec, [&](auto root) { auto node_A = root.then_parallel_reduce( Kokkos::RangePolicy(exec, 0, size), MySum(data), sum_t(typename sum_t::result_view_type(&reduced_sum)) ); }); /// This fence is semantically required (wait for the graph to be submission ready). But it also triggers the issue we see, because it fences what's going on /// at https://github.com/kokkos/kokkos/blob/f1cbeba13ee9adf5a1f4dd47fda1457f9b14f64b/core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp#L344. exec.fence("Ensure that the graph is ready."); graph.submit(exec); exec.fence(); ASSERT_EQ(reduced_sum, size * (size - 1) / 2); } ```
`Kokkos` configuration output. ```bash [ RUN ] print.config Kokkos Version: 4.4.99 Compiler: KOKKOS_COMPILER_GNU: 1230 KOKKOS_COMPILER_NVCC: 1260 Architecture: CPU architecture: none Default Device: Cuda GPU architecture: AMPERE86 platform: 64bit Atomics: Vectorization: KOKKOS_ENABLE_PRAGMA_IVDEP: no KOKKOS_ENABLE_PRAGMA_LOOPCOUNT: no KOKKOS_ENABLE_PRAGMA_UNROLL: no KOKKOS_ENABLE_PRAGMA_VECTOR: no Memory: Options: KOKKOS_ENABLE_ASM: yes KOKKOS_ENABLE_CXX17: yes KOKKOS_ENABLE_CXX20: no KOKKOS_ENABLE_CXX23: no KOKKOS_ENABLE_CXX26: no KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK: no KOKKOS_ENABLE_HWLOC: yes KOKKOS_ENABLE_LIBDL: yes Host Parallel Execution Space: KOKKOS_ENABLE_OPENMP: yes OpenMP Runtime Configuration: Kokkos::OpenMP thread_pool_topology[ 1 x 16 x 1 ] Host Serial Execution Space: KOKKOS_ENABLE_SERIAL: yes Serial Runtime Configuration: Device Execution Space: KOKKOS_ENABLE_CUDA: yes Cuda Options: KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE: no KOKKOS_ENABLE_CUDA_UVM: no KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC: no Cuda Runtime Configuration: macro KOKKOS_ENABLE_CUDA : defined macro CUDA_VERSION = 12060 = version 12.6 Kokkos::Cuda[ 0 ] NVIDIA GeForce RTX 3060 Laptop GPU capability 8.6, Total Global Memory: 5.698 GiB, Shared Memory per Block: 48 KiB : Selected ```
romintomasetti commented 1 week ago

Nice to read: https://google.github.io/googletest/advanced.html#shuffling-the-tests.

romintomasetti commented 1 week ago

We decided to create a PR with shuffling environment variables in the Jenkins configuration and some GitHub actions and see what happens.

diff --git a/.jenkins b/.jenkins
index c2c6f500f..d50781bc2 100644
--- a/.jenkins
+++ b/.jenkins
@@ -5,6 +5,7 @@ pipeline {
         CCACHE_DIR = '/tmp/ccache'
         CCACHE_MAXSIZE = '5G'
         CCACHE_CPP2 = 'true'
+        GTEST_SHUFFLE = 1
     }

     options {