kokkos / kokkos-resilience

Resilience Extensions for Kokkos

Resilient Kokkos RangePolicy parallel_for has a bug in heatdist test #17

Closed ElisabethGiem closed 2 years ago

ElisabethGiem commented 2 years ago

The issue is that the RangePolicy resilient parallel_for with single-dimensional resilient views and no MPI fails in the heat distribution test on kahuna. It appears either not to run at all or to enter an infinite loop (the program times out), although the precise point of failure has yet to be determined.

Modules loaded: cmake 3.19.1, gcc7-support, gcc 7.5.0

Notes:
0) Branch of Resilient Kokkos: resilient-execution-space
1) The no-MPI version of the heat distribution test works with non-resilient Kokkos
2) heatdist test code: https://github.com/nmm0/veloc-heat-test/commit/ef7a94bb2bf065817c78ed867e1eecd1825ce0d5

keitat commented 2 years ago

Build Kokkos https://github.com/nmm0/kokkos/tree/accessor-hooks with GCC 7.5. (All: please include build instructions when reporting an error!)

[knteran@klogin3 BUILD_KOKKOS]$ git clone git@github.com:nmm0/kokkos.git
[knteran@klogin3 BUILD_KOKKOS]$ cd kokkos
[knteran@klogin3 BUILD_KOKKOS]$ git checkout accessor-hooks
knteran@klogin3 BUILD_KOKKOS]$ cmake ../kokkos/  -DKokkos_ENABLE_OPENMP=ON   -DKokkos_ARCH_HSW=ON -DCMAKE_INSTALL_PREFIX=/home/knteran/Kokkos_Haswell_75
-- Setting default Kokkos CXX standard to 11
-- Setting policy CMP0074 to use <Package>_ROOT variables
-- The project name is: Kokkos
-- Using -std=gnu++11 for C++11 extensions as feature
-- Execution Spaces:
--     Device Parallel: NONE
--     Host Parallel: OPENMP
--       Host Serial: NONE
-- 
-- Architectures:
-- Configuring done
-- Generating done
-- Build files have been written to: /home/knteran/ASC/BUILD_KOKKOS
[knteran@klogin3 BUILD_KOKKOS]$ make -j 8 
:
:
[ 90%] Built target kokkoscore
[100%] Built target kokkoscontainers
[knteran@klogin3 BUILD_KOKKOS]$ make install

Build the heat-dist code (non-resilient): add set(CMAKE_PREFIX_PATH /home/knteran/Kokkos_Haswell_75) in veloc_heat_dist/CMakeLists.txt and comment out the heatdis_resil stuff in CMakeLists.txt. I successfully built the non-resilient version and it runs!
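For reference, the two edits described above would look roughly like this in veloc_heat_dist/CMakeLists.txt (a sketch, not the verbatim file; the path assumes the install prefix used in the Kokkos build step above):

```cmake
# Point CMake at the Kokkos install from the build step above
set(CMAKE_PREFIX_PATH /home/knteran/Kokkos_Haswell_75)

find_package(Kokkos REQUIRED)

add_executable(heatdis)
add_subdirectory(src)
target_link_libraries(heatdis PRIVATE Kokkos::kokkos)

# Resilient target disabled while testing the non-resilient build:
# add_executable(heatdis_resil)
# target_link_libraries(heatdis_resil PRIVATE Kokkos::resilience Kokkos::kokkos)
```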

keitat commented 2 years ago

However, I am failing to build kokkos-resilience (resilient-execution-space branch) with Nic's Kokkos. A large fraction of Jeff's source is outdated.

keitat commented 2 years ago

I modified the CMakeLists.txt files under the src/resilience, tests, and examples directories to disable all of Jeff's manual checkpointing code, tests, and examples.

@ElisabethGiem, can I update the CMakeLists.txt files to disable all of Jeff's stuff?

keitat commented 2 years ago
[knteran@klogin3 ]$ cd kokkos-resilience
[knteran@klogin3 ]$ mkdir BUILD
[knteran@klogin3 ]$ cd BUILD
[knteran@klogin3 BUILD]$  cmake -DCMAKE_BUILD_TYPE=Release -DKokkos_ROOT=/home/knteran/Kokkos_Haswell_75/  -DCMAKE_INSTALL_PREFIX=/home/knteran/resilience_Haswell_75 -DKR_ENABLE_MPI_BACKENDS=OFF -DKR_ENABLE_STDIO=OFF    ..

Building Liz's resilient execution space.

keitat commented 2 years ago

Changing the CMakeLists.txt of veloc_heat_dist:

cmake_minimum_required(VERSION 3.19)
project(heatdis)

set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules")

set(CMAKE_PREFIX_PATH "/home/knteran/Kokkos_Haswell_75;/home/knteran/resilience_Haswell_75")
add_executable(heatdis)
add_executable(heatdis_resil)

add_subdirectory(src)

find_package(Kokkos REQUIRED)
find_package(resilience REQUIRED)
target_link_libraries(heatdis PRIVATE Kokkos::kokkos)
target_link_libraries(heatdis_resil PRIVATE Kokkos::resilience Kokkos::kokkos)

#target_compile_definitions(heatdis_resil PRIVATE USE_RESILIENT_EXEC)

add_subdirectory(tpl)

# Install rules
include(GNUInstallDirs)

install(TARGETS heatdis heatdis_resil)

keitat commented 2 years ago

The gtest program in the kokkos-resilience repo passes! However, there are no example programs. I strongly recommend adding some (I can do that). Currently working to add the resilient execution space to heat-dist. @ElisabethGiem, do you have your version of heat-dist with the resilient execution space?

keitat commented 2 years ago

Now I can confirm that the program hangs.

[knteran@klogin2 BUILD]$ ./heatdis
Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
  In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
  For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
  For unit testing set OMP_PROC_BIND=false
Local data size is 2560 x 2563 = 100.000000 MB (100).
Target precision : 0.000010 
Maximum number of iterations : 600 
i: 0 --- v: 0
Step : 0, error = 1.000000
Step : 50, error = 0.484743
Step : 100, error = 0.242139
Step : 150, error = 0.161172
Step : 200, error = 0.121036
Step : 250, error = 0.096793
Step : 300, error = 0.080644
Step : 350, error = 0.069129
Step : 400, error = 0.060499
Step : 450, error = 0.053781
Step : 500, error = 0.048396
Step : 550, error = 0.043974
Execution finished in 12.992989 seconds.
[knteran@klogin2 BUILD]$ ./heatdis_resil 
Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
  In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
  For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
  For unit testing set OMP_PROC_BIND=false
Local data size is 2560 x 2563 = 100.000000 MB (100).
Target precision : 0.000010 
Maximum number of iterations : 600 
i: 0 --- v: 0
Step : 0, error = 1.000000

--- wait, it is making progress, but extremely slowly... Something weird is happening.

keitat commented 2 years ago

In the heat-dist program, I modified the code to use the regular range policy everywhere except the parallel_for copying the data, h(i) = g(i);. Interestingly, this part becomes extremely slow. I do not know why this is happening...

keitat commented 2 years ago

@ElisabethGiem Do you expect the resilient parallel_for (for OpenMP) to call exec_range() in the original Kokkos instead of the one defined in the resilient parallel_for?

keitat commented 2 years ago

I observed the resilient parallel_for execute the non-resilient parallel_for 3 times at the beginning when the program is linked outside the kokkos-resilience repository. @nmm0 I need your help. Is something messed up with the namespaces?

keitat commented 2 years ago

When the heat-dist code sees the resilient parallel_for, it goes to OpenMPResParallel.hpp. Interestingly, the creation of m_functor_0, m_functor_1, and m_functor_2 triggers the ParallelFor of the non-resilient execution space. I have no idea why!! This does not happen when executed inside gtest. (@ElisabethGiem, I see why you are very confident in the test.)

KokkosResilience::ResilientDuplicatesSubscriber::in_resilient_parallel_loop = true;
auto m_functor_0 = m_functor;
auto m_functor_1 = m_functor;
auto m_functor_2 = m_functor;
KokkosResilience::ResilientDuplicatesSubscriber::in_resilient_parallel_loop = false;

Why does the copy constructor trigger an execution of ParallelFor???

keitat commented 2 years ago

I added an example program under example.

#include <Kokkos_Core.hpp>
#include <resilience/Resilience.hpp>
#include <resilience/openMP/ResHostSpace.hpp>
#include <resilience/openMP/ResOpenMP.hpp>

#define MemSpace KokkosResilience::ResHostSpace
#define ExecSpace KokkosResilience::ResOpenMP

int main( int argc, char **argv )
{
  Kokkos::initialize( argc, argv );
  {
     // range policy with resilient execution space
     using range_policy = Kokkos::RangePolicy<ExecSpace>;
     // test vector types with the duplicating subscriber
     using subscriber_vector_double_type = Kokkos::View< double* , MemSpace,
                                           Kokkos::Experimental::SubscribableViewHooks<
                                          KokkosResilience::ResilientDuplicatesSubscriber > >;
     int  dim0 = 100, dim1 = 5;
     subscriber_vector_double_type view( "test_view", dim0 );
     Kokkos::parallel_for( range_policy (0, dim0), KOKKOS_LAMBDA ( const int i) {
        view ( i ) = i;
     });
     // Data is in host space. It's OK to access with regular loops
     for ( int i = 0; i < dim0; i++) {
       std::cout << "view(" << i << ") = " << view(i) << std::endl;
     }
  }
  Kokkos::finalize();
  return 0;
}

This is a line I added to the CMakeLists.txt in the same directory to create an executable under my build/example directory: add_example(simple_res_openmp SOURCES SimpleResOpenMP.cpp)

Very strange.... I do not see any strange behaviors..... I suspect something weird is happening at install time of kokkos-resilience or at build time of heat-dist. Let me write heat-dist inside kokkos-resilience.

keitat commented 2 years ago

I think I found a bug. It is related to the duplication of the views. I noticed that the resilient parallel_for calls the non-resilient parallel_for when copying the functor. It seems the size of the views triggers this problem. For small views, it calls the non-resilient parallel_for once to verify the computation at the end. However, the program gets messed up with large views. @ElisabethGiem, is this how you duplicate large views? If so, that's OK. However, I found it slows down the program dramatically...

Here is my code (range_policy is set to ResOpenMP )

     for ( dim0 = 10000; dim0 < 12000; dim0++ ) {
        std::cout << "view_size " << dim0 << std::endl;
        subscriber_vector_double_type view( "test_view", dim0 );
        Kokkos::parallel_for( range_policy (0, dim0), KOKKOS_LAMBDA ( const int i) {
            view ( i ) = i+8;
        });
     }

The output looks like this (I put a print statement at the beginning of execute() in parallel_for):

view_size 10238
PARALLEL FOR with Normal Range Policy
view_size 10239
PARALLEL FOR with Normal Range Policy
view_size 10240
PARALLEL FOR with Normal Range Policy
PARALLEL FOR with Normal Range Policy
PARALLEL FOR with Normal Range Policy
PARALLEL FOR with Normal Range Policy

keitat commented 2 years ago

It seems this is how Kokkos executes deep_copy for OpenMP. For data smaller than 10240 elements it does a sequential copy. However, it executes a parallel_for for larger views...

keitat commented 2 years ago

I switched all the compilers to GCC 10.2.0 and rebuilt all the sources (including Boost) to get the program running. I will change the title to "performance and potential bugs in data duplication." I suggest applying the resilient execution space selectively, because triplicating non-computing loops does not make sense.

keitat commented 2 years ago

Here is the result with 16 MB of data, 600 iterations, and 28 threads: heatdis takes 2.21 seconds; heatdis_resil takes 3712 seconds. The outputs are correct (I checked the 16 MB and 1 MB cases). However, that is roughly a 1700x slowdown. We need to investigate, as I expect 3-5x at worst. (I will submit an issue.)

keitat commented 2 years ago

I switched to the latest updated version. Then the code fails with heat-dist. See my report: https://github.com/kokkos/kokkos-resilience/pull/14

keitat commented 2 years ago

Closing this issue; I will open another issue to discuss all the tasks associated with the resilient execution space.