kokkos / kokkos

Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
https://kokkos.org

CUDA-Double atomic add critical slowdown (reported by Mehmet Deveci) #1442

Closed mndevec closed 6 years ago

mndevec commented 6 years ago

I am running kokkos-kernels spgemm method on P100 GPUs as described here:

https://github.com/kokkos/kokkos-kernels/wiki/p100_configure

For the test described on that page, ./KokkosSparse_spgemm.exe --cuda 0 --amtx audikw_1.bin, I was normally getting 0.63 seconds. This kernel uses atomic additions on double-precision numbers.

If I compile with Kokkos at or after commit b8547f38fb0e5e54213b048bb6d5b9d30911ab89, the numeric kernel time increases to ~1.45 seconds.

Any thoughts why this might be the case?

mndevec commented 6 years ago

@ibaned

ibaned commented 6 years ago

I get a link error...

KokkosGraph_color_d2.o: In function `void KokkosSparse::Experimental::spgemm_numeric<KokkosKernels::Experimental::KokkosKernelsHandle<int, int, double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>, Kokkos::View<int const*, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::View<int const*, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::View<double*, Kokkos::HostSpace>, Kokkos::View<int const*, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::View<int const*, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::View<double*, Kokkos::HostSpace>, Kokkos::View<int*, Kokkos::HostSpace>, Kokkos::View<int*, Kokkos::HostSpace>, Kokkos::View<double*, Kokkos::HostSpace> >(KokkosKernels::Experimental::KokkosKernelsHandle<int, int, double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>*, KokkosKernels::Experimental::KokkosKernelsHandle<int, int, double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>::const_nnz_lno_t, KokkosKernels::Experimental::KokkosKernelsHandle<int, int, double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>::const_nnz_lno_t, KokkosKernels::Experimental::KokkosKernelsHandle<int, int, double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>::const_nnz_lno_t, Kokkos::View<int const*, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::View<int const*, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::View<double*, Kokkos::HostSpace>, bool, Kokkos::View<int const*, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::View<int const*, Kokkos::LayoutRight, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, 
Kokkos::View<double*, Kokkos::HostSpace>, bool, Kokkos::View<int*, Kokkos::HostSpace>, Kokkos::View<int*, Kokkos::HostSpace>&, Kokkos::View<double*, Kokkos::HostSpace>&)':
/ascldap/users/daibane/ride/src/kokkos-kernels/example/buildlib/install/include/KokkosSparse_spgemm_numeric.hpp:249: undefined reference to `KokkosSparse::Impl::SPGEMM_NUMERIC<KokkosKernels::Experimental::KokkosKernelsHandle<int const, int const, double const, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, false, false>::spgemm_numeric(KokkosKernels::Experimental::KokkosKernelsHandle<int const, int const, double const, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>*, int, int, int, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, bool, 
Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, bool, Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >&, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >&)'
ibaned commented 6 years ago

This is my script:

KOKKOS_PATH=${HOME}/ride/src/kokkos #path to kokkos source
KOKKOSKERNELS_SCALARS='double,"complex<double>"' #the scalar types to instantiate =double,float...
KOKKOSKERNELS_LAYOUTS=LayoutLeft #the layout types to instantiate.
KOKKOSKERNELS_ORDINALS=int,long #ordinal types to instantiate
KOKKOSKERNELS_OFFSETS=int,size_t #offset types to instantiate
KOKKOSKERNELS_PATH=../.. #path to kokkos-kernels top directory.
KOKKOSKERNELS_OPTIONS=eti-only #options for kokkoskernels  
CXXFLAGS="-Wall -pedantic -Werror -O3 -g -Wshadow -Wsign-compare -Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized"
CXX=${KOKKOS_PATH}/bin/nvcc_wrapper #icpc #
KOKKOS_DEVICES=Cuda #devices Cuda...
KOKKOS_ARCHS=Pascal60,Power8

../../scripts/generate_makefile.bash --kokkoskernels-path=${KOKKOSKERNELS_PATH} --with-scalars=${KOKKOSKERNELS_SCALARS} --with-ordinals=${KOKKOSKERNELS_ORDINALS} --with-offsets=${KOKKOSKERNELS_OFFSETS} --kokkos-path=${KOKKOS_PATH} --with-devices=${KOKKOS_DEVICES} --arch=${KOKKOS_ARCHS} --compiler=${CXX} --with-options=${KOKKOSKERNELS_OPTIONS}  --cxxflags="${CXXFLAGS}"
crtrott commented 6 years ago

The regression was definitely introduced between commits e8f42f3efd56fc5b100fc1d8f78d72f2c91dc633 and 91e8fa0c9704e732e8643c6c7a9c392fc8d0f4d7.

crtrott commented 6 years ago

It is actually the integer increment that broke: it now explicitly calls the templated variants, which do not hit the native atomics.

crtrott commented 6 years ago

Running spot check on this now.

ibaned commented 6 years ago

Ah okay, I see the problem now. If I specify a template parameter instead of casting, the call can only select the templated version and not the non-template overloads. My bad. The fix is to make a habit of casting the argument instead of specifying the template parameter.
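
For anyone reading this later, here is a minimal standalone sketch of the overload-resolution behavior described above. It does not use Kokkos: my_atomic_add and both helper functions are hypothetical names made up for illustration, and the bodies are plain (non-atomic) adds that only report which version was chosen. The assumption is that the real code has the same shape, a generic template plus non-template overloads that forward to native atomics.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; this is NOT the Kokkos API,
// and the adds below are not actually atomic.

// Generic fallback (think: a compare-and-swap loop that works for any T).
template <class T>
const char* my_atomic_add(T* dest, T val) {
  *dest += val;
  return "generic template";
}

// Non-template overload (think: forwards to a native hardware atomic).
inline const char* my_atomic_add(int* dest, int val) {
  *dest += val;
  return "native overload";
}

// Explicit template argument: only function templates are candidates,
// so the non-template overload can never be selected.
std::string path_with_template_argument() {
  int counter = 0;
  long step = 1;
  return my_atomic_add<int>(&counter, step);
}

// Cast the argument instead: both versions are candidates, and the
// non-template overload wins because it is an exact match.
std::string path_with_cast() {
  int counter = 0;
  long step = 1;
  return my_atomic_add(&counter, static_cast<int>(step));
}
```

In this sketch path_with_template_argument() ends up in the generic template and path_with_cast() ends up in the non-template overload, which matches the slow path versus native path split discussed in this thread.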