kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

Nightly Sycl unit test failures with intel/2023.1.0, intel/2024.1.0 on Intel Ponte Vecchio #1961

Open ndellingwood opened 1 year ago

ndellingwood commented 1 year ago

Testing with the Sycl backend on Intel Ponte Vecchio on the new Blake showed several failing sub-tests (failure output is listed below each failing executable), depending on which environment variables are set:

Default (ZES_ENABLE_SYSMAN unset)

The following tests FAILED:
   13 - sparse_sycl (Failed)
[  FAILED  ] sycl_test.sparse_coo2crs
[  FAILED  ] sycl_test.sparse_spgemm_jacobi_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_spgemm_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_par_ilut_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_par_ilut_precond_double_int_size_t_TestExecSpace

   14 - blocksparse_sycl (Failed)
[  FAILED  ] sycl_test.sparse_bsr_gauss_seidel_rank1_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_bsr_gauss_seidel_rank2_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_block_spgemm_double_int_size_t_TestExecSpace

   22 - wiki_spgemm (Subprocess aborted)
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: No memory modules for the SYCL backend found. Make sure that ZES_ENABLE_SYSMAN=1 is set at run time!
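
The abort above is raised at run time when ZES_ENABLE_SYSMAN is not set. As a minimal sketch, a test driver could check the variable up front and report a clearer message before any SYCL work starts. The function names below are hypothetical, and treating only the value "1" as enabled is an assumption taken from the wording of the error message, not from the Kokkos source.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns true only if the given value is exactly "1".
// Assumption: the runtime treats "1" as the enabling value, as the error text suggests.
bool sysman_value_is_enabled(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical helper: reads ZES_ENABLE_SYSMAN from the process environment.
bool sysman_enabled_in_environment() {
  return sysman_value_is_enabled(std::getenv("ZES_ENABLE_SYSMAN"));
}
```

A driver could call sysman_enabled_in_environment() before initializing Kokkos and skip or fail early with a readable message.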

ZES_ENABLE_SYSMAN=1

The following tests FAILED:
13/27 Test #13: sparse_sycl ......................Subprocess aborted***Exception:  45.76 sec
[==========] Running 48 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 48 tests from sycl_test
[ RUN      ] sycl_test.sparse_coo2crs
/home/ndellin/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 4.34205 vs 3.75255e-10
row: 17, crs_col_ids_ref(504) = 20 mismatched values!
Begin arguments for above failure...
RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE130...): rand seed: 3072659895
scalar: N6Kokkos7complexIdEE
layout: N6Kokkos10LayoutLeftE
m: 130, n: 130
...end arguments for above failure.
...
[  FAILED  ] sycl_test.sparse_coo2crs (20842 ms)
[ RUN      ] sycl_test.sparse_spgemm_jacobi_double_int_size_t_TestExecSpace
terminate called after throwing an instance of 'std::runtime_error'
  what():  There was a synchronous SYCL error:
Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)

14/27 Test #14: blocksparse_sycl .................***Failed   33.50 sec
...
[ RUN      ] sycl_test.sparse_block_spgemm_double_int_size_t_TestExecSpace
nentries_actual:1564 nentries_reference:2423
/home/ndellin/kokkos-kernels/sparse/unit_test/Test_Sparse_bspgemm.hpp:235: Failure
Value of: is_identical
  Actual: false
Expected: true
SPGEMM_KK
...

Reproducer (Blake PV queue):

SHAs:
kokkos/kokkos@7e299b4e25c42528e105379c3aa9a318056545ba
kokkos/kokkos-kernels@acdd8969109b53f2b3b0915ef51aef9800a44587

module load cmake intel-oneapi-compilers/2023.1.0

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=/projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.1.0/gcc/8.5.0/base/6g2jkiv/compiler/2023.1.0/linux/bin-llvm/clang++ --cxxflags="-fp-model=precise" --shared --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF

Edit: Added shas used in the testing

ndellingwood commented 3 months ago

Updating the issue with failures as of SHA 32aa75a8f20ca88df64bde421c335b9fa6f68397

Configuration 1 (no TPLs):

salloc -N 1 -p PV

source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 git

# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1

# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=/projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.1.0/gcc/8.5.0/base/6g2jkiv/compiler/2023.1.0/linux/bin-llvm/clang++ --cxxflags="-fp-model=precise" --shared --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF --kokkos-path=$KOKKOS_PATH

Test failures on PVC:

23:43:24 The following tests FAILED:
23:43:24     15 - sparse_sycl (SEGFAULT)
23:43:24     16 - blocksparse_sycl (Failed)

Configuration 2 (oneMKL):

salloc -N 1 -p PV

source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load git cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 intel-oneapi-mkl/2023.1.0 intel-oneapi-tbb/2021.9.0

# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1

# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=icpx --cxxflags="-fp-model=precise" --shared --with-tpls=mkl --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF --kokkos-path=$KOKKOS_PATH

Test failures on PVC:

05:49:17 The following tests FAILED:
05:49:17      9 - blas_sycl (Failed)
05:49:17     15 - sparse_sycl (Subprocess aborted)
05:49:17     16 - blocksparse_sycl (Failed)
05:49:17     26 - wiki_spadd (Subprocess aborted)

ndellingwood commented 3 months ago

Joe installed Intel oneAPI 2024.1.0 on Blake, and I tested the MKL configuration above with it:

Test failures:

15/32 Test #15: sparse_sycl ......................***Failed  194.78 sec
...
[  PASSED  ] 47 tests.
[  FAILED  ] 4 tests, listed below:
[  FAILED  ] sycl_test.sparse_spgemm_jacobi_double_int_int_TestDevice
[  FAILED  ] sycl_test.sparse_spgemm_double_int_int_TestDevice
[  FAILED  ] sycl_test.sparse_spmv_double_int_int_TestDevice
[  FAILED  ] sycl_test.sparse_par_ilut_double_int_int_TestDevice

16/32 Test #16: blocksparse_sycl .................***Failed   29.87 sec
...
[==========] 7 tests from 1 test case ran. (29406 ms total)
[  PASSED  ] 6 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] sycl_test.sparse_block_spgemm_double_int_int_TestDevice

Configuration (Sycl backend, intel/2024.1.0 with mkl/2024.0.0):

source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load cmake intel-oneapi-compilers/2024.1.0 intel-oneapi-dpl/2022.5.0 intel-oneapi-tbb/2021.12.0 intel-oneapi-mkl/2024.0.0
module list

# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1

# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=icpx --cxxflags="-fp-model=precise -Wno-pass-failed" --shared --with-tpls=mkl --kokkos-path=$KOKKOS_PATH

make -j16

# Unit tests
export ONEAPI_DEVICE_SELECTOR=ext_oneapi_level_zero:gpu
ctest --output-on-failure

cwpearson commented 2 months ago

I've been poking around with this:

In the SpGEMM, it seems that Kokkos::atomic_add(addr, val); always results in *addr = 0 (or possibly leaves *addr unchanged, i.e. Kokkos::atomic_add is a no-op).

However, *addr += val; does cause some math to happen (though it produces incorrect values in a context where atomics are needed).

I've tried replacing Kokkos::atomic_add(addr, val) with various flavors of

auto v = sycl::atomic_ref<std::remove_reference_t<decltype(*addr)>,
                          sycl::memory_order::relaxed,
                          sycl::memory_scope::device, 
                          sycl::access::address_space::global_space>(*addr);
v += val;

but no luck so far.

I've also tried running the Kokkos Core atomics unit tests built with the same Core that I use for the Kernels unit tests, and the Core atomic unit tests all pass.
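
For reference, the postcondition the SpGEMM accumulation depends on (the stored value becomes old + val) can be written down on the host. The sketch below uses a compare-and-swap loop on std::atomic<double>. It is only an illustration of the expected semantics, not how Kokkos or SYCL implement atomic_add, and the function name is hypothetical.

```cpp
#include <atomic>

// Hypothetical host-side helper: atomically adds val to target and returns
// the value that was stored before the addition.
double atomic_add_cas(std::atomic<double>& target, double val) {
  double old_value = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads old_value with the current
  // contents of target, so the loop retries with fresh data.
  while (!target.compare_exchange_weak(old_value, old_value + val,
                                       std::memory_order_relaxed)) {
  }
  return old_value;
}
```

A device-side check of the same shape (read the old value, add, read back) is one way to tell "result is zero" apart from "result is unchanged".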

cwpearson commented 2 months ago

Reimplementing alignPtr as

template <typename InPtr, typename T>
KOKKOS_INLINE_FUNCTION T *alignPtr(InPtr p) {
  std::uintptr_t ptrVal = reinterpret_cast<std::uintptr_t>(p);
  while (ptrVal % alignof(T)) {
    ++ptrVal;
  }
  return reinterpret_cast<T *>(ptrVal);
}

seems to make the SpGEMM unit tests pass. However, using the equivalent

template <typename InPtr, typename T>
KOKKOS_INLINE_FUNCTION T *alignPtr(InPtr p) {
  std::uintptr_t ptrVal = reinterpret_cast<std::uintptr_t>(p);
  return reinterpret_cast<T *>((ptrVal + alignof(T) - 1) / alignof(T) * alignof(T));
}

does not. This may be a SYCL compiler issue (unless (ptrVal + alignof(T) - 1) overflows).
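
One way to narrow this down is to confirm on the host that the two variants agree and that the overflow case is not being hit. The sketch below is host-only (KOKKOS_INLINE_FUNCTION is dropped) and the names align_ptr_loop, align_ptr_div, round_up_would_overflow and variants_agree_on_host are hypothetical. It says nothing about what the SYCL device compiler generates.

```cpp
#include <cstddef>
#include <cstdint>

// Loop-based variant, same logic as the first alignPtr above.
template <typename InPtr, typename T>
T* align_ptr_loop(InPtr p) {
  std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p);
  while (v % alignof(T)) {
    ++v;
  }
  return reinterpret_cast<T*>(v);
}

// Closed-form variant, same logic as the second alignPtr above.
template <typename InPtr, typename T>
T* align_ptr_div(InPtr p) {
  std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p);
  return reinterpret_cast<T*>((v + alignof(T) - 1) / alignof(T) * alignof(T));
}

// True if v + alignof(T) - 1 would wrap around in std::uintptr_t.
template <typename T>
bool round_up_would_overflow(std::uintptr_t v) {
  return v > UINTPTR_MAX - (alignof(T) - 1);
}

// Compares both variants at every byte offset of a small host buffer.
// The pointers are only compared, never dereferenced.
template <typename T>
bool variants_agree_on_host() {
  alignas(64) static char buffer[256];
  for (std::size_t offset = 0; offset < 128; ++offset) {
    char* p = buffer + offset;
    if (round_up_would_overflow<T>(reinterpret_cast<std::uintptr_t>(p))) {
      return false;
    }
    if (align_ptr_loop<char*, T>(p) != align_ptr_div<char*, T>(p)) {
      return false;
    }
  }
  return true;
}
```

If this passes on the host while the device result differs, that points toward code generation rather than the arithmetic itself.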

cwpearson commented 2 months ago

// Four ways to round i up to a multiple of align.
// f1 and f3 are only correct when align is a power of two.
unsigned int f1(unsigned int i, unsigned int align) // today
{
    return ((i + align - 1) & (~(align - 1)));
}

unsigned int f2(unsigned int i, unsigned int align)
{
    return ((i + align - 1) / align * align);
}

unsigned int f3(unsigned int i, unsigned int align) // gcc
{
    return (i + align - 1) & (-align);
}

unsigned int f4(unsigned int i, unsigned int align)
{
    while (i % align) {
        ++i;
    }
    return i;
}

Only f4 works for the SYCL SpGEMM.

In clang trunk for x86 on godbolt, f1 and f3 compile to the same instructions; f2 and f4 each compile to different instructions.
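
For completeness, here is a host-side sketch that checks the four candidates against each other. The bodies are copied from f1 to f4 above and renamed only for readability (the new names and all_agree_pow2 are hypothetical). On a host compiler they should agree for power-of-two alignments, and the mask-based forms differ from the others for any alignment that is not a power of two.

```cpp
#include <cstdint>

// f1: mask with ~(align - 1); assumes align is a power of two.
inline unsigned int round_up_mask(unsigned int i, unsigned int align) {
  return (i + align - 1) & (~(align - 1));
}

// f2: divide then multiply; valid for any nonzero align.
inline unsigned int round_up_div(unsigned int i, unsigned int align) {
  return (i + align - 1) / align * align;
}

// f3: mask with -align (unsigned negation); assumes align is a power of two.
inline unsigned int round_up_neg(unsigned int i, unsigned int align) {
  return (i + align - 1) & (-align);
}

// f4: increment until divisible; valid for any nonzero align.
inline unsigned int round_up_loop(unsigned int i, unsigned int align) {
  while (i % align) {
    ++i;
  }
  return i;
}

// Returns true if all four agree for every i in [0, limit) and every
// power-of-two align in [1, max_align].
bool all_agree_pow2(unsigned int limit, unsigned int max_align) {
  for (unsigned int align = 1; align <= max_align; align *= 2) {
    for (unsigned int i = 0; i < limit; ++i) {
      const unsigned int expected = round_up_loop(i, align);
      if (round_up_mask(i, align) != expected ||
          round_up_div(i, align) != expected ||
          round_up_neg(i, align) != expected) {
        return false;
      }
    }
  }
  return true;
}
```

Since alignof(T) is always a power of two, agreement here would suggest that the difference seen in the SYCL SpGEMM is not explained by the arithmetic alone.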

ndellingwood commented 2 months ago

Status update as of 7/9/2024, following the merge of some recent fixes:

Sycl + PV, no MKL

Failing tests

23:41:05 The following tests FAILED:
23:41:05     15 - sparse_sycl (Failed)

Failure output snips: sparse_sycl

23:40:11 [ RUN      ] sycl_test.sparse_coo2crs
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 1.37386 vs 3.71438e-09
23:40:11 row: 31, crs_col_ids_ref(2871) = 25 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.
23:40:11 
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 3.55771 vs 3.71438e-09
23:40:11 row: 37, crs_col_ids_ref(3420) = 213 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.

Sycl + PV, with MKL

Failing tests:

05:51:47 The following tests FAILED:
05:51:47      9 - blas_sycl (Failed)
05:51:47     15 - sparse_sycl (Failed)
05:51:47     16 - blocksparse_sycl (Failed)

Failure output snips: blas_sycl

05:47:45 [ RUN      ] sycl_test.gemv_double
05:47:45 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(0)=-1.50128, h_y(0)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
...
05:47:45 beta = 0, input contains NaN, A is 2131x2131, mode T: gemv incorrect
05:47:45 [  FAILED  ] sycl_test.gemv_double (643 ms)
05:47:45 [ RUN      ] sycl_test.blas_gemv_streams_double_int_int_TestDevice
05:47:45 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(0)=19.9203, h_y(0)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
...
05:47:45 Value of: 0
05:47:45 Expected: numErrors
05:47:45 Which is: 40
05:47:45 beta = 0, input contains NaN, A is 50x40, mode T: gemv incorrect
05:47:45 [  FAILED  ] sycl_test.blas_gemv_streams_double_int_int_TestDevice (79 ms)
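
For context on the "beta = 0, input contains NaN" case: under the BLAS convention, when beta is zero y need not be set on input, so NaN already stored in y must not leak into the result. The host sketch below contrasts a reference that overwrites y with a naive one that evaluates beta * y. It assumes the NaN in the test sits in the incoming y, which the log does not state explicitly. The function names and the row-major layout are hypothetical, and nothing here is a claim about how oneMKL computes gemv.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference: y = alpha * A^T * x + beta * y, with A stored row-major (m x n),
// x of length m and y of length n. When beta == 0, y is treated as output only.
void gemv_t_ref(std::size_t m, std::size_t n, double alpha,
                const std::vector<double>& A, const std::vector<double>& x,
                double beta, std::vector<double>& y) {
  for (std::size_t j = 0; j < n; ++j) {
    double acc = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
      acc += A[i * n + j] * x[i];
    }
    // Skip the beta * y[j] term entirely when beta is zero, so a NaN in y[j]
    // cannot propagate (0 * NaN is NaN in IEEE arithmetic).
    y[j] = (beta == 0.0) ? alpha * acc : alpha * acc + beta * y[j];
  }
}

// Naive version that always evaluates beta * y[j]; shown only for contrast.
void gemv_t_naive(std::size_t m, std::size_t n, double alpha,
                  const std::vector<double>& A, const std::vector<double>& x,
                  double beta, std::vector<double>& y) {
  for (std::size_t j = 0; j < n; ++j) {
    double acc = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
      acc += A[i * n + j] * x[i];
    }
    y[j] = alpha * acc + beta * y[j];
  }
}
```

With y pre-filled with NaN and beta = 0, the reference returns finite values while the naive version returns NaN, which matches the h_y(0)=nan symptom in the log above.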

sparse_sycl:

05:50:55 [ RUN      ] sycl_test.sparse_coo2crs
05:50:55 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
05:50:55 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 5.46403 vs 2.86173e-09
05:50:55 row: 19, crs_col_ids_ref(1595) = 288 mismatched values!
05:50:55 Begin arguments for above failure...
05:50:55 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE359...): rand seed: 3195414009
05:50:55 scalar: N6Kokkos7complexIdEE
05:50:55 layout: N6Kokkos10LayoutLeftE
05:50:55 m: 359, n: 359
05:50:55 ...end arguments for above failure.
...
05:50:55 [ RUN      ] sycl_test.sparse_spmv_double_int_int_TestDevice
05:50:55 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/sparse/unit_test/Test_Sparse_spmv.hpp:216: Failure
05:50:55 Value of: threw
05:50:55   Actual: true
05:50:55 Expected: false
05:50:55 KokkosSparse::Test::spmv 1D, mode T: threw exception:
05:50:55 oneapi::mkl::sparse::gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op
...
05:50:55 [ RUN      ] sycl_test.sparse_spmv_mv_double_int_int_LayoutLeft_TestDevice
05:50:55 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/sparse/unit_test/Test_Sparse_spmv.hpp:268: Failure
05:50:55 Value of: threw
05:50:55   Actual: true
05:50:55 Expected: false
05:50:55 KokkosSparse::Test::spmv 2D, mode T: threw exception:
05:50:55 oneapi::mkl::sparse::gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op
...
05:50:55 [ RUN      ] sycl_test.sparse_sptrsv_double_int_int_TestDevice
05:50:55 unknown file: Failure
05:50:55 C++ exception with description "oneapi::mkl::sparse::optimize_gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op" thrown in the test body.
...
05:50:55 [  PASSED  ] 47 tests.
05:50:55 [  FAILED  ] 4 tests, listed below:
05:50:55 [  FAILED  ] sycl_test.sparse_coo2crs
05:50:55 [  FAILED  ] sycl_test.sparse_spmv_double_int_int_TestDevice
05:50:55 [  FAILED  ] sycl_test.sparse_spmv_mv_double_int_int_LayoutLeft_TestDevice
05:50:55 [  FAILED  ] sycl_test.sparse_sptrsv_double_int_int_TestDevice
05:50:55 
05:50:55  4 FAILED TESTS

blocksparse_sycl

05:51:18 [ RUN      ] sycl_test.sparse_bsr_spmv_double_int_int_TestDevice
05:51:18 unknown file: Failure
05:51:18 C++ exception with description "oneapi::mkl::sparse::optimize_gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op" thrown in the test body.
05:51:18 [  FAILED  ] sycl_test.sparse_bsr_spmv_double_int_int_TestDevice (1627 ms)
05:51:18 [ RUN      ] sycl_test.sparse_bsr_spmmv_double_int_int_LayoutLeft_TestDevice
05:51:18 unknown file: Failure
05:51:18 C++ exception with description "oneapi::mkl::sparse::optimize_gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op" thrown in the test body.
05:51:18 [  FAILED  ] sycl_test.sparse_bsr_spmmv_double_int_int_LayoutLeft_TestDevice (3108 ms)
05:51:18 [----------] 7 tests from sycl_test (22517 ms total)
05:51:18 
05:51:18 [----------] Global test environment tear-down
05:51:18 [==========] 7 tests from 1 test case ran. (22517 ms total)
05:51:18 [  PASSED  ] 5 tests.
05:51:18 [  FAILED  ] 2 tests, listed below:
05:51:18 [  FAILED  ] sycl_test.sparse_bsr_spmv_double_int_int_TestDevice
05:51:18 [  FAILED  ] sycl_test.sparse_bsr_spmmv_double_int_int_LayoutLeft_TestDevice

ndellingwood commented 1 month ago

Status update 7/12/2024:

After the recent gemv fallback updates, the Sycl builds are in better shape with only the sparse_coo2crs test failure remaining:

sparse_sycl

23:40:11 [ RUN      ] sycl_test.sparse_coo2crs
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 1.37386 vs 3.71438e-09
23:40:11 row: 31, crs_col_ids_ref(2871) = 25 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.
23:40:11 
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 3.55771 vs 3.71438e-09
23:40:11 row: 37, crs_col_ids_ref(3420) = 213 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.