Open ndellingwood opened 1 year ago
Updating the issue with failures as of SHA 32aa75a8f20ca88df64bde421c335b9fa6f68397
Configuration 1 (no TPLs):
salloc -N 1 -p PV
source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 git
# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1
# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=/projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.1.0/gcc/8.5.0/base/6g2jkiv/compiler/2023.1.0/linux/bin-llvm/clang++ --cxxflags="-fp-model=precise" --shared --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF --kokkos-path=$KOKKOS_PATH
Test failures on PVC:
23:43:24 The following tests FAILED:
23:43:24 15 - sparse_sycl (SEGFAULT)
23:43:24 16 - blocksparse_sycl (Failed)
Configuration 2 (oneMKL):
salloc -N 1 -p PV
source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load git cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 intel-oneapi-mkl/2023.1.0 intel-oneapi-tbb/2021.9.0
# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1
# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=icpx --cxxflags="-fp-model=precise" --shared --with-tpls=mkl --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF --kokkos-path=$KOKKOS_PATH
Test failures on PVC:
05:49:17 The following tests FAILED:
05:49:17 9 - blas_sycl (Failed)
05:49:17 15 - sparse_sycl (Subprocess aborted)
05:49:17 16 - blocksparse_sycl (Failed)
05:49:17 26 - wiki_spadd (Subprocess aborted)
Joe installed Intel oneAPI 2024.1.0 on Blake, and I tested the MKL configuration with it:
Test failures:
15/32 Test #15: sparse_sycl ......................***Failed 194.78 sec
...
[ PASSED ] 47 tests.
[ FAILED ] 4 tests, listed below:
[ FAILED ] sycl_test.sparse_spgemm_jacobi_double_int_int_TestDevice
[ FAILED ] sycl_test.sparse_spgemm_double_int_int_TestDevice
[ FAILED ] sycl_test.sparse_spmv_double_int_int_TestDevice
[ FAILED ] sycl_test.sparse_par_ilut_double_int_int_TestDevice
16/32 Test #16: blocksparse_sycl .................***Failed 29.87 sec
...
[==========] 7 tests from 1 test case ran. (29406 ms total)
[ PASSED ] 6 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] sycl_test.sparse_block_spgemm_double_int_int_TestDevice
Configuration (Sycl backend, intel/2024.1.0 with mkl/2024.0.0):
source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load cmake intel-oneapi-compilers/2024.1.0 intel-oneapi-dpl/2022.5.0 intel-oneapi-tbb/2021.12.0 intel-oneapi-mkl/2024.0.0
module list
# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1
# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=icpx --cxxflags="-fp-model=precise -Wno-pass-failed" --shared --with-tpls=mkl --kokkos-path=$KOKKOS_PATH
make -j16
# Unit tests
export ONEAPI_DEVICE_SELECTOR=ext_oneapi_level_zero:gpu
ctest --output-on-failure
I've been poking around with this:
In the SpGEMM, it seems that Kokkos::atomic_add(addr, val); always results in *addr = 0 (or possibly leaves *addr unchanged, i.e. Kokkos::atomic_add is a no-op).
However, *addr += val; causes some math to happen, though it produces incorrect values in a context where atomics are needed.
I've tried replacing Kokkos::atomic_add(addr, val) with various flavors of
auto v = sycl::atomic_ref<std::remove_reference_t<decltype(*addr)>,
sycl::memory_order::relaxed,
sycl::memory_scope::device,
sycl::access::address_space::global_space>(*addr);
v += val;
but no luck so far.
I've also tried running the Kokkos Core atomics unit tests built with the same Core that I use for the Kernels unit tests, and the Core atomic unit tests all pass.
Reimplementing alignPtr as
template <typename InPtr, typename T>
KOKKOS_INLINE_FUNCTION T *alignPtr(InPtr p) {
std::uintptr_t ptrVal = reinterpret_cast<std::uintptr_t>(p);
while (ptrVal % alignof(T)) {
++ptrVal;
}
return reinterpret_cast<T *>(ptrVal);
}
seems to make the SpGEMM unit tests pass. However, using the equivalent
template <typename InPtr, typename T>
KOKKOS_INLINE_FUNCTION T *alignPtr(InPtr p) {
std::uintptr_t ptrVal = reinterpret_cast<std::uintptr_t>(p);
return reinterpret_cast<T *>((ptrVal + alignof(T) - 1) / alignof(T) * alignof(T));
}
does not. This may be a SYCL compiler issue (unless (ptrVal + alignof(T) - 1) overflows).
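To make the overflow caveat concrete, here is a minimal host-side sketch. The names round_up_wraps, round_up_div and round_up_loop are hypothetical helpers written for this comment, not Kokkos Kernels functions; they only mirror the integer arithmetic of the two alignPtr variants. The sum can wrap only when ptrVal lies within align - 1 of the largest std::uintptr_t value, which seems unlikely for a real device pointer, though I have not checked what addresses the SYCL runtime hands out.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if (ptrVal + align - 1) would wrap around
// in std::uintptr_t arithmetic.
inline bool round_up_wraps(std::uintptr_t ptrVal, std::uintptr_t align) {
  assert(align != 0);
  return ptrVal > std::numeric_limits<std::uintptr_t>::max() - (align - 1);
}

// Same arithmetic as the divide-and-multiply alignPtr variant.
inline std::uintptr_t round_up_div(std::uintptr_t ptrVal, std::uintptr_t align) {
  assert(align != 0);
  return (ptrVal + align - 1) / align * align;
}

// Same arithmetic as the increment-until-aligned alignPtr variant.
inline std::uintptr_t round_up_loop(std::uintptr_t ptrVal, std::uintptr_t align) {
  assert(align != 0);
  while (ptrVal % align) {
    ++ptrVal;
  }
  return ptrVal;
}
```

On the host, the two round-up functions return the same value whenever round_up_wraps is false, so a difference on the device would point away from overflow and toward code generation.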
unsigned int f1(unsigned int i, unsigned int align) // today
{
return ((i + align - 1) & (~(align - 1)));
}
unsigned int f2(unsigned int i, unsigned int align)
{
return ((i + align - 1) / align * align);
}
unsigned int f3(unsigned int i, unsigned int align) // gcc
{
return (i + align - 1) & (-align);
}
unsigned int f4(unsigned int i, unsigned int align)
{
while (i % align) {
++i;
}
return i;
}
Only f4 works for SYCL SpGEMM.
In clang-trunk x86 on godbolt, f1 and f3 compile to the same instructions; f2 and f4 each compile to something different.
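For reference, a small host-side check that the four variants agree. f1 through f4 are copied from the comment above; round_ups_agree is a hypothetical helper added here, and the range of alignments (powers of two up to 64) is an assumption about what SpGEMM actually passes. If this passes on the host but only f4 works on the device, that would be consistent with a compiler issue rather than a logic error.

```cpp
#include <cassert>

unsigned int f1(unsigned int i, unsigned int align) { return (i + align - 1) & (~(align - 1)); }
unsigned int f2(unsigned int i, unsigned int align) { return (i + align - 1) / align * align; }
unsigned int f3(unsigned int i, unsigned int align) { return (i + align - 1) & (-align); }
unsigned int f4(unsigned int i, unsigned int align) {
  while (i % align) {
    ++i;
  }
  return i;
}

// Hypothetical helper: compares f1..f3 against f4 for every i in [0, max_i]
// and every power-of-two alignment from 1 to 64.
bool round_ups_agree(unsigned int max_i) {
  // Keep i + align - 1 far away from unsigned wrap-around.
  assert(max_i < 0xFFFFFFFFu - 64u);
  for (unsigned int align = 1; align <= 64; align *= 2) {
    for (unsigned int i = 0; i <= max_i; ++i) {
      const unsigned int expected = f4(i, align);
      if (f1(i, align) != expected || f2(i, align) != expected ||
          f3(i, align) != expected) {
        return false;
      }
    }
  }
  return true;
}
```

Note that the masking variants f1 and f3 are only valid for power-of-two alignments, while f2 and f4 work for any nonzero alignment. For example, with i = 13 and align = 12, f2 and f4 return 24 but f1 returns 16.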
Status update as of 7/9/2024 following merge of some recent fixes:
Failing tests (no-TPL build):
23:41:05 The following tests FAILED:
23:41:05 15 - sparse_sycl (Failed)
Failure output snips: sparse_sycl
23:40:11 [ RUN ] sycl_test.sparse_coo2crs
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 1.37386 vs 3.71438e-09
23:40:11 row: 31, crs_col_ids_ref(2871) = 25 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.
23:40:11
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 3.55771 vs 3.71438e-09
23:40:11 row: 37, crs_col_ids_ref(3420) = 213 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.
Failing tests (oneMKL build):
05:51:47 The following tests FAILED:
05:51:47 9 - blas_sycl (Failed)
05:51:47 15 - sparse_sycl (Failed)
05:51:47 16 - blocksparse_sycl (Failed)
Failure output snips: blas_sycl
05:47:45 [ RUN ] sycl_test.gemv_double
05:47:45 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(0)=-1.50128, h_y(0)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
...
05:47:45 beta = 0, input contains NaN, A is 2131x2131, mode T: gemv incorrect
05:47:45 [ FAILED ] sycl_test.gemv_double (643 ms)
05:47:45 [ RUN ] sycl_test.blas_gemv_streams_double_int_int_TestDevice
05:47:45 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(0)=19.9203, h_y(0)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
...
05:47:45 Value of: 0
05:47:45 Expected: numErrors
05:47:45 Which is: 40
05:47:45 beta = 0, input contains NaN, A is 50x40, mode T: gemv incorrect
05:47:45 [ FAILED ] sycl_test.blas_gemv_streams_double_int_int_TestDevice (79 ms)
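The "beta = 0, input contains NaN" cases above presumably exercise the BLAS convention that y is overwritten, not scaled, when beta is zero. That reading is an assumption on my part, as is the NaN being in y on entry. A minimal host-side sketch of that convention for mode T follows; gemv_transpose_ref is a hypothetical reference routine written for this comment, not Kokkos Kernels or oneMKL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: y = alpha * A^T * x + beta * y for a row-major
// m x n matrix A, so x has m entries and y has n entries.
void gemv_transpose_ref(std::size_t m, std::size_t n, double alpha,
                        const std::vector<double>& A,
                        const std::vector<double>& x, double beta,
                        std::vector<double>& y) {
  assert(A.size() == m * n && x.size() == m && y.size() == n);
  for (std::size_t j = 0; j < n; ++j) {
    double acc = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
      acc += A[i * n + j] * x[i];
    }
    // With beta == 0 the old y(j) must not be read, because 0 * NaN is NaN.
    y[j] = (beta == 0.0) ? alpha * acc : alpha * acc + beta * y[j];
  }
}
```

An implementation that always evaluates beta * y(j) would return NaN here, which matches the h_y(0)=nan symptom in the log but does not prove that this is the cause.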
sparse_sycl:
05:50:55 [ RUN ] sycl_test.sparse_coo2crs
05:50:55 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
05:50:55 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 5.46403 vs 2.86173e-09
05:50:55 row: 19, crs_col_ids_ref(1595) = 288 mismatched values!
05:50:55 Begin arguments for above failure...
05:50:55 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE359...): rand seed: 3195414009
05:50:55 scalar: N6Kokkos7complexIdEE
05:50:55 layout: N6Kokkos10LayoutLeftE
05:50:55 m: 359, n: 359
05:50:55 ...end arguments for above failure.
...
05:50:55 [ RUN ] sycl_test.sparse_spmv_double_int_int_TestDevice
05:50:55 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/sparse/unit_test/Test_Sparse_spmv.hpp:216: Failure
05:50:55 Value of: threw
05:50:55 Actual: true
05:50:55 Expected: false
05:50:55 KokkosSparse::Test::spmv 1D, mode T: threw exception:
05:50:55 oneapi::mkl::sparse::gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op
...
05:50:55 [ RUN ] sycl_test.sparse_spmv_mv_double_int_int_LayoutLeft_TestDevice
05:50:55 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/sparse/unit_test/Test_Sparse_spmv.hpp:268: Failure
05:50:55 Value of: threw
05:50:55 Actual: true
05:50:55 Expected: false
05:50:55 KokkosSparse::Test::spmv 2D, mode T: threw exception:
05:50:55 oneapi::mkl::sparse::gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op
...
05:50:55 [ RUN ] sycl_test.sparse_sptrsv_double_int_int_TestDevice
05:50:55 unknown file: Failure
05:50:55 C++ exception with description "oneapi::mkl::sparse::optimize_gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op" thrown in the test body.
...
05:50:55 [ PASSED ] 47 tests.
05:50:55 [ FAILED ] 4 tests, listed below:
05:50:55 [ FAILED ] sycl_test.sparse_coo2crs
05:50:55 [ FAILED ] sycl_test.sparse_spmv_double_int_int_TestDevice
05:50:55 [ FAILED ] sycl_test.sparse_spmv_mv_double_int_int_LayoutLeft_TestDevice
05:50:55 [ FAILED ] sycl_test.sparse_sptrsv_double_int_int_TestDevice
05:50:55
05:50:55 4 FAILED TESTS
blocksparse_sycl
05:51:18 [ RUN ] sycl_test.sparse_bsr_spmv_double_int_int_TestDevice
05:51:18 unknown file: Failure
05:51:18 C++ exception with description "oneapi::mkl::sparse::optimize_gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op" thrown in the test body.
05:51:18 [ FAILED ] sycl_test.sparse_bsr_spmv_double_int_int_TestDevice (1627 ms)
05:51:18 [ RUN ] sycl_test.sparse_bsr_spmmv_double_int_int_LayoutLeft_TestDevice
05:51:18 unknown file: Failure
05:51:18 C++ exception with description "oneapi::mkl::sparse::optimize_gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op" thrown in the test body.
05:51:18 [ FAILED ] sycl_test.sparse_bsr_spmmv_double_int_int_LayoutLeft_TestDevice (3108 ms)
05:51:18 [----------] 7 tests from sycl_test (22517 ms total)
05:51:18
05:51:18 [----------] Global test environment tear-down
05:51:18 [==========] 7 tests from 1 test case ran. (22517 ms total)
05:51:18 [ PASSED ] 5 tests.
05:51:18 [ FAILED ] 2 tests, listed below:
05:51:18 [ FAILED ] sycl_test.sparse_bsr_spmv_double_int_int_TestDevice
05:51:18 [ FAILED ] sycl_test.sparse_bsr_spmmv_double_int_int_LayoutLeft_TestDevice
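The exceptions above all come from oneMKL rejecting anything other than the nontrans op. One way to avoid them is to route unsupported modes to a native kernel, in the spirit of the fallback mentioned in the next status update. The sketch below is only an illustration of that dispatch pattern on the host: Csr, tpl_supports, spmv_tpl_stub, spmv_native and spmv are hypothetical names, and the stub merely imitates a vendor call that throws for transposed modes. It is not the actual Kokkos Kernels change.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal CSR container.
struct Csr {
  std::size_t nrows, ncols;
  std::vector<std::size_t> row_ptr;
  std::vector<std::size_t> col;
  std::vector<double> val;
};

// Native kernel: y = A * x for mode 'N', y = A^T * x for mode 'T'.
std::vector<double> spmv_native(const Csr& A, const std::vector<double>& x, char mode) {
  const bool trans = (mode == 'T');
  assert(x.size() == (trans ? A.nrows : A.ncols));
  std::vector<double> y(trans ? A.ncols : A.nrows, 0.0);
  for (std::size_t i = 0; i < A.nrows; ++i) {
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      const std::size_t j = A.col[k];
      if (trans) {
        y[j] += A.val[k] * x[i];
      } else {
        y[i] += A.val[k] * x[j];
      }
    }
  }
  return y;
}

// Stand-in for a vendor (TPL) kernel that only implements the nontrans op.
bool tpl_supports(char mode) { return mode == 'N'; }

std::vector<double> spmv_tpl_stub(const Csr& A, const std::vector<double>& x, char mode) {
  if (!tpl_supports(mode)) {
    throw std::runtime_error("unimplemented functionality");
  }
  return spmv_native(A, x, 'N');  // placeholder for the real vendor call
}

// Dispatch: use the TPL when it supports the mode, otherwise fall back.
std::vector<double> spmv(const Csr& A, const std::vector<double>& x, char mode) {
  return tpl_supports(mode) ? spmv_tpl_stub(A, x, mode) : spmv_native(A, x, mode);
}
```

The point of checking support before the call, rather than catching the exception afterwards, is that the unsupported path never reaches the vendor library at all.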
Status update 7/12/2024:
After the recent gemv fallback updates, the Sycl builds are in better shape, with only the sparse_coo2crs test failure remaining:
sparse_sycl
23:40:11 [ RUN ] sycl_test.sparse_coo2crs
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 1.37386 vs 3.71438e-09
23:40:11 row: 31, crs_col_ids_ref(2871) = 25 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.
23:40:11
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 3.55771 vs 3.71438e-09
23:40:11 row: 37, crs_col_ids_ref(3420) = 213 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.
Testing with the Sycl backend on Intel Ponte Vecchio on the new Blake showed a couple of failing sub-tests (failure output listed below the failing executable), depending on which environment variables are set:
Default (ZES_ENABLE_SYSMAN unset)
ZES_ENABLE_SYSMAN=1
Reproducer (Blake PV queue): SHAs: kokkos/kokkos@7e299b4e25c42528e105379c3aa9a318056545ba kokkos/kokkos-kernels@acdd8969109b53f2b3b0915ef51aef9800a44587
Edit: Added shas used in the testing