kokkos / kokkos

Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
https://kokkos.org

Failing Kokkos::SYCL Unit Tests #7007

Open pvelesko opened 2 months ago

pvelesko commented 2 months ago

I see lots of timeouts and failures to find kernels:

29% tests passed, 37 tests failed out of 52

Total Test time (real) = 1800.46 sec

The following tests FAILED:
      1 - Kokkos_CoreUnitTest_Serial1 (Failed)
      4 - Kokkos_CoreUnitTest_SYCL1A (Timeout)
      5 - Kokkos_CoreUnitTest_SYCL1B (Timeout)
      6 - Kokkos_CoreUnitTest_SYCL2A (Timeout)
      7 - Kokkos_CoreUnitTest_SYCL2B (Timeout)
      8 - Kokkos_CoreUnitTest_SYCL2C (Failed)
      9 - Kokkos_CoreUnitTest_SYCL2D (Failed)
     10 - Kokkos_CoreUnitTest_SYCL3 (Timeout)
     11 - Kokkos_CoreUnitTest_SYCLInterOpInit (Failed)
     12 - Kokkos_CoreUnitTest_SYCLInterOpInit_Context (Failed)
     13 - Kokkos_CoreUnitTest_SYCLInterOpStreams (Failed)
     14 - Kokkos_CoreUnitTest_Default (Timeout)
     15 - Kokkos_CoreUnitTest_LegionInitialization (Failed)
     18 - Kokkos_CoreUnitTest_KokkosP (Failed)
     26 - Kokkos_IncrementalTest_SYCL (Timeout)
     31 - Kokkos_ContainersUnitTest_SYCL (Timeout)
     32 - Kokkos_UnitTest_Sort (Timeout)
     33 - Kokkos_UnitTest_Random (Failed)
     34 - Kokkos_AlgorithmsUnitTest_StdSet_A (Timeout)
     35 - Kokkos_AlgorithmsUnitTest_StdSet_B (Timeout)
     36 - Kokkos_AlgorithmsUnitTest_StdSet_C (Timeout)
     37 - Kokkos_AlgorithmsUnitTest_StdSet_D (Timeout)
     38 - Kokkos_AlgorithmsUnitTest_StdSet_E (Timeout)
     39 - Kokkos_AlgorithmsUnitTest_StdSet_Team_A (Timeout)
     40 - Kokkos_AlgorithmsUnitTest_StdSet_Team_B (Timeout)
     41 - Kokkos_AlgorithmsUnitTest_StdSet_Team_C (Timeout)
     42 - Kokkos_AlgorithmsUnitTest_StdSet_Team_D (Failed)
     43 - Kokkos_AlgorithmsUnitTest_StdSet_Team_E (Timeout)
     44 - Kokkos_AlgorithmsUnitTest_StdSet_Team_F (Failed)
     45 - Kokkos_AlgorithmsUnitTest_StdSet_Team_G (Failed)
     46 - Kokkos_AlgorithmsUnitTest_StdSet_Team_H (Timeout)
     47 - Kokkos_AlgorithmsUnitTest_StdSet_Team_I (Failed)
     48 - Kokkos_AlgorithmsUnitTest_StdSet_Team_L (Timeout)
     49 - Kokkos_AlgorithmsUnitTest_StdSet_Team_M (Timeout)
     50 - Kokkos_AlgorithmsUnitTest_StdSet_Team_P (Failed)
     51 - Kokkos_AlgorithmsUnitTest_StdSet_Team_Q (Failed)
     52 - Kokkos_UnitTest_SIMD (Failed)
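
A quick way to triage a list like the one above is to tally the failures by reason, since timeouts and hard failures usually point at different problems. The sketch below assumes the CTest summary has been saved to a file; `tally_ctest_failures` and `ctest.log` are hypothetical names introduced here, not part of Kokkos or CTest.

```shell
# Hypothetical helper: count failed CTest tests by reason.
# Reads a CTest summary on stdin and prints "<count> <reason>" lines.
tally_ctest_failures() {
  # Match lines such as "  4 - Kokkos_CoreUnitTest_SYCL1A (Timeout)"
  # and keep only the text inside the parentheses.
  sed -n 's/^ *[0-9][0-9]* - [^ ]* (\(.*\))$/\1/p' |
    sort | uniq -c | sed 's/^ *//'
}

# Usage (ctest.log is an assumed file name):
#   ctest 2>&1 | tee ctest.log
#   tally_ctest_failures < ctest.log
```

Applied to the 37 failures listed above, this reports 16 `Failed` and 21 `Timeout`.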

Please include the following for a minimal reproducer

  1. Compilers (with versions): OneAPI 2024.1 icpx

  2. Kokkos release or commit used (i.e., the sha1 number): tag 4.3.00

  3. Platform, architecture and backend: Intel Arc A770 discrete GPU

  4. CMake configure command

    
    export KOKKOS_DIR=~/kokkos-build/kokkos
    export KOKKOS_KERNELS_DIR=~/kokkos-build/kokkos-kernels
    export KOKKOS_VER=4.3.00
    export ONEAPI_VER=2024.1.0
    export PREFIX=/space/pvelesko/install/kokkos/${KOKKOS_VER}/oneapi/$ONEAPI_VER
    module purge
    module load oneapi/$ONEAPI_VER

    rm -rf ${KOKKOS_DIR}/build && mkdir -p ${KOKKOS_DIR}/build && cd ${KOKKOS_DIR}/build && rm -f CMakeCache.txt
    git checkout HEAD -f && git checkout ${KOKKOS_VER}
    cmake -DKokkos_ENABLE_SYCL=ON \
      -DCMAKE_CXX_COMPILER=icpx \
      -DBUILD_SHARED_LIBS=ON \
      -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DKokkos_ENABLE_TESTS=ON \
      -DCMAKE_INSTALL_PREFIX=${PREFIX} ..
    ninja install


  5. Output from CMake configure command

    ─pvelesko@cupcake ~/kokkos-build/kokkos/build ‹4.3.00●›
    ╰─$ export KOKKOS_DIR=~/kokkos-build/kokkos
    export KOKKOS_KERNELS_DIR=~/kokkos-build/kokkos-kernels
    export KOKKOS_VER=4.3.00
    export ONEAPI_VER=2024.1.0
    export PREFIX=/space/pvelesko/install/kokkos/${KOKKOS_VER}/oneapi/$ONEAPI_VER
    module purge
    module load oneapi/$ONEAPI_VER

    rm -rf ${KOKKOS_DIR}/build && mkdir -p ${KOKKOS_DIR}/build && cd ${KOKKOS_DIR}/build && rm -f CMakeCache.txt
    git checkout HEAD -f && git checkout ${KOKKOS_VER}
    cmake -DKokkos_ENABLE_SYCL=ON \
      -DCMAKE_CXX_COMPILER=icpx \
      -DBUILD_SHARED_LIBS=ON \
      -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DKokkos_ENABLE_TESTS=ON \
      -DCMAKE_INSTALL_PREFIX=${PREFIX} ..
    Loading oneapi/2024.1.0
      Loading requirement: opencl/ocl-icd-loader
    HEAD is now at 486cc745c Merge pull request #6908 from ndellingwood/master-release-4.3.00
    -- Setting default Kokkos CXX standard to 17
    -- The CXX compiler identification is IntelLLVM 2024.1.0
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /home/pvelesko/miniconda3/envs/oneapi-2024.1.0/bin/icpx - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Kokkos version: 4.3.0
    -- The project name is: Kokkos
    -- Using gtest found in /usr/lib/x86_64-linux-gnu/cmake/GTest
    -- Configured git information in /home/pvelesko/kokkos-build/kokkos/build/generated/Kokkos_Version_Info.cpp
    -- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
    -- Using -std=gnu++17 for C++17 extensions as feature
    -- Looking for SYCL_EXT_ONEAPI_DEVICE_GLOBAL
    -- Looking for SYCL_EXT_ONEAPI_DEVICE_GLOBAL - found
    -- Built-in Execution Spaces:
    --     Device Parallel: Kokkos::Experimental::SYCL
    --     Host Parallel: NoTypeDefined
    --     Host Serial: SERIAL

    -- Architectures:
    -- Found TPLLIBDL: /usr/include
    -- Looking for C++ include oneapi/dpl/execution
    -- Looking for C++ include oneapi/dpl/execution - found
    -- Looking for C++ include oneapi/dpl/algorithm
    -- Looking for C++ include oneapi/dpl/algorithm - found
    -- Performing Test KOKKOS_NO_TBB_CONFLICT
    -- Performing Test KOKKOS_NO_TBB_CONFLICT - Success
    -- Using internal desul_atomics copy
    -- Found Python3: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter
    -- Kokkos Backends: SERIAL;SYCL
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/pvelesko/kokkos-build/kokkos/build


  6. Minimum, complete code needed to reproduce the bug
  7. Command line needed to reproduce the bug
  8. `KokkosCore_config.h` header file (generated during the build)
[KokkosCore_config.txt](https://github.com/kokkos/kokkos/files/15295728/KokkosCore_config.txt)

  9. Please provide any additional relevant error logs
[LastTest.txt](https://github.com/kokkos/kokkos/files/15295879/LastTest.txt)
ajpowelsnl commented 2 months ago

@pvelesko - what is the full output when you run a single, failing test (e.g., Kokkos_CoreUnitTest_Serial1) ?
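
One way to capture the full output of a single failing test is to re-run it by name through CTest from the build directory. This is only a sketch: `rerun_test` and the `CTEST` override are hypothetical helpers introduced here, while `-R` and `--output-on-failure` are standard CTest options.

```shell
# Hypothetical helper: re-run exactly one CTest test and print its full output.
rerun_test() {
  name="$1"
  # -R takes a regular expression, so anchor it to match this test name only.
  # Setting CTEST=echo prints the arguments instead of running ctest (dry run).
  ${CTEST:-ctest} -R "^${name}\$" --output-on-failure
}

# Usage, from the build directory:
#   rerun_test Kokkos_CoreUnitTest_Serial1
```

Anchoring the pattern matters because an unanchored `-R` would also select any other test whose name contains the same string.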

masterleinad commented 2 months ago

What do the results look like if you explicitly provide the target architecture?
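
For illustration, providing the target architecture means adding one `Kokkos_ARCH_*` option to the configure line from the original post. The sketch below wraps that line in a hypothetical `configure_with_arch` function; the `CMAKE` override is also an assumption, added only so the command can be printed without running it. `PREFIX` is the install prefix exported in the original post.

```shell
# Hypothetical helper: the original configure line plus one Kokkos_ARCH_* option.
configure_with_arch() {
  arch="$1"   # for example INTEL_GEN, which selects -DKokkos_ARCH_INTEL_GEN=ON
  # Setting CMAKE=echo prints the arguments instead of running cmake (dry run).
  ${CMAKE:-cmake} \
    -DKokkos_ENABLE_SYCL=ON \
    -DKokkos_ARCH_"${arch}"=ON \
    -DCMAKE_CXX_COMPILER=icpx \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DKokkos_ENABLE_TESTS=ON \
    -DCMAKE_INSTALL_PREFIX="${PREFIX}" \
    ..
}

# Usage, from a fresh build directory:
#   configure_with_arch INTEL_GEN
```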

pvelesko commented 2 months ago

> @pvelesko - what is the full output when you run a single, failing test (e.g., Kokkos_CoreUnitTest_Serial1) ?

I provided the full log output in the original post.

> What do the results look like if you explicitly provide the target architecture?

With -DKokkos_ARCH_INTEL_GEN=ON: this flag seems to help a lot, but I would have assumed that JIT is the default. What does Kokkos try to do when this flag is not explicitly specified?

83% tests passed, 9 tests failed out of 52

Total Test time (real) = 628.25 sec

The following tests FAILED:
      1 - Kokkos_CoreUnitTest_Serial1 (Failed)
      4 - Kokkos_CoreUnitTest_SYCL1A (Subprocess aborted)
      5 - Kokkos_CoreUnitTest_SYCL1B (Failed)
      6 - Kokkos_CoreUnitTest_SYCL2A (Failed)
     10 - Kokkos_CoreUnitTest_SYCL3 (Failed)
     14 - Kokkos_CoreUnitTest_Default (Timeout)
     29 - Kokkos_CoreUnitTest_DeviceAndThreads (Failed)
     31 - Kokkos_ContainersUnitTest_SYCL (Subprocess killed)
     47 - Kokkos_AlgorithmsUnitTest_StdSet_Team_I (Failed)

LastTest-DDKokkos_ARCH_INTEL_GEN.txt

I tried Kokkos_ARCH_INTEL_XEHP, but it failed to compile:

Could not determine device target: 12.50.4.
Error: Cannot get HW Info for device 12.50.4.

The A770 is Xe HPG, so I'm not sure XEHP was the correct choice. https://www.intel.com/content/www/us/en/products/sku/229151/intel-arc-a770-graphics-16gb/specifications.html

With Kokkos_ARCH_INTEL_DG1 the build was compiling, but after multiple hours it seems to have stopped making progress, or it is just taking forever:

Build succeeded.
[778/779] Linking CXX executable core/unit_test/Kokkos_CoreUnitTest_SYCL1A
Compilation from IR - skipping loading of FCL
Build succeeded.
[... the two lines above repeat 45 more times ...]

masterleinad commented 2 months ago

> With -DKokkos_ARCH_INTEL_GEN=ON - This flag seems to help a lot but I would have assumed that JIT is default. Not sure what Kokkos tries to do when this flag is not explicitly specified?

Without an architecture option, Kokkos does not pass any target flag, which should be equivalent to requesting JIT compilation. With Kokkos_ARCH_INTEL_GEN, the request is explicit about compiling to SPIR-V.

> Tried Kokkos_ARCH_INTEL_XEHP but it failed to compile

We might need to update the flags then. The ones we are using were recommended on the testbeds some time ago but I guess the AOT compiler can now handle named options. For reference, the A750 seems to be a dg2(-g12?) in the xe-hpg family in ocloc speak, see https://github.com/intel/compute-runtime/blob/014720fc29c59432188a49bebe1aec5aecb5d4f0/shared/source/dll/devices/devices_base.inl#L56.

masterleinad commented 2 months ago

> LastTest-DDKokkos_ARCH_INTEL_GEN.txt

The list of failing tests is:

    [ FAILED ] serial.atomic_operations_double (0 ms)
    [ FAILED ] serial.atomic_operations_float (0 ms)
    [ FAILED ] 2 tests, listed below:
    [ FAILED ] serial.atomic_operations_double
    [ FAILED ] serial.atomic_operations_float
    2 FAILED TESTS
    Error regular expression found in output. Regex=[ FAILED ]
    [ FAILED ] sycl.atomic_operations_complexdouble (2481 ms)
    [ FAILED ] sycl.atomic_operations_double (2344 ms)
    [ FAILED ] sycl.atomic_operations_float (3 ms)
    Error regular expression found in output. Regex=[ FAILED ]
    [ FAILED ] sycl_host_usm.view_allocation_large_rank (0 ms)
    [ FAILED ] 1 test, listed below:
    [ FAILED ] sycl_host_usm.view_allocation_large_rank
    1 FAILED TEST
    Error regular expression found in output. Regex=[ FAILED ]
    FAILED (errors=1, skipped=1)
    [ FAILED ] sycl.reducers_int8_t (9 ms)
    [ FAILED ] sycl.reducers_point_t (6 ms)
    [ FAILED ] sycl.reducers_bool (1052 ms)
    [ FAILED ] sycl.int_combined_reduce (1924 ms)
    [ FAILED ] sycl.mdrange_combined_reduce (0 ms)
    [ FAILED ] sycl.int_combined_reduce_mixed (0 ms)
    [ FAILED ] sycl.reduction_deduction (0 ms)
    [ FAILED ] sycl.reduce_device_view_range_policy (7283 ms)
    [ FAILED ] sycl.reduce_device_view_mdrange_policy (2361 ms)
    [ FAILED ] sycl.reduce_device_view_team_policy (2268 ms)
    [ FAILED ] 10 tests, listed below:
    [ FAILED ] sycl.reducers_int8_t
    [ FAILED ] sycl.reducers_point_t
    [ FAILED ] sycl.reducers_bool
    [ FAILED ] sycl.int_combined_reduce
    [ FAILED ] sycl.mdrange_combined_reduce
    [ FAILED ] sycl.int_combined_reduce_mixed
    [ FAILED ] sycl.reduction_deduction
    [ FAILED ] sycl.reduce_device_view_range_policy
    [ FAILED ] sycl.reduce_device_view_mdrange_policy
    [ FAILED ] sycl.reduce_device_view_team_policy
    10 FAILED TESTS
    Error regular expression found in output. Regex=[ FAILED ]
    [ FAILED ] sycl.TeamThreadMDRangeParallelReduce (21 ms)
    [ FAILED ] sycl.ThreadVectorMDRangeParallelReduce (13 ms)
    [ FAILED ] sycl.TeamVectorMDRangeParallelReduce (13 ms)
    [ FAILED ] sycl.multi_level_scratch (3504 ms)
    FAILED teamvector_parallel_reduce 0 0 54103.000000 0.000000 24
    FAILED teamvector_parallel_reduce with shared result 0 0 54103.000000 0.000000 24
    [ FAILED ] sycl.team_teamvector_range (3175 ms)
    [ FAILED ] sycl.view_allocation_large_rank (0 ms)
    [ FAILED ] 6 tests, listed below:
    [ FAILED ] sycl.TeamThreadMDRangeParallelReduce
    [ FAILED ] sycl.ThreadVectorMDRangeParallelReduce
    [ FAILED ] sycl.TeamVectorMDRangeParallelReduce
    [ FAILED ] sycl.multi_level_scratch
    [ FAILED ] sycl.team_teamvector_range
    [ FAILED ] sycl.view_allocation_large_rank
    6 FAILED TESTS
    Error regular expression found in output. Regex=[ FAILED ]
    [ FAILED ] std_algorithms_reduce_team_test.test (5732 ms)
    [ FAILED ] std_algorithms_transform_reduce_team_test.test (5333 ms)
    [ FAILED ] 2 tests, listed below:
    [ FAILED ] std_algorithms_reduce_team_test.test
    [ FAILED ] std_algorithms_transform_reduce_team_test.test
    2 FAILED TESTS
    Error regular expression found in output. Regex=[ FAILED ]

So there seem to be some problems with shuffles and device_global variables. It's hard to look into it without access to that architecture, though.
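
To iterate on just these failures, the failed GoogleTest names in a log can be collected into a `--gtest_filter` value. This is a sketch under stated assumptions: `failed_gtest_filter` is a hypothetical helper, it only handles plain `suite.test` names (not parameterized names containing `/`), and the executable path in the usage line is a guess based on the build log earlier in this thread.

```shell
# Hypothetical helper: build a --gtest_filter value from the failed tests in a log.
failed_gtest_filter() {
  # Keep "[ FAILED ] suite.test" lines (any spacing inside the brackets),
  # which skips summary lines such as "[ FAILED ] 2 tests, listed below:".
  # Then de-duplicate and join the names with ':' as --gtest_filter expects.
  sed -n 's/^\[ *FAILED *] \([A-Za-z0-9_][A-Za-z0-9_]*\.[A-Za-z0-9_][A-Za-z0-9_]*\).*/\1/p' |
    sort -u | paste -sd: -
}

# Usage (log file name and executable path are assumptions):
#   core/unit_test/Kokkos_CoreUnitTest_SYCL1A \
#     --gtest_filter="$(failed_gtest_filter < LastTest.txt)"
```

Because the log mixes output from several test executables, the resulting filter may name tests that a given executable does not contain; GoogleTest runs only the names that match.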

pvelesko commented 2 months ago

> So there seem to be some problems with shuffles and device_global variables. It's hard to look into it without access to that architecture, though.

It's my personal server, I can set you up with ssh if you'd like.

Also, I ran these tests on an iGPU which is available on the same system:

85% tests passed, 8 tests failed out of 52

Total Test time (real) = 731.19 sec

The following tests FAILED:
      1 - Kokkos_CoreUnitTest_Serial1 (Failed)
      4 - Kokkos_CoreUnitTest_SYCL1A (Subprocess aborted)
      6 - Kokkos_CoreUnitTest_SYCL2A (Failed)
     10 - Kokkos_CoreUnitTest_SYCL3 (Failed)
     14 - Kokkos_CoreUnitTest_Default (Timeout)
     29 - Kokkos_CoreUnitTest_DeviceAndThreads (Failed)
     38 - Kokkos_AlgorithmsUnitTest_StdSet_E (Subprocess aborted)
     47 - Kokkos_AlgorithmsUnitTest_StdSet_Team_I (Subprocess aborted)
pvelesko commented 2 months ago

@ajpowelsnl

Kokkos 4.1.00 + oneapi/2023.2.4 on Intel(R) UHD Graphics 770

75% tests passed, 10 tests failed out of 40

Total Test time (real) = 619.60 sec

The following tests FAILED:
          4 - Kokkos_CoreUnitTest_SYCL1A (Failed)
          6 - Kokkos_CoreUnitTest_SYCL2A (Timeout)
         10 - Kokkos_CoreUnitTest_SYCL3 (Failed)
         11 - Kokkos_CoreUnitTest_SYCLInterOpInit (Failed)
         12 - Kokkos_CoreUnitTest_SYCLInterOpInit_Context (Failed)
         13 - Kokkos_CoreUnitTest_SYCLInterOpStreams (Failed)
         14 - Kokkos_CoreUnitTest_Default (Timeout)
         32 - Kokkos_ContainersUnitTest_SYCL (Timeout)
         33 - Kokkos_UnitTest_Sort (Timeout)
         39 - Kokkos_AlgorithmsUnitTest_StdSet_E (Subprocess aborted)
pvelesko commented 1 month ago

Does anyone need access to the machine for debugging?

ajpowelsnl commented 1 month ago

@pvelesko -- were you able to sign up for the Kokkos Slack Channel? We have attempted to contact you there to address your HIP and SYCL issues.

pvelesko commented 1 month ago

@ajpowelsnl yes, I'm on a thread
