lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

CL_BUILD_PROGRAM_FAILURE and error: Double type is not supported on this platform; with Intel GPU #813

Open alan-sorani opened 1 year ago

alan-sorani commented 1 year ago

Hello.

I've built KataGo as instructed and got the following result when running ./katago benchmark. My GPU is Intel Iris Xe Graphics.

Edit: Forgot to mention possibly relevant info: I'm on Ubuntu 22.04.2 x86_64 and my CPU is 11th Gen Intel i5-1135G7.

2023-07-04 16:22:59+0300: Running with following config:
allowResignation = true
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
maxTimePondering = 60.0
maxVisits = 500
numSearchThreads = 6
ponderingEnabled = false
resignConsecTurns = 3
resignThreshold = -0.90
rules = tromp-taylor
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95

2023-07-04 16:22:59+0300: Loading model and initializing benchmark...
2023-07-04 16:22:59+0300: Testing with default positions for board size: 19
2023-07-04 16:22:59+0300: nnRandSeed0 = 1985006673290476099
2023-07-04 16:22:59+0300: After dedups: nnModelFile0 = /home/lemon/Library/games/Go/Engines/katago_v1.13.2/cpp/default_model.bin.gz useFP16 auto useNHWC auto
2023-07-04 16:22:59+0300: Initializing neural net buffer to be size 19 * 19 exactly
2023-07-04 16:23:00+0300: Found OpenCL Platform 0: Intel(R) OpenCL Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-04 16:23:00+0300: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-07-04 16:23:00+0300: Found OpenCL Device 0: Intel(R) Iris(R) Xe Graphics (Intel(R) Corporation) (score 6000300)
2023-07-04 16:23:00+0300: Creating context for OpenCL Platform: Intel(R) OpenCL Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-04 16:23:00+0300: Using OpenCL Device 0: Intel(R) Iris(R) Xe Graphics (Intel(R) Corporation) OpenCL 3.0 NEO  (Extensions: cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_subgroup_local_block_io cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info )
2023-07-04 16:23:00+0300: Loaded tuning parameters from: /home/lemon/.katago/opencltuning/tune11_gpuIntelRIrisRXeGraphics_x19_y19_c384_mv14.txt
terminate called after throwing an instance of 'OpenCLHelpers::CompileError'
  what():  CL_BUILD_PROGRAM_FAILURE
BUILD LOG FOR winogradConv3x3NCHWBNMishTransformProgram ON DEVICE 0

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

error: Double type is not supported on this platform.
in kernel: 'bnActTransform'
error: backend compiler failed build.

Aborted (core dumped)

I managed to make KataGo work on the same computer (and GPU) previously. I'm not entirely sure whether the error is due to the current OpenCL provider from Intel (which I got from https://github.com/intel/compute-runtime/releases), a mistake on my part, or something else.

Possibly relevant OpenCL platform info from clinfo:

Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 3.0 
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_subgroup_local_block_io cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info 
  Platform Extensions with Version                cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_intel_split_work_group_barrier                                0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_ext_float_atomics                                             0x400000 (1.0.0)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_intel_subgroup_local_block_io                                 0x400000 (1.0.0)
                                                  cl_khr_integer_dot_product                                       0x800000 (2.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_khr_gl_depth_images                                           0x400000 (1.0.0)
                                                  cl_khr_gl_event                                                  0x400000 (1.0.0)
                                                  cl_khr_gl_msaa_sharing                                           0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             INTEL
  Platform Host timer resolution                  1ns
lightvector commented 1 year ago

That's pretty weird, because I'm pretty sure KataGo doesn't use double-precision floats in the neural net. I looked over that kernel and didn't see anywhere where it's used, only single precision (or half precision) floats.

Does KataGo v1.11.0 build and work for you? Maybe one of the new functions in the activation function KataGo is using is being interpreted by Intel's OpenCL compiler as resolving to double precision (all the inputs should still be single precision, though).
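One check that bears on the double-precision question: an OpenCL device that supports doubles normally advertises the `cl_khr_fp64` extension, and the extension list printed in the log above includes `cl_khr_fp16` but no `cl_khr_fp64`. Below is a minimal Python sketch of that check, assuming the extension string is pasted in by hand from the log or from clinfo. The helper name `supports_fp64` is hypothetical and not part of KataGo.

```python
def supports_fp64(extensions: str) -> bool:
    """Return True if the space-separated extension list includes cl_khr_fp64."""
    # Split on whitespace so that only an exact extension name matches.
    return "cl_khr_fp64" in extensions.split()


# Abbreviated from the "Using OpenCL Device 0" log line; paste the full list
# from your own log to check a real device.
logged_extensions = "cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_icd"

print(supports_fp64(logged_extensions))
```

If this reports no double support, any kernel expression that the compiler resolves to `double` would be expected to fail at build time on that device, which is consistent with the build log above.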

alan-sorani commented 1 year ago

With KataGo v1.11.0 I cannot manage to build with cmake. Running cmake . -DUSE_BACKEND=OPENCL -DBUILD_DISTRIBUTED=1 I get the error Could NOT find OpenCL (missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR) as follows.

cmake . -DUSE_BACKEND=OPENCL -DBUILD_DISTRIBUTED=1 -DCMAKE_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu"
-- The C compiler identification is GNU 11.3.0
-- The CXX compiler identification is GNU 11.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building 'katago' executable for GTP engine and other tools.
-- -DUSE_BACKEND=OPENCL, using OpenCL backend.
-- Including Git revision in the compiled executable
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - not found
-- Looking for CL_VERSION_1_1
-- Looking for CL_VERSION_1_1 - not found
-- Looking for CL_VERSION_1_0
-- Looking for CL_VERSION_1_0 - not found
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find OpenCL (missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.22/Modules/FindOpenCL.cmake:163 (find_package_handle_standard_args)
  CMakeLists.txt:289 (find_package)

-- Configuring incomplete, errors occurred!
See also "/home/lemon/Library/games/Go/Engines/KataGo-1.11.0/cpp/CMakeFiles/CMakeOutput.log".
See also "/home/lemon/Library/games/Go/Engines/KataGo-1.11.0/cpp/CMakeFiles/CMakeError.log".

I tried adding the flags -DCMAKE_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu/intel-opencl/" or -DCMAKE_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu/", since these are the directories in which I seem to have OpenCL files, but it didn't change anything.

Seeing that this issue seems unrelated, I tried the binary for v.1.11.0 instead. Here it couldn't find libzip.so.5 which seems to have been missing but which I later installed. Now it does find OpenCL, but gets CL_OUT_OF_HOST_MEMORY as follows.

2023-07-05 14:16:43+0300: Loading model and initializing benchmark...
2023-07-05 14:16:43+0300: Testing with default positions for board size: 19
2023-07-05 14:16:43+0300: nnRandSeed0 = 3084680675180672652
2023-07-05 14:16:43+0300: After dedups: nnModelFile0 = /home/lemon/Library/games/Go/Engines/katago_1.11.0/default_model.bin.gz useFP16 auto useNHWC auto
2023-07-05 14:16:43+0300: Initializing neural net buffer to be size 19 * 19 exactly
2023-07-05 14:16:45+0300: Found OpenCL Platform 0: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:16:45+0300: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-07-05 14:16:45+0300: Found OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) (score 6000300)
2023-07-05 14:16:45+0300: Creating context for OpenCL Platform: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:16:45+0300: Using OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) OpenCL 3.0 NEO  (Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io )
2023-07-05 14:16:45+0300: Loaded tuning parameters from: /home/lemon/.katago/opencltuning/tune8_gpuIntelRIrisRXeGraphics0x9a49_x19_y19_c256_mv10.txt
terminate called after throwing an instance of 'OpenCLHelpers::CompileError'
  what():  CL_OUT_OF_HOST_MEMORY
BUILD LOG FOR conv2dNCHWProgram ON DEVICE 0

Aborted (core dumped)

Going back to Katago v.1.13.2 after installing libzip.so.5 (which might cause the current problem, I got it from the Ubuntu 20.04 repositories as described here) and libssl as its dependency, I get the following different error:

2023-07-05 14:19:01+0300: Running with following config:
allowResignation = true
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
maxTimePondering = 60.0
maxVisits = 500
numSearchThreads = 6
ponderingEnabled = false
resignConsecTurns = 3
resignThreshold = -0.90
rules = tromp-taylor
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95

2023-07-05 14:19:01+0300: Loading model and initializing benchmark...
2023-07-05 14:19:01+0300: Testing with default positions for board size: 19
2023-07-05 14:19:01+0300: nnRandSeed0 = 11756914437506952017
2023-07-05 14:19:01+0300: After dedups: nnModelFile0 = /home/lemon/Library/games/Go/Engines/katago_v1.13.2/cpp/default_model.bin.gz useFP16 auto useNHWC auto
2023-07-05 14:19:01+0300: Initializing neural net buffer to be size 19 * 19 exactly
2023-07-05 14:19:02+0300: Found OpenCL Platform 0: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-07-05 14:19:02+0300: Found OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) (score 6000300)
2023-07-05 14:19:02+0300: Creating context for OpenCL Platform: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Using OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) OpenCL 3.0 NEO  (Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io )
2023-07-05 14:19:02+0300: No existing tuning parameters found or parseable or valid at: /home/lemon/.katago/opencltuning/tune11_gpuIntelRIrisRXeGraphics0x9a49_x19_y19_c384_mv14.txt
2023-07-05 14:19:02+0300: Performing autotuning
2023-07-05 14:19:02+0300: *** On some systems, this may take several minutes, please be patient ***
2023-07-05 14:19:02+0300: Found OpenCL Platform 0: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-07-05 14:19:02+0300: Found OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) (score 6000300)
2023-07-05 14:19:02+0300: Creating context for OpenCL Platform: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Using OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) OpenCL 3.0 NEO  (Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io )
Beginning GPU tuning for Intel(R) Iris(R) Xe Graphics [0x9a49] modelVersion 14 channels 384
2023-07-05 14:19:02+0300: Dummy tuning thread starting
2023-07-05 14:19:02+0300: Creating context for OpenCL Platform: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Using OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) OpenCL 3.0 NEO  (Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io )
2023-07-05 14:19:02+0300: WARNING: Dummy thread to load the GPU while tuning failed
2023-07-05 14:19:02+0300: Compile error: CL_OUT_OF_HOST_MEMORY
BUILD LOG FOR xgemmDirectProgram ON DEVICE 0

Setting winograd3x3TileSize = 4
------------------------------------------------------
Tuning xGemmDirect for 1x1 convolutions and matrix mult
Testing 55 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/55 ...
Tuning 40/55 ...
ERROR: Could not find any configuration that worked
------------------------------------------------------
Tuning xGemm for convolutions
Testing 69 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/69 ...
Tuning 40/69 ...
Tuning 60/69 ...
ERROR: Could not find any configuration that worked
------------------------------------------------------
Tuning hGemmWmma for convolutions
Testing 144 different configs
FP16 tensor core tuning failed, assuming no FP16 tensor core support
------------------------------------------------------
Tuning hGemmWmmaNCHW for 1x1 convolutions
Testing 108 different configs
FP16 tensor core tuning failed for 1x1 convs
------------------------------------------------------
Tuning xGemm16 for convolutions
Testing 69 different configs
FP16 compute tuning failed, assuming no FP16 compute support
------------------------------------------------------
Tuning xGemm for convolutions - trying with FP16 storage
Testing 69 different configs
FP16 storage tuning failed, assuming no FP16 storage support
------------------------------------------------------
Using FP32 storage!
Using FP32 compute!
------------------------------------------------------
Tuning winograd transform for convolutions
Testing 45 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/45 ...
Tuning 40/45 ...
ERROR: Could not find any configuration that worked
------------------------------------------------------
Tuning winograd untransform for convolutions
Testing 109 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/109 ...
Tuning 40/109 ...
Tuning 60/109 ...
Tuning 80/109 ...
Tuning 100/109 ...
ERROR: Could not find any configuration that worked
------------------------------------------------------
Tuning global pooling strides
Testing 104 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/104 ...
Tuning 40/104 ...
Tuning 60/104 ...
Tuning 80/104 ...
Tuning 100/104 ...
ERROR: Could not find any configuration that worked
Done tuning
------------------------------------------------------
2023-07-05 14:19:02+0300: Done tuning, saved results to /home/lemon/.katago/opencltuning/tune11_gpuIntelRIrisRXeGraphics0x9a49_x19_y19_c384_mv14.txt
terminate called after throwing an instance of 'OpenCLHelpers::CompileError'
  what():  CL_OUT_OF_HOST_MEMORY
BUILD LOG FOR conv2dNCHWProgram ON DEVICE 0

Aborted (core dumped)

For reference, I get similar CL_OUT_OF_HOST_MEMORY errors with the binaries for both versions of KataGo, and when trying to run an updated version of KaTrain.

Trying to see if the new libzip.so.5 affects cmake for Katago v.1.13.2, I get the following error on another instance of v.1.13.2:

-- The C compiler identification is GNU 11.3.0
-- The CXX compiler identification is GNU 11.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building 'katago' executable for GTP engine and other tools.
-- -DUSE_BACKEND=OPENCL, using OpenCL backend.
-- Including Git revision in the compiled executable
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - not found
-- Looking for CL_VERSION_1_1
-- Looking for CL_VERSION_1_1 - not found
-- Looking for CL_VERSION_1_0
-- Looking for CL_VERSION_1_0 - not found
-- Could NOT find OpenCL (missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR) 
CMake Warning at CMakeLists.txt:312 (message):
  OpenCL not found, attempting to see if CUDA exists and has OpenCL since
  sometimes CUDA may provide OpenCL where cmake can't find it.

-- Could not find nvcc, please set CUDAToolkit_ROOT.
CMake Error at CMakeLists.txt:315 (message):
  OpenCL installation not found

-- Configuring incomplete, errors occurred!
See also "/home/lemon/Library/games/Go/Engines/katago_v.1.13.2_NEW/cpp/CMakeFiles/CMakeOutput.log".
See also "/home/lemon/Library/games/Go/Engines/katago_v.1.13.2_NEW/cpp/CMakeFiles/CMakeError.log".

Sorry for not being sure which of these things I might be doing incorrectly.

I'll try dual-booting Arch and building KataGo there to see if the problem persists. (I had planned to rebuild on my Arch VM, but I forgot that I can't allocate the GPU to VBox, so that doesn't work.)

lightvector commented 1 year ago

Given that all these things aren't working, what was the way that you did get it working before?

Yeah, Intel GPUs can be tricky, and sometimes have issues with OpenCL. I wish there were better ways of making it work. How fast is the pure CPU version (Eigen) for you?

Yodo13 commented 1 year ago

I have the same CPU, and when compiling KataGo I got the same error message: "Could NOT find OpenCL (missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR)".

After installing the package ocl-icd-opencl-dev, KataGo compiled fine.
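For context, CMake's FindOpenCL module looks for both an OpenCL header (`CL/cl.h`) and a linkable OpenCL library, and the "missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR" message means it found neither. A rough Python sketch of the same kind of check follows. The search directories are assumptions for a typical Ubuntu x86_64 layout, and `find_opencl_dev_files` is a hypothetical helper, not something CMake or KataGo provides.

```python
from pathlib import Path

# Assumed search locations for a typical Ubuntu x86_64 install; adjust as needed.
INCLUDE_DIRS = ["/usr/include", "/usr/local/include"]
LIBRARY_DIRS = ["/usr/lib/x86_64-linux-gnu", "/usr/lib", "/usr/local/lib"]


def find_opencl_dev_files(include_dirs, library_dirs):
    """Return (header_path, library_path); each is None when not found."""
    header = next(
        (Path(d) / "CL" / "cl.h" for d in include_dirs
         if (Path(d) / "CL" / "cl.h").is_file()),
        None,
    )
    # The unversioned libOpenCL.so is what the linker looks for; a runtime-only
    # install may ship just a versioned library or a vendor ICD library.
    library = next(
        (Path(d) / "libOpenCL.so" for d in library_dirs
         if (Path(d) / "libOpenCL.so").exists()),
        None,
    )
    return header, library


if __name__ == "__main__":
    header, library = find_opencl_dev_files(INCLUDE_DIRS, LIBRARY_DIRS)
    print("OpenCL header: ", header or "not found")
    print("OpenCL library:", library or "not found")
```

If either line prints "not found", the GPU runtime alone is probably installed without the development files, which would explain why pointing CMAKE_LIBRARY_PATH at the Intel runtime directory did not help.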

This is not a KataGo issue, and I think that it can be closed.

PS. I tried the Eigen version too; it was 10x slower...