Open alan-sorani opened 1 year ago
That's pretty weird, because I'm pretty sure KataGo doesn't use double-precision floats in the neural net. I looked over that kernel and didn't see anywhere where it's used, only single precision (or half precision) floats.
Does KataGo v1.11.0 build and work for you? Maybe there's some weirdness where one of the new functions in the activation function KataGo is using gets interpreted by Intel's OpenCL compiler as resolving to double precision (all the inputs should still be single precision though).
With KataGo v1.11.0 I cannot manage to build with cmake. Running cmake . -DUSE_BACKEND=OPENCL -DBUILD_DISTRIBUTED=1 I get the error Could NOT find OpenCL (missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR) as follows.
cmake . -DUSE_BACKEND=OPENCL -DBUILD_DISTRIBUTED=1 -DCMAKE_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu"
-- The C compiler identification is GNU 11.3.0
-- The CXX compiler identification is GNU 11.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building 'katago' executable for GTP engine and other tools.
-- -DUSE_BACKEND=OPENCL, using OpenCL backend.
-- Including Git revision in the compiled executable
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - not found
-- Looking for CL_VERSION_1_1
-- Looking for CL_VERSION_1_1 - not found
-- Looking for CL_VERSION_1_0
-- Looking for CL_VERSION_1_0 - not found
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find OpenCL (missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR)
Call Stack (most recent call first):
/usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.22/Modules/FindOpenCL.cmake:163 (find_package_handle_standard_args)
CMakeLists.txt:289 (find_package)
-- Configuring incomplete, errors occurred!
See also "/home/lemon/Library/games/Go/Engines/KataGo-1.11.0/cpp/CMakeFiles/CMakeOutput.log".
See also "/home/lemon/Library/games/Go/Engines/KataGo-1.11.0/cpp/CMakeFiles/CMakeError.log".
I tried adding the flags -DCMAKE_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu/intel-opencl/" or -DCMAKE_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu/", since these are the directories in which I seem to have OpenCL files, but it didn't change anything.
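The error above lists two missing variables, OpenCL_LIBRARY and OpenCL_INCLUDE_DIR. CMAKE_LIBRARY_PATH only affects the library search, so it cannot supply the header directory. A minimal sketch for checking both pieces by hand follows. The function name check_opencl_dev is hypothetical, and the default paths are assumptions for a 64-bit Debian/Ubuntu layout, so pass other roots if your system differs.

```shell
#!/bin/sh
# Hypothetical helper: report whether the two files CMake's FindOpenCL
# module looks for are present. Default paths are assumptions for a
# 64-bit Debian/Ubuntu layout.
check_opencl_dev() {
  inc_root="${1:-/usr/include}"
  lib_root="${2:-/usr/lib/x86_64-linux-gnu}"
  status=0

  # OpenCL_INCLUDE_DIR is the directory that contains CL/cl.h.
  if [ -f "$inc_root/CL/cl.h" ]; then
    echo "header:  found   $inc_root/CL/cl.h"
  else
    echo "header:  missing $inc_root/CL/cl.h"
    status=1
  fi

  # OpenCL_LIBRARY is the unversioned libOpenCL.so used at link time;
  # a runtime-only libOpenCL.so.1 does not satisfy the search.
  if [ -e "$lib_root/libOpenCL.so" ]; then
    echo "library: found   $lib_root/libOpenCL.so"
  else
    echo "library: missing $lib_root/libOpenCL.so"
    status=1
  fi

  return "$status"
}
```

Calling check_opencl_dev with no arguments inspects the default locations and returns non-zero if either file is absent. If the header is the missing piece, no value of CMAKE_LIBRARY_PATH will make the configure step pass.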
Seeing that this issue seems unrelated, I tried the binary for v1.11.0 instead. At first it couldn't find libzip.so.5, which seems to have been missing but which I later installed. Now it does find OpenCL, but gets CL_OUT_OF_HOST_MEMORY as follows.
2023-07-05 14:16:43+0300: Loading model and initializing benchmark...
2023-07-05 14:16:43+0300: Testing with default positions for board size: 19
2023-07-05 14:16:43+0300: nnRandSeed0 = 3084680675180672652
2023-07-05 14:16:43+0300: After dedups: nnModelFile0 = /home/lemon/Library/games/Go/Engines/katago_1.11.0/default_model.bin.gz useFP16 auto useNHWC auto
2023-07-05 14:16:43+0300: Initializing neural net buffer to be size 19 * 19 exactly
2023-07-05 14:16:45+0300: Found OpenCL Platform 0: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:16:45+0300: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-07-05 14:16:45+0300: Found OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) (score 6000300)
2023-07-05 14:16:45+0300: Creating context for OpenCL Platform: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:16:45+0300: Using OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) OpenCL 3.0 NEO (Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io )
2023-07-05 14:16:45+0300: Loaded tuning parameters from: /home/lemon/.katago/opencltuning/tune8_gpuIntelRIrisRXeGraphics0x9a49_x19_y19_c256_mv10.txt
terminate called after throwing an instance of 'OpenCLHelpers::CompileError'
what(): CL_OUT_OF_HOST_MEMORY
BUILD LOG FOR conv2dNCHWProgram ON DEVICE 0
Aborted (core dumped)
Going back to Katago v.1.13.2 after installing libzip.so.5 (which might cause the current problem, I got it from the Ubuntu 20.04 repositories as described here) and libssl as its dependency, I get the following different error:
2023-07-05 14:19:01+0300: Running with following config:
allowResignation = true
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
maxTimePondering = 60.0
maxVisits = 500
numSearchThreads = 6
ponderingEnabled = false
resignConsecTurns = 3
resignThreshold = -0.90
rules = tromp-taylor
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
2023-07-05 14:19:01+0300: Loading model and initializing benchmark...
2023-07-05 14:19:01+0300: Testing with default positions for board size: 19
2023-07-05 14:19:01+0300: nnRandSeed0 = 11756914437506952017
2023-07-05 14:19:01+0300: After dedups: nnModelFile0 = /home/lemon/Library/games/Go/Engines/katago_v1.13.2/cpp/default_model.bin.gz useFP16 auto useNHWC auto
2023-07-05 14:19:01+0300: Initializing neural net buffer to be size 19 * 19 exactly
2023-07-05 14:19:02+0300: Found OpenCL Platform 0: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-07-05 14:19:02+0300: Found OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) (score 6000300)
2023-07-05 14:19:02+0300: Creating context for OpenCL Platform: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Using OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) OpenCL 3.0 NEO (Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io )
2023-07-05 14:19:02+0300: No existing tuning parameters found or parseable or valid at: /home/lemon/.katago/opencltuning/tune11_gpuIntelRIrisRXeGraphics0x9a49_x19_y19_c384_mv14.txt
2023-07-05 14:19:02+0300: Performing autotuning
2023-07-05 14:19:02+0300: *** On some systems, this may take several minutes, please be patient ***
2023-07-05 14:19:02+0300: Found OpenCL Platform 0: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-07-05 14:19:02+0300: Found OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) (score 6000300)
2023-07-05 14:19:02+0300: Creating context for OpenCL Platform: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Using OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) OpenCL 3.0 NEO (Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io )
Beginning GPU tuning for Intel(R) Iris(R) Xe Graphics [0x9a49] modelVersion 14 channels 384
2023-07-05 14:19:02+0300: Dummy tuning thread starting
2023-07-05 14:19:02+0300: Creating context for OpenCL Platform: Intel(R) OpenCL HD Graphics (Intel(R) Corporation) (OpenCL 3.0 )
2023-07-05 14:19:02+0300: Using OpenCL Device 0: Intel(R) Iris(R) Xe Graphics [0x9a49] (Intel(R) Corporation) OpenCL 3.0 NEO (Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io )
2023-07-05 14:19:02+0300: WARNING: Dummy thread to load the GPU while tuning failed
2023-07-05 14:19:02+0300: Compile error: CL_OUT_OF_HOST_MEMORY
BUILD LOG FOR xgemmDirectProgram ON DEVICE 0
Setting winograd3x3TileSize = 4
------------------------------------------------------
Tuning xGemmDirect for 1x1 convolutions and matrix mult
Testing 55 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/55 ...
Tuning 40/55 ...
ERROR: Could not find any configuration that worked
------------------------------------------------------
Tuning xGemm for convolutions
Testing 69 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/69 ...
Tuning 40/69 ...
Tuning 60/69 ...
ERROR: Could not find any configuration that worked
------------------------------------------------------
Tuning hGemmWmma for convolutions
Testing 144 different configs
FP16 tensor core tuning failed, assuming no FP16 tensor core support
------------------------------------------------------
Tuning hGemmWmmaNCHW for 1x1 convolutions
Testing 108 different configs
FP16 tensor core tuning failed for 1x1 convs
------------------------------------------------------
Tuning xGemm16 for convolutions
Testing 69 different configs
FP16 compute tuning failed, assuming no FP16 compute support
------------------------------------------------------
Tuning xGemm for convolutions - trying with FP16 storage
Testing 69 different configs
FP16 storage tuning failed, assuming no FP16 storage support
------------------------------------------------------
Using FP32 storage!
Using FP32 compute!
------------------------------------------------------
Tuning winograd transform for convolutions
Testing 45 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/45 ...
Tuning 40/45 ...
ERROR: Could not find any configuration that worked
------------------------------------------------------
Tuning winograd untransform for convolutions
Testing 109 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/109 ...
Tuning 40/109 ...
Tuning 60/109 ...
Tuning 80/109 ...
Tuning 100/109 ...
ERROR: Could not find any configuration that worked
------------------------------------------------------
Tuning global pooling strides
Testing 104 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/104 ...
Tuning 40/104 ...
Tuning 60/104 ...
Tuning 80/104 ...
Tuning 100/104 ...
ERROR: Could not find any configuration that worked
Done tuning
------------------------------------------------------
2023-07-05 14:19:02+0300: Done tuning, saved results to /home/lemon/.katago/opencltuning/tune11_gpuIntelRIrisRXeGraphics0x9a49_x19_y19_c384_mv14.txt
terminate called after throwing an instance of 'OpenCLHelpers::CompileError'
what(): CL_OUT_OF_HOST_MEMORY
BUILD LOG FOR conv2dNCHWProgram ON DEVICE 0
Aborted (core dumped)
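In the log above, the tuner saved a results file even though every configuration failed, and the earlier v1.11.0 run loaded an existing tuning file before crashing. One way to rule out stale tuning data is to move those files aside so the next run tunes from scratch. This is only a sketch: the function name reset_katago_tuning is hypothetical, the default directory is taken from the paths in the logs, and it is an assumption (not something the logs confirm) that the saved files contribute to the crash.

```shell
#!/bin/sh
# Hypothetical helper: move KataGo's cached OpenCL tuning files into a
# backup directory so the next run re-tunes. Prints the number of files moved.
# The default directory is the one shown in the logs above.
reset_katago_tuning() {
  tuning_dir="${1:-$HOME/.katago/opencltuning}"
  backup_dir="${2:-${tuning_dir}.bak}"
  moved=0

  # Nothing to do if the tuning directory does not exist.
  [ -d "$tuning_dir" ] || { echo 0; return 0; }

  mkdir -p "$backup_dir"
  for f in "$tuning_dir"/tune*.txt; do
    [ -e "$f" ] || continue   # the glob matched nothing
    mv "$f" "$backup_dir"/
    moved=$((moved + 1))
  done

  echo "$moved"
}
```

The files are moved rather than deleted, so they can be restored if re-tuning makes no difference.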
For reference, I get similar CL_OUT_OF_HOST_MEMORY errors on the binaries for both versions of KataGo, and on trying to run an updated version of KaTrain.
Trying to see if the new libzip.so.5 affects cmake for KataGo v1.13.2, I get the following error on another instance of v1.13.2:
-- The C compiler identification is GNU 11.3.0
-- The CXX compiler identification is GNU 11.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building 'katago' executable for GTP engine and other tools.
-- -DUSE_BACKEND=OPENCL, using OpenCL backend.
-- Including Git revision in the compiled executable
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - not found
-- Looking for CL_VERSION_1_1
-- Looking for CL_VERSION_1_1 - not found
-- Looking for CL_VERSION_1_0
-- Looking for CL_VERSION_1_0 - not found
-- Could NOT find OpenCL (missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR)
CMake Warning at CMakeLists.txt:312 (message):
OpenCL not found, attempting to see if CUDA exists and has OpenCL since
sometimes CUDA may provide OpenCL where cmake can't find it.
-- Could not find nvcc, please set CUDAToolkit_ROOT.
CMake Error at CMakeLists.txt:315 (message):
OpenCL installation not found
-- Configuring incomplete, errors occurred!
See also "/home/lemon/Library/games/Go/Engines/katago_v.1.13.2_NEW/cpp/CMakeFiles/CMakeOutput.log".
See also "/home/lemon/Library/games/Go/Engines/katago_v.1.13.2_NEW/cpp/CMakeFiles/CMakeError.log".
Sorry for not being sure which of these things I might be doing incorrectly.
I'll try dual-booting Arch and building KataGo there to see if the problem persists. (I first planned to rebuild on my Arch VM, but I forgot that I can't allocate the GPU to VBox, so that doesn't work.)
Given that all these things aren't working, what was the way that you did get it working before?
Yeah, intel GPUs can be tricky sometimes, and sometimes have issues with OpenCL. I wish there were better ways of making it work. How fast is the pure CPU version for you? (eigen).
I have the same CPU, and when compiling KataGo I got the same error message: "Could NOT find OpenCL (missing: OpenCL_LIBRARY OpenCL_INCLUDE_DIR)".
After installing the package ocl-icd-opencl-dev, KataGo compiled fine.
This is not a KataGo issue, and I think that it can be closed.
PS. Tried the eigen version too, it was 10x slower...
Hello.
I've built KataGo as instructed and got the following result when running ./katago benchmark. My GPU is Intel Iris Xe Graphics.
Edit: Forgot to mention possibly relevant info: I'm on Ubuntu 22.04.2 x86_64 and my CPU is 11th Gen Intel i5-1135G7.
I managed to make KataGo work on the same computer (and GPU) previously. I'm not entirely sure if the error is due to the current OpenCL provider from Intel (that I have from https://github.com/intel/compute-runtime/releases), a mistake on my part, or something else.
Possibly relevant OpenCL platform info from clinfo: