katago 1.13.0 opencl version error.

mc-mong commented 1 year ago

Loaded model ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz
Model name: kata1-b18c384nbt-softplusfixv13-s5971481344-d3261785976
GTP ready, beginning main protocol loop
katago 18 block> komi 6.5
=
katago 18 block> boardsize 19
=
katago 18 block> clear_board
= ○
katago 18 block> play B D3
= ●
katago 18 block> kata-genmove_analyze W 50
Connection Failed

After the above message, the game cannot proceed.

OS: Windows 10 64-bit
GPU: RTX 3070 Ti
KataGo: 1.13.0 OpenCL (new version)

lightvector commented 1 year ago

"Connection Failed" is an interesting error message; it's not one I've seen before. Is it produced by KataGo directly, or is there a GUI or other graphical game/SGF editor that you are using? Does the same error occur on older versions?
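One way to check whether a message comes from the engine or from a GUI is to drive the engine over GTP with no GUI in between. The following is a minimal Python sketch under stated assumptions: `gtp_session` is a hypothetical helper written for illustration, not part of KataGo, and it only relies on the GTP rule that each response ends with a blank line. It is suited to simple commands such as `komi` or `play`, not to streaming commands such as `kata-genmove_analyze`.

```python
import subprocess

def gtp_session(cmd, commands):
    """Send GTP commands to an engine and collect its raw responses.

    Returns (responses, exit_code), where responses is a list of
    (command, response_text) pairs. gtp_session is a hypothetical helper.
    """
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    responses = []
    try:
        for command in commands:
            try:
                proc.stdin.write(command + "\n")
                proc.stdin.flush()
            except OSError:
                break  # engine is no longer reading: it exited or crashed
            lines = []
            eof = False
            while True:
                line = proc.stdout.readline()
                if line == "":
                    eof = True  # engine closed its stdout
                    break
                if line.strip() == "":
                    if lines:
                        break  # a blank line terminates a GTP response
                    continue
                lines.append(line.rstrip("\r\n"))
            responses.append((command, "\n".join(lines)))
            if eof:
                break
    finally:
        try:
            proc.stdin.close()
        except OSError:
            pass
        try:
            proc.wait(timeout=30)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return responses, proc.returncode

# Example call (placeholder paths, adjust to your setup):
# gtp_session(
#     ["katago", "gtp", "-config", "path/to/your/config.cfg",
#      "-model", "path/to/your/model.bin.gz"],
#     ["komi 6.5", "boardsize 19", "clear_board", "play B D3", "quit"],
# )
```

If the response list stops early or the exit code is nonzero, the engine itself stopped, and the GUI message is probably only reporting that the connection to the engine was lost.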

mc-mong commented 1 year ago

I had copied the new version over an existing folder, overwriting the old files. It works when I use a new folder instead. The folder that was overwritten still does not work. Deleting the existing files and installing into a clean folder works.

Maybe it's because of the previously used config file. I made a config file with genconfig and used it.

GUIs tested: Sabaki 0.52.2 and GoGui 1.5.1.

lightvector commented 1 year ago

What happens if you run it on the command line?

./katago.exe benchmark -config path/to/your/config.cfg -model path/to/your/model.bin.gz

What error do you get?
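If the command prints nothing useful, the exit code can still be informative. Below is a minimal Python sketch that runs a command and captures its exit code, standard output, and standard error separately. `run_and_report` is a hypothetical helper name, and the paths in the commented example are the placeholders from the command above.

```python
import subprocess

def run_and_report(cmd, cwd=None, timeout=600):
    """Run a command and return (exit_code, stdout, stderr) as text.

    run_and_report is a hypothetical helper for illustration only.
    """
    proc = subprocess.run(
        cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stdout, proc.stderr

# Example call (placeholder paths, adjust to your setup):
# code, out, err = run_and_report(
#     ["./katago.exe", "benchmark",
#      "-config", "path/to/your/config.cfg",
#      "-model", "path/to/your/model.bin.gz"]
# )
# print("exit code:", code)
# print("last stderr lines:", err.splitlines()[-5:])
```

A nonzero exit code with no error text suggests that the process ended abnormally rather than finishing the benchmark.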

mc-mong commented 1 year ago

In the existing folder that I copied over, the output is as follows.

D:\baduk\Katago-opencl>katago benchmark -config b18_config.cfg -model ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz
2023-05-25 05:21:55+0900: Running with following config:
allowResignation = true
friendlyPassOk = false
hasButton = false
koRule = SIMPLE
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
maxPlayouts = 30
multiStoneSuicideLegal = false
nnCacheSizePowerOfTwo = 23
nnMutexPoolSizePowerOfTwo = 19
numNNServerThreadsPerModel = 1
numSearchThreads = 16
openclDeviceToUseThread0 = 0
ponderingEnabled = false
resignConsecTurns = 3
resignThreshold = -0.99
scoringRule = TERRITORY
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = SEKI
whiteHandicapBonus = 0

2023-05-25 05:21:55+0900: Loading model and initializing benchmark...
2023-05-25 05:21:55+0900: Testing with default positions for board size: 19
2023-05-25 05:21:55+0900: nnRandSeed0 = 8576874730517683006
2023-05-25 05:21:55+0900: After dedups: nnModelFile0 = ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz useFP16 auto useNHWC auto
2023-05-25 05:21:55+0900: Initializing neural net buffer to be size 19 * 19 exactly
2023-05-25 05:21:55+0900: Found OpenCL Platform 0: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:21:55+0900: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-05-25 05:21:55+0900: Found OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) (score 11000300)
2023-05-25 05:21:55+0900: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:21:55+0900: Using OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-05-25 05:21:55+0900: Loaded tuning parameters from: D:\baduk\Katago-opencl/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX3070Ti_x19_y19_c384_mv13.txt
2023-05-25 05:21:55+0900: OpenCL backend thread 0: Device 0 Model version 13
2023-05-25 05:21:55+0900: OpenCL backend thread 0: Device 0 Model name: kata1-b18c384nbt-softplusfixv13-s5971481344-d3261785976
2023-05-25 05:21:56+0900: OpenCL backend thread 0: Device 0 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

D:\baduk\Katago-opencl>

In a clean folder, the output is as follows.

D:\baduk\katago-opencl-ex>katago benchmark -config b18_config.cfg -model ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz
2023-05-25 05:28:31+0900: Running with following config:
allowResignation = true
friendlyPassOk = false
hasButton = false
koRule = SIMPLE
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
maxPlayouts = 30
multiStoneSuicideLegal = false
nnCacheSizePowerOfTwo = 23
nnMutexPoolSizePowerOfTwo = 19
numNNServerThreadsPerModel = 1
numSearchThreads = 16
openclDeviceToUseThread0 = 0
ponderingEnabled = false
resignConsecTurns = 3
resignThreshold = -0.99
scoringRule = TERRITORY
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = SEKI
whiteHandicapBonus = 0

2023-05-25 05:28:31+0900: Loading model and initializing benchmark...
2023-05-25 05:28:31+0900: Testing with default positions for board size: 19
2023-05-25 05:28:31+0900: nnRandSeed0 = 16784257899505801039
2023-05-25 05:28:31+0900: After dedups: nnModelFile0 = ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz useFP16 auto useNHWC auto
2023-05-25 05:28:31+0900: Initializing neural net buffer to be size 19 * 19 exactly
2023-05-25 05:28:31+0900: Found OpenCL Platform 0: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:28:31+0900: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-05-25 05:28:31+0900: Found OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) (score 11000300)
2023-05-25 05:28:31+0900: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:28:32+0900: Using OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-05-25 05:28:32+0900: Loaded tuning parameters from: D:\baduk\katago-opencl-ex/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX3070Ti_x19_y19_c384_mv13.txt
2023-05-25 05:28:32+0900: OpenCL backend thread 0: Device 0 Model version 13
2023-05-25 05:28:32+0900: OpenCL backend thread 0: Device 0 Model name: kata1-b18c384nbt-softplusfixv13-s5971481344-d3261785976
2023-05-25 05:28:32+0900: OpenCL backend thread 0: Device 0 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

2023-05-25 05:28:33+0900: Loaded config b18_config.cfg
2023-05-25 05:28:33+0900: Loaded model ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz

Testing using 800 visits. If you have a good GPU, you might increase this using "-visits N" to get more accurate results. If you have a weak GPU and this is taking forever, you can decrease it instead to finish the benchmark faster.

You are currently using the OpenCL version of KataGo. If you have a strong GPU capable of FP16 tensor cores (e.g. RTX2080), using the Cuda version of KataGo instead may give a mild performance boost.

Your GTP config is currently set to use numSearchThreads = 16
Automatically trying different numbers of threads to home in on the best (board size 19x19):

2023-05-25 05:28:33+0900: GPU 0 finishing, processed 5 rows 5 batches
2023-05-25 05:28:33+0900: nnRandSeed0 = 12646512682837780868
2023-05-25 05:28:33+0900: After dedups: nnModelFile0 = ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz useFP16 auto useNHWC auto
2023-05-25 05:28:33+0900: Initializing neural net buffer to be size 19 * 19 exactly
2023-05-25 05:28:33+0900: Found OpenCL Platform 0: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:28:33+0900: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-05-25 05:28:33+0900: Found OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) (score 11000300)
2023-05-25 05:28:33+0900: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:28:34+0900: Using OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-05-25 05:28:34+0900: Loaded tuning parameters from: D:\baduk\katago-opencl-ex/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX3070Ti_x19_y19_c384_mv13.txt
2023-05-25 05:28:34+0900: OpenCL backend thread 0: Device 0 Model version 13
2023-05-25 05:28:34+0900: OpenCL backend thread 0: Device 0 Model name: kata1-b18c384nbt-softplusfixv13-s5971481344-d3261785976
2023-05-25 05:28:34+0900: OpenCL backend thread 0: Device 0 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

Possible numbers of threads to test: 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 32,

numSearchThreads = 5: 10 / 10 positions, visits/s = 514.89 nnEvals/s = 435.65 nnBatches/s = 175.07 avgBatchSize = 2.49 (15.6 secs)
numSearchThreads = 12: 10 / 10 positions, visits/s = 755.29 nnEvals/s = 619.28 nnBatches/s = 104.42 avgBatchSize = 5.93 (10.7 secs)
numSearchThreads = 10: 10 / 10 positions, visits/s = 708.05 nnEvals/s = 592.31 nnBatches/s = 119.93 avgBatchSize = 4.94 (11.4 secs)
numSearchThreads = 20: 10 / 10 positions, visits/s = 841.51 nnEvals/s = 714.21 nnBatches/s = 73.17 avgBatchSize = 9.76 (9.7 secs)
numSearchThreads = 16: 10 / 10 positions, visits/s = 802.27 nnEvals/s = 666.96 nnBatches/s = 85.31 avgBatchSize = 7.82 (10.1 secs)
numSearchThreads = 24: 10 / 10 positions, visits/s = 849.15 nnEvals/s = 730.77 nnBatches/s = 62.66 avgBatchSize = 11.66 (9.7 secs)
numSearchThreads = 32: 10 / 10 positions, visits/s = 839.41 nnEvals/s = 750.30 nnBatches/s = 48.50 avgBatchSize = 15.47 (9.9 secs)

Ordered summary of results:

numSearchThreads = 5: 10 / 10 positions, visits/s = 514.89 nnEvals/s = 435.65 nnBatches/s = 175.07 avgBatchSize = 2.49 (15.6 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 708.05 nnEvals/s = 592.31 nnBatches/s = 119.93 avgBatchSize = 4.94 (11.4 secs) (EloDiff +103)
numSearchThreads = 12: 10 / 10 positions, visits/s = 755.29 nnEvals/s = 619.28 nnBatches/s = 104.42 avgBatchSize = 5.93 (10.7 secs) (EloDiff +122)
numSearchThreads = 16: 10 / 10 positions, visits/s = 802.27 nnEvals/s = 666.96 nnBatches/s = 85.31 avgBatchSize = 7.82 (10.1 secs) (EloDiff +135)
numSearchThreads = 20: 10 / 10 positions, visits/s = 841.51 nnEvals/s = 714.21 nnBatches/s = 73.17 avgBatchSize = 9.76 (9.7 secs) (EloDiff +143)
numSearchThreads = 24: 10 / 10 positions, visits/s = 849.15 nnEvals/s = 730.77 nnBatches/s = 62.66 avgBatchSize = 11.66 (9.7 secs) (EloDiff +136)
numSearchThreads = 32: 10 / 10 positions, visits/s = 839.41 nnEvals/s = 750.30 nnBatches/s = 48.50 avgBatchSize = 15.47 (9.9 secs) (EloDiff +110)

Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads = 5: (baseline)
numSearchThreads = 10: +103 Elo
numSearchThreads = 12: +122 Elo
numSearchThreads = 16: +135 Elo
numSearchThreads = 20: +143 Elo (recommended)
numSearchThreads = 24: +136 Elo
numSearchThreads = 32: +110 Elo

If you care about performance, you may want to edit numSearchThreads in b18_config.cfg based on the above results!
If you intend to do much longer searches, configure the seconds per game move you expect with the '-time' flag and benchmark again.
If you intend to do short or fixed-visit searches, use lower numSearchThreads for better strength, high threads will weaken strength.
If interested see also other notes about performance and mem usage in the top of b18_config.cfg

2023-05-25 05:30:01+0900: GPU 0 finishing, processed 48401 rows 7890 batches

D:\baduk\katago-opencl-ex>

It comes out well in a clean folder.
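For anyone comparing benchmark runs, the result lines can be read programmatically. The sketch below is a rough illustration in Python, not KataGo's own calculation: it extracts `visits/s` per thread count and applies only the "each speed doubling gains perhaps ~250 Elo" rule of thumb quoted in the output, taking the lowest thread count as the baseline. It does not model the per-thread cost, so its numbers come out higher than the `EloDiff` values KataGo prints. The names `parse_benchmark` and `speed_only_elo` are hypothetical.

```python
import math
import re

# Matches result lines such as:
# "numSearchThreads = 5: 10 / 10 positions, visits/s = 514.89 ..."
RESULT_RE = re.compile(
    r"numSearchThreads = (\d+): \d+ / \d+ positions, visits/s = ([\d.]+)"
)

def parse_benchmark(text):
    """Return {thread_count: visits_per_second} from benchmark output text."""
    return {int(threads): float(speed) for threads, speed in RESULT_RE.findall(text)}

def speed_only_elo(speeds, elo_per_doubling=250.0):
    """Estimate Elo gain from speed alone, relative to the lowest thread count.

    Assumption: only the ~250 Elo per speed doubling rule is applied;
    the per-thread strength cost mentioned in the output is ignored.
    """
    baseline = speeds[min(speeds)]
    return {
        threads: elo_per_doubling * math.log2(speed / baseline)
        for threads, speed in sorted(speeds.items())
    }
```

Because the per-thread cost is left out, the thread count with the highest `visits/s` is not necessarily the one KataGo recommends; in the run above the fastest setting is 24 threads while the recommendation is 20.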

mc-mong commented 1 year ago

The b18_config.cfg file is the same.

lightvector commented 1 year ago

Thanks. So I am definitely not sure why there's an issue with your existing folder, but if it comes out well in a clean folder then it sounds like you have a workaround to the issue that solves the problem?

But if you want to investigate further, maybe try seeing if the difference is in the DLL files. Maybe the DLL files in the two folders are different versions, or some other data in those folders differs?
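A comparison like this can be done by hashing the files in both folders. The following is a minimal Python sketch, assuming the DLLs sit directly in each folder; `file_digests` and `compare_folders` are hypothetical helper names, and the folder paths in the commented example are the two from the logs above.

```python
import hashlib
from pathlib import Path

def file_digests(folder, pattern="*.dll"):
    """Map each matching file name (lowercased) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            digests[path.name.lower()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests

def compare_folders(old_folder, new_folder, pattern="*.dll"):
    """Return (only_in_old, only_in_new, changed) lists of file names."""
    old = file_digests(old_folder, pattern)
    new = file_digests(new_folder, pattern)
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    changed = sorted(name for name in set(old) & set(new) if old[name] != new[name])
    return only_old, only_new, changed

# Example call with the two folders from this thread:
# only_old, only_new, changed = compare_folders(
#     r"D:\baduk\Katago-opencl", r"D:\baduk\katago-opencl-ex"
# )
# print("only in existing folder:", only_old)
# print("only in clean folder:", only_new)
# print("different contents:", changed)
```

Passing `pattern="*"` instead compares every file in the top level of both folders, which would also surface leftover data files from other programs.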

mc-mong commented 1 year ago

The existing folder also contains Lizzie files, along with other files. There was no problem when I copied KataGo 1.12.4 into that existing folder and used it. Maybe there is a conflict with the 1.13.* version.

mc-mong commented 1 year ago

Thank you.

lightvector / KataGo

katago 1.13.0 opencl version error. #792