lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

KataGo Abort 6: libc++abi.dylib: terminating with uncaught exception of type OpenCLHelpers::CompileError: CL_BUILD_PROGRAM_FAILURE #482

Open tigerking opened 3 years ago

tigerking commented 3 years ago

System: macOS 10.15.7, KataGo 1.8.2 installed via brew

KataGo is started with -cfg: gtp_example.cfg (with updates)

-model: kata1-b20c256x2-s4384473088-d968438914.bin.gz

<1> Run KataGo in GTP mode:

katago gtp -cfg -model

Init params:

boardsize 19
time_settings 0 20 1
komi 7.5
clear_board
genmove B
...
quit

Repeat <1> several times. Sometimes it works like a charm, but sometimes it fails after a couple of runs.

Error logs:

>>>>
KataGo v1.8.2
Using TrompTaylor rules initially, unless GTP/GUI overrides this
Creating context for OpenCL Platform: Apple (Apple) (OpenCL 1.2 (Mar 5 2021 00:20:05))
Using OpenCL Device 1: Intel(R) Iris(TM) Plus Graphics 655 (Intel Inc.) OpenCL 1.2 (Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images cl_khr_3d_image_writes )
Loaded tuning parameters from: /Users//.katago/opencltuning/tune8_gpuIntelRIrisTMPlusGraphics655_x19_y19_c256_mv8.txt
libc++abi.dylib: terminating with uncaught exception of type OpenCLHelpers::CompileError: CL_BUILD_PROGRAM_FAILURE
BUILD LOG FOR conv2dNCHWProgram ON DEVICE 0
Compile Server Error.
katago... 875 Abort trap: 6 /usr/local/bin/katago gtp -config /Users//katago/cfg/gtp_example.cfg -model /Users//katago/model/kata1-b20c256x2-s4384473088-d968438914.bin.gz

Additionally, once it fails, it always fails (Abort trap: 6). The only way to recover is to reboot the system, but it soon fails again after several more runs.

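
The repeat-until-failure procedure described above could be automated with a small script. This is only a sketch, not part of KataGo: the helper names (`build_gtp_script`, `run_once`, `repeat_until_failure`) are hypothetical, the commands elided by "..." in the report are left out, and the full katago command line (including the config and model paths) is assumed to be passed in by the user.

```python
import subprocess
import sys

# GTP commands copied from the report; the commands elided by "..." are not filled in.
GTP_COMMANDS = [
    "boardsize 19",
    "time_settings 0 20 1",
    "komi 7.5",
    "clear_board",
    "genmove B",
    "quit",
]


def build_gtp_script(commands=GTP_COMMANDS):
    """Join the GTP commands into the text that is fed to the engine's stdin."""
    return "\n".join(commands) + "\n"


def run_once(argv, script, timeout=300):
    """Run the engine once and return (exit code, stderr text)."""
    proc = subprocess.run(
        argv, input=script, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stderr


def repeat_until_failure(argv, runs=20):
    """Run the engine up to `runs` times.

    Returns (run number, exit code, stderr) for the first failing run,
    or None if every run exits with code 0.
    """
    script = build_gtp_script()
    for i in range(1, runs + 1):
        code, err = run_once(argv, script)
        if code != 0:
            return i, code, err
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the engine command line.
    result = repeat_until_failure(sys.argv[1:])
    if result is None:
        print("no failure observed")
    else:
        run_number, code, err = result
        print(f"run {run_number} failed with exit code {code}")
        print(err)
```

Saved under a hypothetical name such as `repro.py`, it would be invoked with the same command line as in the report, e.g. `python repro.py /usr/local/bin/katago gtp -config ... -model ...`. On POSIX systems Python reports a process killed by a signal as a negative exit code, so an abort like the one above would likely show up as -6.
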
lightvector commented 3 years ago

Based on the error message, I'm guessing your OpenCL implementation is unstable or unreliable. This might be a problem with your GPU drivers. Try updating your drivers. If that works, great. If not, there's not much KataGo can do about this, and you will have to use the Eigen version (CPU-only KataGo).

tigerking commented 3 years ago

Thanks for the response. The GPU (OpenCL) driver is built into the system (macOS 10.15.7). I suspect that some resources (e.g., memory, handles, locks) are not released when a failure occurs, so later runs always fail. This is only a guess.

tigerking commented 3 years ago

Note: a possible clue. It looks like the issue only happens after the Mac screensaver has been activated. I'm not 100% sure, but I reproduced the issue twice by activating the screensaver manually. The issue does not happen if I keep the Mac active and don't let the screensaver run.