Open w3333 opened 1 year ago
Yeah, you might be running into a driver problem. I see "Mesa" in your info, and users have found the Mesa drivers to be buggy for general-purpose OpenCL usage in the past. Does this thread help? https://bbs.archlinux.org/viewtopic.php?pid=1895516#p1895516
Thanks. Seems like it. The only reason I tried the mesa-opencl-icd lib is because an admin (on the Mint forum) suggested it. Without it, my clinfo output doesn't even see the GPU as an OpenCL device... but anyway, that's not your problem. I just thought I'd ask here; maybe the kata output would be a clue. Frankly, I'm out of ideas. It used to work pretty well. I'm not sure what caused the problems: maybe the switch to a newer GPU (but not really, the old GPU has the same problems), maybe the switch to Mint 21 (based on Ubuntu 22.04)... I just can't get it to work anymore, and even skilled people like the admins and devs on the Mint forum can't help... hm.
One thing you might know: do you yourself use, or know of people using, AMD GPUs like the RX 6700 XT, or something from that generation, successfully with Linux/Ubuntu/Mint and KataGo? I mean, this has to work somehow, somewhere...
Use AMD ROCm instead of Mesa, which is currently known to be broken.
Hello, sorry to bother you. I'm not sure if this is even KataGo's problem or something else. I have been using kata under Linux for years, using OpenCL with an AMD GPU (RX 570). Recently I switched to an RX 6700 XT and, after some trouble, managed to install the AMD drivers for it for OpenCL support. However, some strange things happen that didn't happen before.
Here's the output Kata gives when I try to tune it:
`~/katago$ ./katago tuner -model kata1-b40.bin.gz
2023-06-05 23:40:45+0200: Loading model...
2023-06-05 23:40:46+0200: Querying system devices...
2023-06-05 23:40:46+0200: Found OpenCL Platform 0: Clover (Mesa) (OpenCL 1.1 Mesa 22.2.5)
2023-06-05 23:40:46+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-05 23:40:46+0200: Found OpenCL Platform 1: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3513.0))
2023-06-05 23:40:46+0200: Found 0 device(s) on platform 1 with type CPU or GPU or Accelerator, skipping
2023-06-05 23:40:46+0200: Found OpenCL Device 0: AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic) (AMD) (score 11000101)
2023-06-05 23:40:46+0200: Tuner starting...
2023-06-05 23:40:46+0200: Creating context for OpenCL Platform: Clover (Mesa) (OpenCL 1.1 Mesa 22.2.5)
2023-06-05 23:40:46+0200: Using OpenCL Device 0: AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic) (AMD) OpenCL 1.1 Mesa 22.2.5 (Extensions: cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning)
Tuning device 0: AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)
Starting from existing parameters in: /home/werner/.katago/opencltuning/tune8_gpuAMDRadeonRX6700XTnavi22LLVM1506DRM348519043generic_x19_y19_c256_mv10.txt
Beginning GPU tuning for AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic) modelVersion 10 channels 256
Setting winograd3x3TileSize = 4
Tuning xGemmDirect for 1x1 convolutions and matrix mult
Testing 56 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/56 ...
Tuning 40/56 ...
ERROR: Could not find any configuration that worked
Tuning xGemm for convolutions
Testing 70 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/70 ...
Tuning 40/70 ...
Tuning 60/70 ...
ERROR: Could not find any configuration that worked
Tuning hGemmWmma for convolutions
Testing 146 different configs
FP16 tensor core tuning failed, assuming no FP16 tensor core support
Tuning xGemm for convolutions - trying with FP16 storage
Testing 70 different configs
FP16 storage tuning failed, assuming no FP16 storage support
Using FP32 storage!
Using FP32 compute!
Tuning winograd transform for convolutions
Testing 47 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/47 ...
Tuning 40/47 ...
ERROR: Could not find any configuration that worked
Tuning winograd untransform for convolutions
Testing 111 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/111 ...
Tuning 40/111 ...
Tuning 60/111 ...
Tuning 80/111 ...
Tuning 100/111 ...
ERROR: Could not find any configuration that worked
Tuning global pooling strides
Testing 106 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/106 ...
Tuning 40/106 ...
Tuning 60/106 ...
Tuning 80/106 ...
Tuning 100/106 ...
ERROR: Could not find any configuration that worked
Done tuning
Done, results saved to /home/werner/.katago/opencltuning/tune8_gpuAMDRadeonRX6700XTnavi22LLVM1506DRM348519043generic_x19_y19_c256_mv10.txt `
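For anyone reading a tuner log like this one, the most telling lines are the `Found N device(s) on platform M` entries: they show which OpenCL platform actually exposes the GPU. Below is a minimal Python sketch that pulls those out of a saved log. It assumes one log entry per line and the exact message wording shown in this thread; the function names are made up for illustration and are not part of KataGo.

```python
import re

# Patterns follow the message wording in the log above; other KataGo
# versions may phrase these lines differently (assumption).
PLATFORM_RE = re.compile(r"Found OpenCL Platform (\d+): (.+)")
COUNT_RE = re.compile(r"Found (\d+) device\(s\) on platform (\d+)")


def summarize_platforms(log_text):
    """Map platform index -> {"name", "devices"}; expects one log entry per line."""
    platforms = {}
    for line in log_text.splitlines():
        m = PLATFORM_RE.search(line)
        if m:
            platforms[int(m.group(1))] = {"name": m.group(2).strip(), "devices": 0}
            continue
        m = COUNT_RE.search(line)
        if m:
            entry = platforms.setdefault(int(m.group(2)), {"name": "?", "devices": 0})
            entry["devices"] = int(m.group(1))
    return platforms


def empty_platforms(log_text):
    """Indices of platforms that reported zero usable devices."""
    found = summarize_platforms(log_text)
    return sorted(i for i, p in found.items() if p["devices"] == 0)


def build_failures(log_text):
    """Count how often a kernel build failure was reported."""
    return log_text.count("CL_BUILD_PROGRAM_FAILURE")


# A few lines copied from the log in this thread.
SAMPLE = """\
2023-06-05 23:40:46+0200: Found OpenCL Platform 0: Clover (Mesa) (OpenCL 1.1 Mesa 22.2.5)
2023-06-05 23:40:46+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-05 23:40:46+0200: Found OpenCL Platform 1: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3513.0))
2023-06-05 23:40:46+0200: Found 0 device(s) on platform 1 with type CPU or GPU or Accelerator, skipping
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
"""

for idx, info in sorted(summarize_platforms(SAMPLE).items()):
    print(idx, info["name"], "devices:", info["devices"])
print("platforms without devices:", empty_platforms(SAMPLE))
print("build failures:", build_failures(SAMPLE))
```

In this log the AMD platform is the one with zero devices, so the tuner creates its context on Clover (Mesa), which is where every kernel build then fails.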
Never seen that error before. But I suspect it has something to do with a line that clinfo is giving me (right under "CL_PROGRAM_BUILD_LOG"):
`$ clinfo
Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 22.2.5
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3513.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback
  Platform Extensions function suffix             AMD
  Platform Host timer resolution                  1ns
  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 22.2.5
  Device Numeric Version                          0x401000 (1.1.0)
  Driver Version                                  22.2.5
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               40
  Max clock frequency                             2725MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
=== CL_PROGRAM_BUILD_LOG ===
fatal error: cannot open file '/usr/lib/clc/gfx1031-amdgcn-mesa-mesa3d.bc': No such file or directory
  Preferred work group size multiple (kernel)     <getWGsizes:1504: create kernel : error -46>
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         8 / 8
    int                                           4 / 4
    long                                          2 / 2
    half                                          0 / 0 (n/a)
    float                                         4 / 4
    double                                        2 / 2 (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              12884901888 (12GiB)
  Error Correction support                        No
  Max memory allocation                           3221225472 (3GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     16
  Max constant buffer size                        67108864 (64MiB)
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    ILs with version                              (n/a)
  Built-in kernels with version                   (n/a)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning
  Device Extensions with Version                  cl_khr_byte_addressable_store         0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics      0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics  0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics       0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics   0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics             0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics         0x400000 (1.0.0)
                                                  cl_khr_fp64                           0x400000 (1.0.0)
                                                  cl_khr_extended_versioning            0x400000 (1.0.0)
  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 0
NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)`
Maybe you have a clue.
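The `cannot open file '...bc'` line in the clinfo build log looks like the most concrete clue here: the Mesa (Clover) platform appears to be missing the bitcode file for this GPU target, which would also explain why every kernel build fails in the tuner. The Python sketch below is one way to pull the missing path out of saved clinfo output and compare it with what is actually installed. It assumes the error wording shown above and that the bitcode files live in the directory named in the error (`/usr/lib/clc` here); the helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches the wording of the build-log error quoted above (assumption:
# other driver versions may word it differently).
MISSING_RE = re.compile(r"cannot open file '([^']+\.bc)'")


def missing_bitcode_files(clinfo_text):
    """Return the distinct .bc paths the build log says it could not open."""
    return sorted(set(MISSING_RE.findall(clinfo_text)))


def target_from_path(bc_path):
    """Take the GPU target prefix from a name like gfx1031-amdgcn-mesa-mesa3d.bc."""
    return Path(bc_path).name.split("-")[0]


def available_targets(clc_dir):
    """List the .bc files present in clc_dir, or [] if the directory is absent."""
    directory = Path(clc_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.bc"))


# The relevant line from the clinfo output in this thread.
SAMPLE_CLINFO = (
    "=== CL_PROGRAM_BUILD_LOG === fatal error: cannot open file "
    "'/usr/lib/clc/gfx1031-amdgcn-mesa-mesa3d.bc': No such file or directory"
)

for path in missing_bitcode_files(SAMPLE_CLINFO):
    print("missing:", path, "target:", target_from_path(path))
    installed = available_targets(Path(path).parent)
    print("installed bitcode files:", installed if installed else "none found")
```

If the target named in the error is not among the installed files, that points at the Mesa OpenCL stack rather than at KataGo itself, which is consistent with the advice in the replies to try ROCm instead.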