Open fpemud opened 4 years ago
Only OpenCL and CUDA have implementations right now. If you'd be interested in doing the work to write another backend, that would be cool. Be warned that this is normally quite a lot of work - GPU code is tricky.
Basically you have to implement this interface: https://github.com/lightvector/KataGo/blob/master/cpp/neuralnet/nninterface.h
You can look at cudabackend.cpp and openclbackend.cpp for the existing implementations.
Something like an ONNXRuntime backend might be worth implementing, provided that its speed is reasonable.
Oct 2023: I have an AMD RX 6800 XT with Debian + ROCm 5.7.0, and I can compile and run KataGo without problems, just by following the compile guide.
I removed mesa-opencl, which does not work at all (1). I use Xorg + the amdgpu proprietary driver, and ROCm downloaded from AMD's site.
I have only one GPU, which is used simultaneously for my display, for the OpenCL-ROCm KataGo bot in the background, and I even tried an old 3D game under Wine at the same time. I saw no problems and no messages in the kernel logs.
(1) WIP, Mesa 23.3: https://www.phoronix.com/news/Mesa-23.3-Rusticl-On-Zink
Could you please share your compile guide? Thank you very much!
Before going further: ROCm 5.2 is packaged in Debian. I don't remember why I didn't just use that one and took ROCm from the AMD site instead.
Maybe just
apt search amdgpu
and install the corresponding driver, and firmware
apt install rocm ocl-icd-libopencl ocl-icd-dev ocl-icd-opencl-dev
In case you have mostly nothing installed for compiling,
apt-get build-dep leela-zero
should install most of what is needed to compile KataGo.
To install ROCm + the graphics driver from the AMD site, the following worked for me for an RX 6800 XT on Debian 12. I have no idea if it works for the 5700.
On Debian, use the amdgpu Xorg driver package:
xserver-xorg-video-amdgpu
(The non-free firmware will be downloaded from the AMD site later, but it may be wise to make sure you have a working display before going further, and to install the firmware from Debian first, so that you start from a clean working state: https://unix.stackexchange.com/questions/736065/how-do-i-install-non-free-firmware-in-debian-12-bookworm )
You can remove the other OpenCL ICD packages from the system:
apt remove mesa-opencl-icd # does not work for me (and others)
apt remove pocl-opencl-icd # may be used to do OpenCL on the CPU, which then shows up as an OpenCL device; not needed
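To see which OpenCL implementations are still registered before and after removing packages, you can list the ICD files. This is only a minimal sketch: it assumes the ICD loader reads the conventional /etc/OpenCL/vendors directory, and the function name list_opencl_icds is made up for illustration.

```shell
# List registered OpenCL ICDs as "file -> library", one line per .icd file.
# Assumption: the ICD loader uses /etc/OpenCL/vendors (the usual Linux default).
list_opencl_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue    # the glob matched nothing
        printf '%s -> %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}
```

Running list_opencl_icds with no argument should ideally show only the AMD entry once mesa-opencl-icd and pocl-opencl-icd are gone.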
AMD site : https://rocmdocs.amd.com/en/latest/deploy/linux/quick_start.html
There is an amdgpu-installer, but it is for Ubuntu and points towards the Ubuntu repo, so I used the other method and followed the 4 easy steps.
1. Download and convert the package signing key. Here you accept AMD's rules and trust them.
2. Add the AMD repositories (they set priority pinning of packages). For Debian I chose Ubuntu 20.04, whose kernel, libc, etc. are very close to Debian's.
3. Update the list of packages:
sudo apt update
4. Install.
Install the drivers as explained on the site:
sudo apt install amdgpu-dkms amdgpu-dkms-firmware amdgpu-pro-core
If something goes wrong, it will be at this point: after the reboot you would get only text mode.
/!\ Install the ROCm runtimes minimally, not the full set they propose (some packages have dependencies at a higher version in Ubuntu). /!\
apt install rocm-opencl rocm-ocl-icd ocl-icd-libopencl1-amdgpu-pro
apt install rocm-opencl-dev ocl-icd-libopencl1-amdgpu-pro-dev
In case I forgot something,
apt search opencl|grep -A1 focal
will give you the list of OpenCL-related packages from the AMD site (tagged focal if you chose 20.04).
Now check if it works with clinfo (here you may see more than one device if you have several GPUs, or if pocl or mesa-opencl is installed):
$ clinfo|head -20
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (3590.0)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback
Platform Extensions function suffix AMD
Platform Host timer resolution 1ns
Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name gfx1030
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 2.0
Driver Version 3590.0 (HSA1.1,LC)
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Board Name (AMD) AMD Radeon RX 6800 XT
Device PCI-e ID (AMD) 0x73bf
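If you only want the device names out of that output (for example to confirm that only the AMD GPU is left), a small filter is enough. This is a sketch for output shaped like the excerpt above; opencl_device_names is a hypothetical helper name, not part of clinfo.

```shell
# Print the value of every "Device Name" line read from stdin.
# "Device Board Name (AMD)" lines are not matched.
opencl_device_names() {
    awk '/^ *Device Name/ { sub(/^ *Device Name +/, ""); print }'
}
```

Used as clinfo | head -20 | opencl_device_names, as in the command above. The full clinfo output may repeat device names in its trailing section, so add sort -u if you pipe all of it.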
clpeak
$ clpeak # less than one minute tiny benchmark
Platform: AMD Accelerated Parallel Processing
Device: gfx1030
Driver version : 3590.0 (HSA1.1,LC) (Linux x64)
Compute units : 36
Clock frequency : 2575 MHz
Global memory bandwidth (GBPS)
float : 458.18
Now test compiling with OpenCL. Download this very cool and simple test program: https://github.com/hpc12/tools
unzip tools-master.zip
make
then run
$ ./print-devices
platform 0: vendor 'Advanced Micro Devices, Inc.'
device 0: 'gfx1030'
$ ./cl-demo
Choose platform:
[0] Advanced Micro Devices, Inc.
Enter choice:
Choose device:
[0] gfx1030
Enter choice:
---------------------------------------------------------------------
NAME: gfx1030
VENDOR: Advanced Micro Devices, Inc.
PROFILE: FULL_PROFILE
VERSION: OpenCL 2.0
EXTENSIONS: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_glo...
OK, we now have a working computer with OpenCL, compilers, libs ...
Katago : https://github.com/lightvector/KataGo/blob/master/Compiling.md
try, fail, reread the doc :-)
cmake . -DUSE_BACKEND=OPENCL # will use your default compiler
make -j4
To use the ROCm clang/llvm, if you have installed them:
CC=/opt/rocm-5.7.0/llvm/bin/clang-17 CXX=/opt/rocm-5.7.0/llvm/bin/clang++ cmake . -DUSE_BACKEND=OPENCL
make -j4
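The two variants above can be folded into one helper that prefers the ROCm compilers when they are present and otherwise falls back to the default compiler. This is only a sketch: katago_cmake_cmd and its rocm_llvm_bin parameter are made-up names, and it assumes a plain clang executable exists next to clang++ (the command above used clang-17 for CC). It prints the command instead of running it.

```shell
# Print the cmake command line, preferring ROCm's clang when it is installed.
# rocm_llvm_bin is a hypothetical parameter; the default is the path used above.
# Assumption: the directory contains executables named clang and clang++.
katago_cmake_cmd() {
    rocm_llvm_bin="${1:-/opt/rocm-5.7.0/llvm/bin}"
    if [ -x "$rocm_llvm_bin/clang" ] && [ -x "$rocm_llvm_bin/clang++" ]; then
        printf 'CC=%s CXX=%s ' "$rocm_llvm_bin/clang" "$rocm_llvm_bin/clang++"
    fi
    printf 'cmake . -DUSE_BACKEND=OPENCL\n'
}
```

From the cpp directory you could then run eval "$(katago_cmake_cmd)" && make -j4, provided the path contains no spaces.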
Both gcc and ROCm's llvm/clang work for me; the stripped executable is 4M, and both builds seem to work. I have not benchmarked them, but my guess is there is no difference.
The KataGo benchmark advises me to set 24, 32 or 40 threads, depending on my background workload.
As I have only one GPU, which I also use for my display, and clinfo shows Max compute units: 36, I configured KataGo to 24 threads max.
# Number of threads to use in search
numSearchThreads = 24
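If you switch between the suggested values often, the setting can be changed from the shell instead of by hand. A minimal sketch, assuming the config file holds at most one uncommented numSearchThreads line; set_search_threads is a hypothetical helper, not a KataGo tool.

```shell
# Set numSearchThreads in a KataGo config file: replace the existing
# uncommented line, or append one if the file has none.
set_search_threads() {
    cfg="$1"; threads="$2"
    tmp="$cfg.tmp.$$"
    if grep -q '^ *numSearchThreads *=' "$cfg"; then
        sed "s/^ *numSearchThreads *=.*/numSearchThreads = $threads/" "$cfg" > "$tmp"
    else
        { cat "$cfg"; printf 'numSearchThreads = %s\n' "$threads"; } > "$tmp"
    fi
    mv "$tmp" "$cfg"
}
```

For example, set_search_threads path/to/your_gtp.cfg 24 (the path is a placeholder for your own config file).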
My 2 cents.
In addition to OpenCL, CUDA and Eigen, AMD ROCm is flourishing nowadays. Maybe it can be added into KataGo's to-do list? Can this function solve the problems with AMD RX 5700 and OpenCL mesa mentioned in README?