lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

AMD ROCm backend support #331

Open fpemud opened 4 years ago

fpemud commented 4 years ago

In addition to OpenCL, CUDA and Eigen, AMD ROCm is flourishing nowadays. Maybe it can be added to KataGo's to-do list? Would a ROCm backend solve the problems with the AMD RX 5700 and Mesa OpenCL mentioned in the README?

LZY2006 commented 4 years ago

Only OpenCL and CUDA have implementations right now. If you'd be interested in doing the work to write another backend, that would be cool. Be warned that this is normally quite a lot of work, since GPU code is tricky.

Basically you have to implement this interface: https://github.com/lightvector/KataGo/blob/master/cpp/neuralnet/nninterface.h

You can look at cudabackend.cpp and openclbackend.cpp for the existing implementations.

isty2e commented 4 years ago

Something like an ONNX Runtime backend might be worth implementing, provided that its speed is reasonable.

alain-bkr commented 1 year ago

Oct 2023: I have an AMD RX 6800 XT with Debian + ROCm 5.7.0, and I can compile and run KataGo without problems, just by following the compile guide.

I removed mesa-opencl, which does not work at all (1). I use Xorg with the amdgpu proprietary driver, and ROCm downloaded from AMD's site.

I have only one GPU, which is used simultaneously for my display and for the OpenCL/ROCm KataGo bot running in the background; I even tried an old 3D game under Wine at the same time. I saw no problems and no messages in the kernel logs.

(1) Work in progress in Mesa 23.3: https://www.phoronix.com/news/Mesa-23.3-Rusticl-On-Zink

Looong01 commented 1 year ago

Could you please share your compile guide? Thank you very much!

alain-bkr commented 1 year ago

Before going further: ROCm 5.2 is packaged in Debian. I don't remember why I did not just use that package and took ROCm from the AMD site instead. Maybe it is enough to run apt search amdgpu, install the corresponding driver and firmware, and then:

apt install rocm ocl-icd-libopencl ocl-icd-dev ocl-icd-opencl-dev

In case you have almost nothing installed for compiling, apt-get build-dep leela-zero should install most of what is needed to compile KataGo.


To install ROCm + the graphics driver from the AMD site, the following worked for me for an RX 6800 XT on Debian 12. I have no idea whether it works for the RX 5700.

On Debian, use the amdgpu X driver (package xserver-xorg-video-amdgpu).

(The non-free firmware will be downloaded from the AMD site later, but it may be wise to make sure you have a working display before going further, and to install the firmware from Debian first so that you start from a clean working state: https://unix.stackexchange.com/questions/736065/how-do-i-install-non-free-firmware-in-debian-12-bookworm )

You can remove the other OpenCL ICD packages from the system:

apt remove mesa-opencl-icd  # does not work for me (and others)
apt remove pocl-opencl-icd  # can be used to run OpenCL on the CPU; it shows up as an OpenCL device, not needed here

AMD site : https://rocmdocs.amd.com/en/latest/deploy/linux/quick_start.html

There is an amdgpu-installer, but it is for Ubuntu and points to the Ubuntu repo, so I used the other method and followed the 4 easy steps.

  1. Download and convert the package signing key. Here you accept AMD's rules and trust them.

  2. Add the AMD repositories (they set priority pinning of packages). For Debian I chose the Ubuntu 20.04 repository, whose kernel, libc, etc. are very close to Debian's.

  3. Update the list of packages: sudo apt update

  4. Install the drivers as explained on the site: sudo apt install amdgpu-dkms amdgpu-dkms-firmware amdgpu-pro-core

If something goes wrong, it will be now: after the reboot you may get text mode only ...

/!\ Install the ROCm runtimes minimally, not the full set they propose (some packages depend on higher versions that exist only in Ubuntu). /!\

apt install rocm-opencl rocm-ocl-icd ocl-icd-libopencl1-amdgpu-pro
apt install rocm-opencl-dev ocl-icd-libopencl1-amdgpu-pro-dev 

In case I forgot something, apt search opencl|grep -A1 focal will give you the list of OpenCL-related packages from the AMD site (tagged focal if you chose 20.04).


Now check that it works with clinfo (you may see more than one device here if you have several GPUs, or if pocl or mesa-opencl is installed):

$ clinfo|head -20
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3590.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback 
  Platform Extensions function suffix             AMD
  Platform Host timer resolution                  1ns

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx1030
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0 
  Driver Version                                  3590.0 (HSA1.1,LC)
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Board Name (AMD)                         AMD Radeon RX 6800 XT
  Device PCI-e ID (AMD)                           0x73bf
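
To script this check, the device lines can be pulled out of the clinfo output. This is a minimal sketch that assumes the output layout shown above; list_cl_devices and count_cl_gpus are hypothetical helper names, not part of clinfo or KataGo.

```shell
#!/bin/sh
# Sketch: summarize `clinfo` output read from stdin.
# Both function names are hypothetical helpers for illustration.

# Print the value of every "Device Name" line.
# "Device Board Name (AMD)" lines do not match, because the pattern
# requires "Device Name" right after the leading whitespace.
list_cl_devices() {
    sed -n 's/^[[:space:]]*Device Name[[:space:]][[:space:]]*//p'
}

# Print how many devices report "Device Type ... GPU".
count_cl_gpus() {
    grep -c 'Device Type[[:space:]]*GPU'
}

# Usage on a real system (requires clinfo to be installed):
#   clinfo | list_cl_devices
#   clinfo | count_cl_gpus
```

If count_cl_gpus prints 0, the OpenCL runtime does not see the card, and it is probably not worth trying to build KataGo yet.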

clpeak

$ clpeak   # tiny benchmark, takes less than a minute

Platform: AMD Accelerated Parallel Processing
  Device: gfx1030
    Driver version  : 3590.0 (HSA1.1,LC) (Linux x64)
    Compute units   : 36
    Clock frequency : 2575 MHz

    Global memory bandwidth (GBPS)
      float   : 458.18

Now test compiling with OpenCL. Download this very cool and simple set of test programs: https://github.com/hpc12/tools

unzip tools-master.zip
cd tools-master
make

then run

$ ./print-devices
platform 0: vendor 'Advanced Micro Devices, Inc.'
  device 0: 'gfx1030'
$ ./cl-demo
Choose platform:
[0] Advanced Micro Devices, Inc.
Enter choice: 
Choose device:
[0] gfx1030
Enter choice: 
---------------------------------------------------------------------
NAME: gfx1030
VENDOR: Advanced Micro Devices, Inc.
PROFILE: FULL_PROFILE
VERSION: OpenCL 2.0 
EXTENSIONS: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_glo...

OK, we now have a working computer with OpenCL, compilers, libs ...


KataGo: https://github.com/lightvector/KataGo/blob/master/Compiling.md

try, fail, reread the doc :-)

cmake . -DUSE_BACKEND=OPENCL   # will use your default compiler
make -j4

To use the ROCm clang/LLVM toolchain, if you have installed it:

CC=/opt/rocm-5.7.0/llvm/bin/clang-17 CXX=/opt/rocm-5.7.0/llvm/bin/clang++ cmake . -DUSE_BACKEND=OPENCL
make -j4
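
The two variants above can be folded into one script that uses the ROCm compilers only when they are actually present. This is a minimal sketch: pick_compilers is a hypothetical helper, and the compiler paths in the usage comment are the ones from my install, so yours may differ.

```shell
#!/bin/sh
# Sketch: print "CC=... CXX=..." if both given compilers are executable,
# otherwise print nothing so that cmake falls back to its default compiler.
# pick_compilers is a hypothetical helper name for illustration.
pick_compilers() {
    cc=$1
    cxx=$2
    if [ -x "$cc" ] && [ -x "$cxx" ]; then
        printf 'CC=%s CXX=%s\n' "$cc" "$cxx"
    fi
}

# Usage (paths are an assumption taken from a ROCm 5.7.0 install;
# $settings is left unquoted on purpose, so paths must not contain spaces):
#   settings=$(pick_compilers /opt/rocm-5.7.0/llvm/bin/clang-17 /opt/rocm-5.7.0/llvm/bin/clang++)
#   env $settings cmake . -DUSE_BACKEND=OPENCL && make -j4
```

When pick_compilers prints nothing, the command reduces to env cmake . -DUSE_BACKEND=OPENCL, which is the same as the default-compiler build.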

Both gcc and ROCm's llvm/clang work for me, and the stripped executable is about 4M. I have not benchmarked the two builds against each other, but my guess is that there is no difference.


katago benchmark advises me to set 24, 32 or 40 threads, depending on my background workload. As I have only one GPU, which also drives my display, and clinfo shows Max compute units: 36, I configured KataGo to use at most 24 threads.

# Number of threads to use in search
numSearchThreads = 24
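
If you want to change this value from a script instead of editing the file by hand, something like the following should do. This is a minimal sketch: set_search_threads is a hypothetical helper, the config path is whatever file you pass to KataGo, and it only rewrites an existing, uncommented numSearchThreads line.

```shell
#!/bin/sh
# Sketch: set numSearchThreads in a KataGo config file.
# set_search_threads is a hypothetical helper name for illustration.
# Arguments: $1 = path to the config file, $2 = thread count.
set_search_threads() {
    cfg=$1
    threads=$2
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file first, then copy it back over
    # the original so the file keeps its permissions.
    sed "s/^numSearchThreads[[:space:]]*=.*/numSearchThreads = $threads/" "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (the file name is an assumption, use your own config):
#   set_search_threads ./my_gtp.cfg 24
```

Comment lines and all other settings are left untouched, so the change is easy to review with diff afterwards.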

My 2 cents.