lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

any plan to support RTX3000 series? #323

Open bclman opened 4 years ago

bclman commented 4 years ago

The new RTX 3000 series doesn't support CUDA 10.2. Is there any plan to release a new version that supports the RTX 3000 series?

portkata commented 4 years ago

Have you tried CUDA 10.1? It seems like that should work well with KataGo 1.4.2, for example: https://github.com/lightvector/KataGo/releases/download/v1.4.2/katago-v1.4.2-cuda10.1-windows-x64.zip

lightvector commented 4 years ago

Thanks to some work by @anoek, the tip of KataGo's github master branch in this repo should already support CUDA 11 and CUDNN 8. If you know how to compile KataGo yourself, or you can ask someone else friendly who knows how to do it, then you should already be able to make it work. Otherwise, yes, you can wait for me to get around to the next release, which I could certainly attempt to build and link against CUDA 11 this time. I would need to do some work upgrading my own CUDA installation first though.

l1t1 commented 4 years ago

OpenCL 3.0 has been released: https://www.khronos.org/opencl/

thynson commented 4 years ago

I replaced my graphics card, an RTX 2070, with an RTX 3080 and tried to run KataGo v1.4.2 as @portkata suggested above, but it does not work on my machine. It takes a very long time, about 10 minutes, to start up (not the tuning process, but every time)! And it just outputs bad results: [image: "Untitled"]

lightvector commented 4 years ago

Hmmm, did NVIDIA's latest GPU break OpenCL support? That would be a bit surprising, given that OpenCL has worked on all the recent prior NVIDIA GPUs. Just to check that it isn't KataGo-specific, do you have the same issue if you try to run Leela Zero, which also uses OpenCL?

lightvector commented 4 years ago

And just to be clear - you're trying to use the OpenCL version, rather than the CUDA version that was compiled for CUDA 10.2, correct?

thynson commented 4 years ago

I was using the CUDA version. I just tried the OpenCL version and can confirm that it works on the RTX 3080.

lightvector commented 4 years ago

Great. So just use OpenCL. No need to use CUDA. :)

thynson commented 4 years ago

Except that it only runs at the same speed as the CUDA version did on the RTX 2070. 😞

lightvector commented 4 years ago

Ah. Okay in that case, if you want to try to see if CUDA is faster, you can either wait for the next release of KataGo in perhaps several weeks, or ask someone in https://discord.gg/bqkZAz3 or elsewhere to compile a KataGo version for your operating system that supports CUDA 11.

lightvector commented 4 years ago

Version 1.7.0 is released, with precompiled executables for CUDA 11! https://github.com/lightvector/KataGo/releases

You can try them! Keep in mind that there is still no guarantee that there is an improvement. I found, somewhat unfortunately, that on one of my GPUs (on a linux cloud machine, however) upgrading from CUDA 10.2 to CUDA 11 actually made things slower, not faster. If you can report your results, however, that would be great. Let me know if it works or if you find a bug, or it crashes, or whatever. :)

thynson commented 4 years ago

I can confirm that the CUDA 11 version works. With an RTX 3080 card, the CUDA version now runs at 1100v/s and the OpenCL version at 1200v/s. As far as I can remember, KataGo v1.4.2 OpenCL ran at about 700v/s, but I'm not sure whether it was configured to run with fp32.
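For readers comparing these figures, here is a minimal Python sketch that turns the visits-per-second numbers reported above into relative speeds. The dictionary labels and the `relative_speed` helper are illustrative names, not part of KataGo, and the v1.4.2 figure is the approximate, from-memory value given above.

```python
# Throughput (visits per second) as reported in this comment for an RTX 3080.
# The v1.4.2 number is approximate and may have been measured with fp32.
reported_vps = {
    "v1.7.0 CUDA 11": 1100,
    "v1.7.0 OpenCL": 1200,
    "v1.4.2 OpenCL (approx.)": 700,
}


def relative_speed(vps, baseline):
    """Return each configuration's throughput as a multiple of the baseline's."""
    base = vps[baseline]
    return {name: value / base for name, value in vps.items()}


# Use the CUDA 11 build as the reference point.
ratios = relative_speed(reported_vps, "v1.7.0 CUDA 11")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers the OpenCL build comes out roughly 9% faster than the CUDA 11 build, and both are well ahead of the remembered v1.4.2 OpenCL figure, subject to the fp32 caveat.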

scaomath commented 3 years ago

> I can confirm that the CUDA 11 version works. With an RTX 3080 card, the CUDA version now runs at 1100v/s and the OpenCL version at 1200v/s. As far as I can remember, KataGo v1.4.2 OpenCL ran at about 700v/s, but I'm not sure whether it was configured to run with fp32.

I am having a similar issue, with the CUDA version performing slightly worse than OpenCL (for a 40x256 net it is ~1700v/s vs ~1500v/s). I compiled both locally. I wonder if this is normal.

lightvector commented 3 years ago

Maybe there's nothing wrong with the CUDA version? It's not inconceivable that the OpenCL code could be implemented well enough to be competitive with or beat it in some cases. Go neural nets are "weird" relative to most image processing - 3x3 convolutions on "images" that are only 19 pixels wide might not be a case that NVIDIA has optimized as much for.

Also, note that CUDA 11 can sometimes be slower than CUDA 10.2 on some GPUs. (Googling online you'll find a few cases where people report this for various applications, including Leela Chess Zero.)
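To put the "only 19 pixels wide" point in rough numbers, here is a small Python sketch that counts multiply-accumulates for one 3x3 convolution layer. The `conv3x3_macs` helper is a hypothetical name, the 256 channels are taken from the "40x256" net mentioned in this thread, and the 224x224 comparison size is an assumption chosen as a typical image-classification input, not something stated here.

```python
def conv3x3_macs(height, width, c_in, c_out):
    """Multiply-accumulates for one 3x3, stride-1, same-padded convolution layer."""
    return height * width * c_in * c_out * 3 * 3


# 19x19 is the Go board; 256 channels matches a "40x256" style net.
go_layer = conv3x3_macs(19, 19, 256, 256)
# 224x224 is an assumed image size, used only for comparison.
image_layer = conv3x3_macs(224, 224, 256, 256)

print(f"positions per Go board: {19 * 19}")
print(f"MACs per Go conv layer: {go_layer:,}")
print(f"image/Go work ratio:    {image_layer / go_layer:.0f}x")
```

With equal channel counts, the ratio is just the ratio of spatial positions (50176 to 361, about 139x). This only illustrates how small each Go "image" is per sample; by itself it says nothing about which backend should be faster.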

lightvector commented 3 years ago

Of course, I guess it is pretty weird if an RTX30 doesn't outperform the analogous RTX20 card. I'm not sure what I can do about it. KataGo's CUDA implementation isn't doing anything all that weird, I think - the vast majority of the compute time - all the convolutions - should be just calling out to Nvidia's CUDNN library, and the library handles it from there. Maybe there's some setting or something - I know CUDNN sometimes lets you configure different flags or algorithms.