lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/
Other
3.47k stars, 563 forks

Does version (1.6.1) work with CUDA 11.1? #328

Open mzjxfdtbigbear opened 3 years ago

mzjxfdtbigbear commented 3 years ago

Are there any compatibility problems when using the CUDA 11.1 backend? Is it better to compile the program locally than to use the release version?

lightvector commented 3 years ago

Yep. The most recent version of master should work with CUDA 11 if compiled locally. The latest released version does not yet support it, since the latest release is a bit older.

y-ich commented 3 years ago

@lightvector san,

I heard that a few Japanese professional Go players bought RTX 3000 series cards and failed to run KataGo on them. For example, see http://wakuwakuigonomura.seesaa.net/article/478140451.html (sorry, this article is in Japanese). In his article, Hoshikawa san, a professional Go player at the Kansai Kiin, pointed out that KataGo with OpenCL runs slower on an RTX 3080 than on an RTX 2080.

Programmers can simply compile the latest KataGo source, but that must be hard for everyone else. I think many people would appreciate it if you released an up-to-date binary for CUDA.

Thank you for your work!

lightvector commented 3 years ago

Sure, I will do a release soon, perhaps this weekend.

However, OpenCL running slower is very strange. The exact same OpenCL code should at least not be worse on a better GPU. Have other people reported the same thing, or was it just an isolated incident?

y-ich commented 3 years ago

@lightvector san,

I am sorry that I exaggerated. I only saw Koshikawa san's article and a response from another professional on Twitter. To be accurate, I know of only one professional who bought an RTX 3000 series card, so the sample size is one. But the OpenCL speed issue he pointed out is consistent with a comment in another issue (https://github.com/lightvector/KataGo/issues/323#issuecomment-703335221). (That comment also points out that the speed of an RTX 3080 via OpenCL is the same as that of an RTX 2070 via CUDA.)

So it may not be an isolated case.

lightvector commented 3 years ago

Version 1.7.0 is released, with precompiled executables for CUDA 11!

https://github.com/lightvector/KataGo/releases

My ability to test these executables is quite limited - I do not personally own an RTX 3080, and I don't have any Windows GPU machine set up for suitable testing right now either. So please let me know if you run into any issues.

y-ich commented 3 years ago

I gathered some numbers from other issues.

V100 (125 or 112 Tensor TFLOPS, 16GB/32GB): 2300 positions per second. (https://github.com/lightvector/KataGo/issues/289)

RTX 3080 (119 Tensor TFLOPS, 10GB): 1100 or 1200 visits/s. (https://github.com/lightvector/KataGo/issues/323#issuecomment-724117201)

Could it be that Ampere's Tensor Cores are not as well suited to KataGo as Turing's?

UPDATE: From another site, RTX 2080 Ti (113.8 Tensor TFLOPS, 11GB):

numSearchThreads = 48: 10 / 10 positions, visits/s = 1098.97 nnEvals/s = 820.28 nnBatches/s = 35.00 avgBatchSize = 23.43 (91.4 secs) (EloDiff +133)
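As a quick sanity check on that benchmark line, here is a minimal Python sketch that pulls out the `name = number` fields and confirms that nnEvals/s divided by nnBatches/s matches the reported avgBatchSize. The function name `parse_benchmark_line` and the regular expression are my own assumptions for illustration; they are not part of KataGo.

```python
import re

# Benchmark summary line, copied verbatim from the comment above.
LINE = (
    "numSearchThreads = 48: 10 / 10 positions, visits/s = 1098.97 "
    "nnEvals/s = 820.28 nnBatches/s = 35.00 avgBatchSize = 23.43 "
    "(91.4 secs) (EloDiff +133)"
)


def parse_benchmark_line(line):
    """Return every 'name = number' field in the line as a float.

    Hypothetical helper for illustration only, not a KataGo API.
    """
    fields = {}
    for name, value in re.findall(r"([A-Za-z/]+) = ([0-9.]+)", line):
        fields[name] = float(value)
    return fields


fields = parse_benchmark_line(LINE)

# Average batch size should be roughly evaluations per second
# divided by batches per second.
derived_batch = fields["nnEvals/s"] / fields["nnBatches/s"]
print(f"derived avgBatchSize  = {derived_batch:.2f}")
print(f"reported avgBatchSize = {fields['avgBatchSize']:.2f}")
```

The two values agree to within rounding, so the quoted line looks internally consistent.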

It seems that the V100, as a cloud-oriented card, achieves exceptional performance, perhaps thanks to its memory, while the 3080 and 2080 Ti have similar Tensor Core performance.
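To make the comparison concrete, here is a rough Python sketch that divides each quoted throughput by the quoted peak Tensor TFLOPS. It rests on several assumptions: I take 125 for the V100 (the thread says "125 or 112"), I use the midpoint 1150 for the 3080's "1100 or 1200", and I treat the V100's "positions per second" as comparable to visits/s, which may not be exact. The runs also come from different issues with possibly different settings, so this is only indicative.

```python
# (card, peak Tensor TFLOPS, throughput) as quoted in this thread.
# The 3080 throughput is an assumed midpoint of "1100 or 1200".
CARDS = [
    ("V100", 125.0, 2300.0),
    ("RTX 3080", 119.0, 1150.0),
    ("RTX 2080 Ti", 113.8, 1098.97),
]


def throughput_per_tflop(cards):
    """Map each card name to its quoted throughput per peak Tensor TFLOP."""
    return {name: rate / tflops for name, tflops, rate in cards}


per_tflop = throughput_per_tflop(CARDS)
for name, value in per_tflop.items():
    print(f"{name}: {value:.1f} per Tensor TFLOP")
```

Under these assumptions the 3080 and 2080 Ti come out nearly identical, while the V100 is roughly twice as high per TFLOP.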