lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Question about tensor cores and OpenCL/Cuda support #330

Closed w3333 closed 3 years ago

w3333 commented 3 years ago

Hello,

Can someone tell me if there is a difference in tensor core (TC) support between the OpenCL and CUDA versions of KataGo? TCs seem to make a huge difference in performance, so I'm looking to get the right GPU and drivers. Since CUDA seems to be a bit tricky on Linux, there's no need to switch if the OpenCL version (which is running fine) also supports TCs. I read somewhere (here: https://github.com/pytorch/glow/issues/3949) that it's actually possible to use TCs in OpenCL and that LeelaZero does that, so that's why I'm asking. It seems that OpenCL normally does not see the TCs unless support is "hacked" in, as explained in the link.

Friday9i commented 3 years ago

Yeah, Tensor Cores are now used efficiently with OpenCL, and performance is roughly comparable to CUDA (sometimes a bit better, sometimes a bit worse, but nothing dramatic). So you can stay with OpenCL on KataGo 1.6.1.

w3333 commented 3 years ago

That is great news! Really glad that this is working! Will get a TC-GPU then asap :-))

Friday9i commented 3 years ago

Yep! Currently the best deal seems to be the RTX 3080, but you're lucky if you manage to catch one! An RTX 2060 or 2070 is quite good too.

w3333 commented 3 years ago

Thanks for the info! Looking right now :)

May I ask you (since you seem to be involved in Kata dev): is the lag with Nvidia cards during GPU compute still an issue? It used to be that while computing on the GPU, the GUI became very laggy, to the point of being almost unusable. I checked with gcp from Leela about that and he said he knows about it, something about Nvidia only having one scheduler while AMD has more?! Ever since then I've switched to AMD cards because there was no lag. Would be nice to hear that this issue is no longer there...

Friday9i commented 3 years ago

You are welcome. I'm not involved in the dev: I barely managed to compile it, so there's no way I can help there :-) But I'm following the project closely and have also been training nets for some time. Regarding the lag, I didn't really notice it, so I'm not in a position to comment, sorry. Regarding GPUs, I really like AMD, but up to now Nvidia is way better for KataGo: it works well and fast. That may change with the new AMD GPUs being announced on October 28th, but there's no guarantee at all; we'll see.

w3333 commented 3 years ago

Well, thanks again, and if you didn't notice the lag then it is probably not an issue anymore (trust me, it was bad). You're right of course, it would be wise to wait for AMD's new GPUs, as prices may go down for older cards. But from what I read, AMD is not doing much on the tensor core front, so not much hope in that regard...