lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Question: Higher speed at 9x9 #600

Open robin-nilsson opened 2 years ago

robin-nilsson commented 2 years ago

I have a hobby project which is using KataGo's analysis functionality to analyse the middle game of 9x9 games, and I just bought a new computer with two modern GPUs just for this. I want to run a great number of playouts (perhaps something like 10M per game) and am looking for any advice to speed up KataGo at 9x9 size.

If I only look at the middle game and endgame and run a lot of playouts, could the move score estimation be reliable even with smaller networks? (If so, that would seem like a plausible way to speed up execution.)

Another thing I've been thinking about: Is it possible to find really strong networks trained only on 9x9? It feels as if that could be more suitable for 9x9 analysis than a network mainly trained on 19x19.

(I have experimented a bit with the configuration file settings and will try out TensorRT)
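For anyone following along, here is a minimal sketch of what a single 9x9 query to KataGo's JSON analysis engine could look like when built from Python. The field names follow the analysis engine documentation as I understand it. The helper name `build_query`, the komi of 7.0, and the rules string are illustrative assumptions, not something stated in this thread. Note also that the engine limits search by visits (`maxVisits`), which is related to but not identical to playouts.

```python
import json


def build_query(query_id, moves, max_visits, komi=7.0, rules="chinese", size=9):
    """Return one analysis query as a single JSON line (hypothetical helper)."""
    query = {
        "id": query_id,
        "moves": moves,                # e.g. [["B", "E5"], ["W", "C3"]]
        "rules": rules,                # assumed ruleset, adjust to your games
        "komi": komi,                  # assumed komi, adjust to your games
        "boardXSize": size,
        "boardYSize": size,
        "analyzeTurns": [len(moves)],  # analyse the position after the last move
        "maxVisits": max_visits,
    }
    return json.dumps(query)


print(build_query("game1-move2", [["B", "E5"], ["W", "C3"]], 100000))
```

Each such line would be written to the standard input of a running `katago analysis -config ... -model ...` process, which answers with one JSON line per analysed turn.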

robin-nilsson commented 2 years ago

I will share my experience so far, in case it might help someone else.

I have been trying out different modes, settings, models, etc. on a machine with a Ryzen 5 12-thread CPU, 16 GB RAM, one RTX 3060 OC 12GB and one RTX 3060 Ti OC 8GB.

TensorRT: Using TensorRT seemed to make the largest difference in speed, although I only compared it with OpenCL. Installing the dependencies by following the instructions on the KataGo release page and on Nvidia's homepage was doable for me as a software developer and experienced Linux user. For anyone trying to set up TensorRT for themselves, I would strongly recommend following Nvidia's installation instructions for each component. (They have pages with detailed lists of all dependencies that need to be installed/met for each specific OS.)

Network: I downloaded and experimented with the latest version of each size from katagotraining.org. I mention my experiences with 6b and the best 40b as they were the extremes.

6b: When using the 6b network, the computer used nearly 100% of all 12 CPU threads and only 10-35% of each GPU. The speed was higher (peaking at 115,000 playouts per second according to Lizzie), but the accuracy of the move score estimation was surprisingly low. (As an example, it could give 17 points for a move around move 40 which, when played out with several different networks, gave a 5-point end result. The 40b network gave an estimate of 5.3 points for the same move and position.)

40b: When running the 40b network, the computer used much less CPU (about 50%). At the same time, the GPUs were working much more efficiently, at 70-90% each. According to Lizzie, it peaked at around 70,000 playouts per second. The accuracy with this network was much higher, and the score given for a move tended to be very similar to the end score after playing out the game with around 1M playouts per step.
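To put the two peak speeds in perspective, here is a small back-of-the-envelope calculation of how long a 10M-playout budget would take at each rate. It assumes the peak rate reported by Lizzie is sustained for the whole run, which is optimistic. The function name is just for illustration.

```python
def seconds_for_playouts(total_playouts, playouts_per_second):
    """Wall-clock seconds needed at a constant playout rate."""
    return total_playouts / playouts_per_second


TARGET = 10_000_000  # roughly the per-game budget mentioned in the question

for name, rate in [("6b", 115_000), ("40b", 70_000)]:
    secs = seconds_for_playouts(TARGET, rate)
    print(f"{name}: {secs:.0f} s (~{secs / 60:.1f} min)")
# prints:
# 6b: 87 s (~1.4 min)
# 40b: 143 s (~2.4 min)
```

So under that assumption the 40b network costs about one extra minute per 10M playouts, which is small next to the difference in score accuracy described above.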

For now, it seems like a no-brainer to use the strongest 40b network. The katago benchmark suggested using 128 threads for a single GPU, so I am using 256 for both, which seems to work well.
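As a rough illustration of the "128 threads per GPU, 256 for two" setup, the sketch below renders the corresponding config lines. The key names (`numSearchThreads`, `numNNServerThreadsPerModel`, `trtDeviceToUseThreadN`) are as I recall them from KataGo's example GTP config. Treat them as an assumption and check them against the config file shipped with your version. The helper `render_thread_config` is hypothetical.

```python
def render_thread_config(threads_per_gpu, num_gpus, backend="trt"):
    """Render config lines for spreading search threads over several GPUs."""
    lines = [
        # Total search threads: the benchmark's per-GPU suggestion times GPU count.
        f"numSearchThreads = {threads_per_gpu * num_gpus}",
        # One neural-net server thread per GPU.
        f"numNNServerThreadsPerModel = {num_gpus}",
    ]
    for i in range(num_gpus):
        # Pin server thread i to GPU i (key prefix depends on the backend).
        lines.append(f"{backend}DeviceToUseThread{i} = {i}")
    return "\n".join(lines)


print(render_thread_config(128, 2))
```

Scaling the benchmark's single-GPU suggestion linearly is only a starting point. Rerunning the benchmark with both GPUs enabled is a more reliable way to choose the thread count.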

The numbers are for running on 9x9 only.

Friday9i commented 2 years ago

Hi! A few comments:

Anyway, enjoy!

robin-nilsson commented 2 years ago

Hello Friday9i,

Thank you for the reply!

"you can contribute to KataGo" - Well, I have bought this computer only for running my project utilizing KataGo, and if it's successful it would make a whole lot of sense for me to do some KataGo training. I will see if I can spare GPU and electricity in the future. :)

I will definitely try out the 60b network. However, from what I've seen using the 40b, there is no need for anything stronger than that. Also, my guess is that the average user of KataGo on a normal computer (for reviewing their games or similar) is using 20b or 40b, as the 60b is notably heavier. My spontaneous feeling is that I would want to contribute to training the strongest 40b, or even a 20b, if I do find time to contribute.

Thank you again for the input. I am still interested in any advice on how to gain speed on 9x9, perhaps by tweaking the settings for analysis mode or similar. As I will be doing so many playouts in my project, any improvement will save time.

Friday9i commented 2 years ago

The fastest configuration is probably TensorRT (TRT), set up specifically for 9x9.