lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

How to optimize TensorRT for Katago training contribution? #572

Open yauwing opened 2 years ago

yauwing commented 2 years ago

I upgraded to the TensorRT backend and am happy that KataGo is now 60+% faster on my computer.

However, when I try to contribute on katagotraining.org I don't see the same jump in training performance. In fact, it ends up slightly slower, because TensorRT takes longer to initialize.

Is there anything I can do to improve the training performance?

petgo3 commented 2 years ago

As far as I understand it, TensorRT is mainly faster when requireMaxBoardSize = true is set in the config. This is currently not possible for training, because games are played on different board sizes.

@tychota @lightvector Might it be possible for the server to distribute a board size with each task, so that each contributor plays on a fixed board size?
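For reference, a sketch of the config fragment being discussed. `requireMaxBoardSize` is the option named above; the companion buffer-size options are an assumption here, and exact option names can vary by KataGo version, so check the example .cfg shipped with your build:

```ini
# Sketch, not verified against every KataGo version: pin the neural net
# buffers to a single board size so the TensorRT backend can build one
# fixed-shape engine (the case where it is fastest, per the comment above).
# This is fine for GTP play on 19x19, but not for distributed training,
# which mixes board sizes.
requireMaxBoardSize = true
maxBoardXSizeForNNBuffer = 19   # assumed companion options; verify against
maxBoardYSizeForNNBuffer = 19   # your version's gtp_example.cfg
```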

ceremony08 commented 2 years ago

> I upgraded to the TensorRT backend and am happy that KataGo is now 60+% faster on my computer.
>
> However, when I try to contribute on katagotraining.org I don't see the same jump in training performance. In fact, it ends up slightly slower, because TensorRT takes longer to initialize.
>
> Is there anything I can do to improve the training performance?

What are your config settings? TensorRT is only about 20% faster on my PC.

yauwing commented 2 years ago

My hardware: AMD Ryzen 5950, 64 GB RAM, RTX 3080 Ti + RTX 2080. The config file was generated with the `katago genconfig` command. Since the generated file is a bit long, I only list the settings that differ from default_gtp.cfg:

numSearchThreads = 96
nnCacheSizePowerOfTwo = 21
nnMutexPoolSizePowerOfTwo = 17
numNNServerThreadsPerModel = 2
trtDeviceToUseThread0 = 0
trtDeviceToUseThread1 = 1

ceremony08 commented 2 years ago

> My hardware: AMD Ryzen 5950, 64 GB RAM, RTX 3080 Ti + RTX 2080. The config file was generated with the `katago genconfig` command. Since the generated file is a bit long, I only list the settings that differ from default_gtp.cfg:
>
> numSearchThreads = 96
> nnCacheSizePowerOfTwo = 21
> nnMutexPoolSizePowerOfTwo = 17
> numNNServerThreadsPerModel = 2
> trtDeviceToUseThread0 = 0
> trtDeviceToUseThread1 = 1

:)