lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

How is distributed training implemented in KataGo? #915

Open bobqianic opened 3 months ago

bobqianic commented 3 months ago

I found that KataGo conducts self-play and then generates a large number of rows, which are then uploaded. What is this data used for? It doesn't seem to be doing backpropagation like training a typical model; it just keeps generating data.

lightvector commented 3 months ago

Are you familiar with AlphaZero, or Expert Iteration, or similar methods? The original papers are pretty good background reading if you're not: https://arxiv.org/pdf/1705.08439.pdf https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf

The main thing about these methods is that almost all of the compute cost comes from the self-play portion, which uses search as the policy improvement mechanism. So that's the part that needs to be distributed. Training the model on that data doesn't need nearly as much compute; right now a single machine with only one (strong) GPU is enough to keep up.
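To make the split concrete, here is a toy Python sketch of that data flow. This is not KataGo's actual code; all names, shapes, and the tiny linear "network" are made-up assumptions purely to show the idea: many distributed workers produce (position, search policy, game outcome) rows via self-play, and a single central trainer consumes those rows with ordinary gradient descent.

```python
# Toy sketch (not KataGo's real code) of the expert-iteration data flow:
# distributed self-play workers generate training rows; one trainer does the gradient steps.
import numpy as np

FEATURES = 32   # toy board-encoding size (assumption)
MOVES = 10      # toy number of legal moves (assumption)
rng = np.random.default_rng(0)

def self_play_worker(weights: np.ndarray, num_rows: int) -> list:
    """Stand-in for a volunteer client: in the real system, MCTS guided by the
    current net yields an improved policy target per position plus the game result."""
    rows = []
    for _ in range(num_rows):
        position = rng.normal(size=FEATURES)          # encoded board position
        visits = rng.integers(1, 100, size=MOVES)     # pretend MCTS visit counts
        search_policy = visits / visits.sum()         # improved policy target from search
        outcome = float(rng.choice([-1.0, 1.0]))      # game result (value target; unused in this toy)
        rows.append({"x": position, "pi": search_policy, "z": outcome})
    return rows

def train_step(weights: np.ndarray, batch: list, lr: float = 0.01) -> np.ndarray:
    """Stand-in for the single-GPU trainer: minimize cross-entropy between the
    net's policy and the search policy by plain gradient descent."""
    grad = np.zeros_like(weights)
    for row in batch:
        logits = weights @ row["x"]                   # toy linear policy head
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grad += np.outer(probs - row["pi"], row["x"]) # gradient of cross-entropy w.r.t. weights
    return weights - lr * grad / len(batch)

# The distributed part is only the row generation; training stays on one machine.
weights = np.zeros((MOVES, FEATURES))
uploaded = []
for worker in range(4):                               # pretend 4 volunteer machines
    uploaded.extend(self_play_worker(weights, num_rows=64))
weights = train_step(weights, uploaded)
print("trained on", len(uploaded), "uploaded rows")
```

In the real setup the loop is closed the other way too: the trainer periodically publishes new network weights, and the self-play clients download the latest net to generate the next batch of rows.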