Open bobqianic opened 3 months ago
Are you familiar with AlphaZero, or Expert Iteration, or similar methods? The original papers are pretty good background reading if you're not: https://arxiv.org/pdf/1705.08439.pdf https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf
The main thing about these methods is that almost all the compute cost comes from the self-play portion, which uses search as the policy improvement mechanism. So that's the part that needs to be distributed. Training the model on that data needs much less compute power: right now a single machine with only one (strong) GPU is enough to keep up.
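To make the split between the two parts concrete, here is a minimal toy sketch of the loop, not KataGo's actual code. Everything in it is an assumption made for illustration: the three-move "game" in `WIN_PROB`, the function names `self_play_game`, `train_step` and `run`, and the simple rollout counting that stands in for a real tree search. Self-play spends many simulations to produce each row, and the trainer then takes one cheap gradient step on a batch of rows.

```python
import math
import random

# Hypothetical toy "game": three moves with fixed, hidden win probabilities.
WIN_PROB = [0.2, 0.5, 0.8]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_play_game(logits, rng, sims=150):
    """Expensive part: use simulations (a stand-in for search) to produce one row."""
    prior = softmax(logits)
    estimates, used = [], 0
    for p_prior, p_win in zip(prior, WIN_PROB):
        # The net's prior steers where simulations go; half are spread uniformly.
        n = max(1, int(sims * (0.5 * p_prior + 0.5 / len(prior))))
        wins = sum(rng.random() < p_win for _ in range(n))
        estimates.append(wins / n)
        used += n
    # Search output: a sharper policy than the raw net, used as the training target.
    target = softmax([e / 0.1 for e in estimates])
    move = rng.choices(range(len(target)), weights=target)[0]
    # The outcome would train a value head; it is recorded but unused in this sketch.
    outcome = 1.0 if rng.random() < WIN_PROB[move] else 0.0
    return {"policy_target": target, "outcome": outcome}, used


def train_step(logits, rows, lr=0.5):
    """Cheap part: one gradient step of cross-entropy toward the search targets."""
    pred = softmax(logits)
    grad = [0.0] * len(logits)
    for row in rows:
        for i, t in enumerate(row["policy_target"]):
            grad[i] += pred[i] - t  # d(cross-entropy)/d(logit_i)
    return [l - lr * g / len(rows) for l, g in zip(logits, grad)]


def run(iterations=30, games_per_iter=20, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    sims_used = train_steps = 0
    for _ in range(iterations):
        rows = []
        for _ in range(games_per_iter):  # this loop is what gets distributed
            row, used = self_play_game(logits, rng)
            rows.append(row)
            sims_used += used
        logits = train_step(logits, rows)  # a single trainer consumes the rows
        train_steps += 1
    return softmax(logits), sims_used, train_steps


if __name__ == "__main__":
    policy, sims_used, train_steps = run()
    print("policy:", [round(p, 3) for p in policy])
    print("simulations:", sims_used, "training steps:", train_steps)
```

Even in this toy version the simulation count is several orders of magnitude larger than the number of training steps, which is the reason for distributing self-play and keeping training on one machine. In a real system the rows would be uploaded by many clients and the trainer would run backpropagation through a neural network rather than update three logits.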
I found that KataGo conducts self-play and generates a large number of rows, which are then uploaded. What is this data used for? The client doesn't seem to be doing backpropagation like typical model training; it just keeps generating data.