
Idea: preserve some DNN knowledge between the moves. #61

Open lukaszlew opened 5 years ago

lukaszlew commented 5 years ago

The simplest approach is to run an RNN through the game: preserve some signals from a subset of layers at move n and feed them as input to the NN at move n+1. This would give the NN an opportunity to focus all of its compute on the incremental move-to-move computation.
A more complex variant would be to use an LSTM cell.
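
Concretely, something like the following sketch (the module structure, channel counts, and input-plane count are all made up for illustration, not KataGo's real architecture):

```python
# Illustrative sketch of "RNN through the game": a trunk that consumes the
# usual board input planes for move n+1 plus a hidden state carried over
# from move n. All sizes here are hypothetical.
import torch
import torch.nn as nn

class IncrementalNet(nn.Module):
    def __init__(self, in_channels=22, state_channels=32, trunk_channels=64):
        super().__init__()
        # Fuse fresh board features with the state carried from the previous move.
        self.fuse = nn.Conv2d(in_channels + state_channels, trunk_channels, 3, padding=1)
        self.trunk = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(trunk_channels, trunk_channels, 3, padding=1),
            nn.ReLU(),
        )
        self.to_state = nn.Conv2d(trunk_channels, state_channels, 1)  # state for the next move
        self.policy_head = nn.Conv2d(trunk_channels, 1, 1)

    def forward(self, board_planes, prev_state):
        x = self.trunk(self.fuse(torch.cat([board_planes, prev_state], dim=1)))
        return self.policy_head(x), self.to_state(x)

# At move 0 the carried state could simply be zeros.
net = IncrementalNet()
state = torch.zeros(1, 32, 19, 19)
board = torch.zeros(1, 22, 19, 19)  # placeholder input planes
policy_logits, state = net(board, state)
```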

lightvector commented 5 years ago

How would you integrate this with MCTS? If, for example, you preserve merely 32 channels (times 19 x 19) of information per node in the search, that's a factor of 16 or 32 more CPU memory usage, as well as more than double the data you have to shuttle to and from the GPU.
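
For concreteness, the arithmetic behind that estimate (my assumptions: fp16 storage for the preserved activations, and a baseline MCTS node of very roughly 1 KB):

```python
# Back-of-the-envelope version of the "factor of 16 or 32" estimate.
channels, board_size, bytes_per_value = 32, 19, 2   # assume fp16 storage
preserved = channels * board_size * board_size * bytes_per_value
baseline_node = 1024                                # rough guess at a node's current size
print(preserved)                  # 23104 bytes, ~23 KB per search node
print(preserved / baseline_node)  # ~22.6x more memory per node
```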

Training would also be massively more expensive, since you would now need to feed whole games to the neural net instead of random batches of positions, meaning that per "batch" you would only train towards one game result instead of training on a diverse shuffled batch with many results. Or is there a cheaper way to train RNNs?

lukaszlew commented 5 years ago

Indeed, the storage requirements for MCTS seem large. If you have, say, 5000 playouts per second and create a new MCTS node per playout (is that the case?), you'll have to allocate 5000 × 19 × 19 × 32 ≈ 60 MB/s - that does not sound so bad?
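
Spelled out (this assumes one byte per stored value, which is what the ~60 MB/s figure implies; fp16 or fp32 would double or quadruple it):

```python
# The allocation-rate estimate from above, spelled out.
playouts_per_sec = 5000
values_per_node = 19 * 19 * 32      # 32 channels over the whole board
bytes_per_value = 1                 # assumption matching the ~60 MB/s figure
rate = playouts_per_sec * values_per_node * bytes_per_value
print(rate / 1e6)                   # ~57.8 MB/s
```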

Also, how about doing each playout from the root and recomputing the whole chain of predictions?

Could one keep these tensors on GPU and not move them back and forth?
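
As a sketch of what that could look like (the pool and its indexing scheme are hypothetical): preallocate one big device-resident buffer and have each MCTS node keep only an integer index into it.

```python
# Hypothetical GPU-resident state pool: the 32x19x19 tensors never leave the
# GPU; the search tree only stores small integer handles.
import torch

class GpuStatePool:
    def __init__(self, max_nodes, channels=32, board_size=19, device="cuda"):
        self.buf = torch.zeros(max_nodes, channels, board_size, board_size, device=device)
        self.next_free = 0  # simplified bump allocator; a real one would recycle slots

    def alloc(self, state):
        """Store one node's carried state; return the handle the node keeps."""
        idx = self.next_free
        self.next_free += 1
        self.buf[idx] = state
        return idx

    def get(self, idx):
        return self.buf[idx]
```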

To aid training, there are several ideas:

- One could precompute the input vector for all the games before shuffling, and then train on small batches of 2-8 consecutive moves (see the sketch after this list).

- Another is training a bigger net (say, the size of the current net) to bootstrap the process: chain-feed the 32×19×19 channels on the first move of a batch, and then train a small incremental net on the rest of the moves.

- Perhaps initial experiments could be done with one small net and compared to a regular setup of similarly small size.
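
A sketch of how the first idea could look in training, truncated-BPTT style (the window length, shapes, and the assumption that initial states were precomputed offline are all mine):

```python
# Illustrative training step over a short window of consecutive moves.
# Assumes the initial hidden state of each window was precomputed offline,
# so windows can still be shuffled like ordinary training batches.
import torch
import torch.nn.functional as F

def train_window(net, optimizer, boards, targets, init_state):
    """boards: [T, B, C, 19, 19] with T = 2..8 consecutive moves,
    targets: [T, B] move indices, init_state: [B, 32, 19, 19]."""
    state = init_state
    loss = 0.0
    for t in range(boards.shape[0]):
        policy_logits, state = net(boards[t], state)  # e.g. the IncrementalNet above
        loss = loss + F.cross_entropy(policy_logits.flatten(1), targets[t])
    optimizer.zero_grad()
    loss.backward()  # backprop through the whole short window only
    optimizer.step()
    return loss.item()
```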

All of this is pretty complex, but if you consider how similar a neural network's computations can be on consecutive moves, perhaps there is something to be gained there.

lukaszlew commented 4 years ago

Relevant: "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (the MuZero paper)

lightvector commented 4 years ago

MuZero seems like a fantastic advance for general RL, but it is far less exciting for Go. I don't think it solves the issue of massively increased memory usage - I bet Google just uses machines with serious amounts of RAM and tolerates the problem - and their own paper shows that it actually doesn't scale to really deep searches, as the learned model loses too much accuracy.

The latter point is probably solvable by feeding in the board information fresh at each step rather than forcing the model to reconstruct it. But with the memory requirement and the need to reengineer a decent amount of things, KataGo proper is probably not going to get to the point of trying this; it would need to be some fork or new project.