Yes, it's a very clever approach. I think it should be able to produce better results than this bot, given enough training time. I have not implemented such a thing - it sounds like a fun project!
all,
I watched a fascinating video about deepmind's alphazero implementation here:
https://www.youtube.com/watch?v=Wujy7OzvdJk
which finally answered something that had been bugging me for some time - how they actually got a neural net to play go/chess/shogi without any human input. I thought for sure it would follow previous nets and the system would get stuck on a local maximum fairly quickly.
the trick? basically use monte carlo search to generate the training data - i.e., the neural net plus monte carlo tree search would pick the moves, and the resulting games would then be fed back to that same neural network as training data.
then the next generation of the neural network would be inherently stronger, would use monte carlo search to learn further, and so on.
in essence, they were baking the results of the monte carlo trials into the neural net itself. Exceedingly clever.
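The data-generation half of that loop can be sketched in miniature for 2048. This is a toy illustration under my own assumptions, not DeepMind's code: real AlphaZero guides its tree search with the network's policy/value outputs rather than the raw random rollouts used here, and every name below (`slide_row_left`, `monte_carlo_move`, etc.) is made up for the example.

```python
import random

def slide_row_left(row):
    """Slide and merge one 2048 row to the left; returns (new_row, score_gained)."""
    tiles = [t for t in row if t != 0]
    out, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)       # merge equal neighbours once
            score += tiles[i] * 2
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out)), score

def move(board, direction):
    """Apply 'left'/'right'/'up'/'down' to a 4x4 board; returns (new_board, score)."""
    def rows(b):  # view the board so every move becomes a slide-left
        if direction == 'left':  return [list(r) for r in b]
        if direction == 'right': return [list(reversed(r)) for r in b]
        if direction == 'up':    return [list(c) for c in zip(*b)]
        return [list(reversed(c)) for c in zip(*b)]
    slid = [slide_row_left(r) for r in rows(board)]
    new_rows = [r for r, _ in slid]
    gained = sum(s for _, s in slid)
    if direction in ('right', 'down'):
        new_rows = [list(reversed(r)) for r in new_rows]
    if direction in ('up', 'down'):
        new_rows = [list(c) for c in zip(*new_rows)]  # undo the transpose
    return new_rows, gained

def spawn(board, rng):
    """Drop a random 2 (90%) or 4 (10%) on an empty cell, in place."""
    empties = [(i, j) for i in range(4) for j in range(4) if board[i][j] == 0]
    if empties:
        i, j = rng.choice(empties)
        board[i][j] = 4 if rng.random() < 0.1 else 2

def rollout(board, rng, depth=20):
    """Play random legal moves for a few plies; return total score gained."""
    board = [list(r) for r in board]
    total = 0
    for _ in range(depth):
        options = []
        for d in ('left', 'right', 'up', 'down'):
            nb, g = move(board, d)
            if nb != board:
                options.append((nb, g))
        if not options:
            break
        board, gained = rng.choice(options)
        total += gained
        spawn(board, rng)
    return total

def monte_carlo_move(board, rng, n_rollouts=10):
    """Pick the move whose random rollouts score best. Each (board, chosen move)
    pair is the kind of training example a network would then be fit to."""
    best, best_score = None, -1
    for d in ('left', 'right', 'up', 'down'):
        nb, g = move(board, d)
        if nb == board:
            continue  # illegal move: nothing slides
        score = sum(g + rollout(nb, rng) for _ in range(n_rollouts))
        if score > best_score:
            best, best_score = d, score
    return best
```

In the full scheme, you would loop `monte_carlo_move` over whole games, collect the (state, move) pairs, train the net on them, and then let the stronger net bias the next round of search.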
In any case, I was wondering if anyone had implemented a 2048 bot using this strategy, and if so, what the results were.