Fool-Yang closed this issue 4 years ago.
The only things left are to finish the code in agent.py and train.py, and possibly to implement Monte Carlo tree search (MCTS) as well.
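If MCTS does get implemented, one possible selection rule is the PUCT formula from AlphaGo Zero: pick the action maximizing Q(s, a) + c_puct · P(s, a) · sqrt(Σ_b N(s, b)) / (1 + N(s, a)). The sketch below shows only that selection step. The `Edge` class, `select_action`, and the default `c_puct = 1.0` are hypothetical names and values for illustration, not code from this repo.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    """Statistics for one (state, action) edge in a hypothetical search tree."""
    prior: float             # P(s, a), from the network's policy head
    visits: int = 0          # N(s, a)
    value_sum: float = 0.0   # W(s, a), total value backed up through this edge

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); unvisited edges default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: Dict[int, Edge], c_puct: float = 1.0) -> int:
    """Return the action with the highest Q + U score (PUCT rule)."""
    sqrt_total = math.sqrt(sum(e.visits for e in edges.values()))

    def score(e: Edge) -> float:
        # Exploration bonus: large for high-prior, rarely visited edges.
        u = c_puct * e.prior * sqrt_total / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda a: score(edges[a]))
```

With equal priors, an unvisited action beats a heavily visited one with a modest Q; once visit counts are equal, the action with the higher Q wins.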
I realize it is wasteful to redo the preprocessing from scratch on every single board. Each board depends on the previous one, so there is no reason to discard that information. I will rewrite game.py to generate the state as the game progresses instead.
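A minimal sketch of that idea, assuming a square board game where the network input is a short history of positions plus a side-to-move plane (as in AlphaGo Zero). Each move writes one square and pushes a snapshot onto a cached history, so the input planes never have to be rebuilt from earlier boards. `GameState`, `apply_move`, `features`, and `history_len` are hypothetical names, not the actual game.py API.

```python
from collections import deque
from typing import Deque, List

Board = List[List[int]]  # 0 = empty, 1 = player one, -1 = player two


class GameState:
    """Hypothetical incremental state: moves update cached planes in place."""

    def __init__(self, size: int, history_len: int = 2) -> None:
        self.size = size
        self.board: Board = [[0] * size for _ in range(size)]
        self.to_play = 1
        # Most recent positions, newest first; old ones fall off automatically.
        self.history: Deque[Board] = deque(maxlen=history_len)
        self.history.appendleft(self._snapshot())

    def _snapshot(self) -> Board:
        # Copy rows so later in-place moves do not alter stored history.
        return [row[:] for row in self.board]

    def apply_move(self, row: int, col: int) -> None:
        if self.board[row][col] != 0:
            raise ValueError("square already occupied")
        # Only the played square is written; nothing is re-derived
        # from earlier boards.
        self.board[row][col] = self.to_play
        self.to_play = -self.to_play
        self.history.appendleft(self._snapshot())

    def features(self) -> List[Board]:
        """Cached positions (padded with empty boards) plus a side-to-move plane."""
        empty = [[0] * self.size for _ in range(self.size)]
        planes = list(self.history)
        planes += [empty] * (self.history.maxlen - len(planes))
        colour = [[self.to_play] * self.size for _ in range(self.size)]
        return planes + [colour]
```

The trade-off is memory for a few board copies in exchange for never re-running the full preprocessing on each new position.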
Done
I used the training algorithm from AlphaGo Zero to train the network. Attached: nature24270.pdf, Simple Alpha Zero.pdf
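For reference, the AlphaGo Zero paper trains the network on self-play data with the combined loss l = (z − v)² − πᵀ log p + c‖θ‖², where z is the game outcome, v the predicted value, π the MCTS visit distribution, p the predicted move probabilities, and θ the weights. The plain-Python function below only evaluates that formula for one sample so the terms can be checked by hand. In practice the loss would be computed in a deep-learning framework with autodiff, and the name `alphazero_loss` and the default `c = 1e-4` are illustrative assumptions.

```python
import math
from typing import Sequence


def alphazero_loss(z: float, v: float, pi: Sequence[float],
                   p: Sequence[float], weights: Sequence[float],
                   c: float = 1e-4) -> float:
    """Single-sample AlphaGo Zero loss: value MSE + policy cross-entropy + L2."""
    value_loss = (z - v) ** 2
    # Cross-entropy between search probabilities pi and network output p.
    # Terms with pi == 0 contribute nothing, so they are skipped (avoids log(0)).
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    l2_penalty = c * sum(w * w for w in weights)
    return value_loss + policy_loss + l2_penalty
```

For example, `alphazero_loss(1.0, 0.5, [1.0, 0.0], [0.5, 0.5], [], c=0.0)` evaluates to 0.25 + ln 2.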