about the value network

Yes, that is correct. I use model-free reinforcement learning to train the value network: self-play games are run, and at each timestep of the game trajectory the statistics obtained through MCTS (value and policy) are stored. During the search, the current neural net provides the prior over actions and the value at leaf nodes. These statistics are then used to train the network. Self-play data generation and training are alternated, so the network keeps improving and the data it generates gets better over time. Hope this helps.
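To make the alternation concrete, here is a minimal sketch of the loop on a toy single-player game (reach a total of exactly 3 by adding 1 or 2). It is not the code from this repo. `TinyNet`, `search_policy`, `self_play_game` and `run` are hypothetical names, the "network" is just a lookup table, and a one-step lookahead stands in for MCTS. Using the final game outcome as the value target is also an assumption here; the MCTS root value could be stored instead.

```python
import math
import random
from collections import deque


class TinyNet:
    """Tabular stand-in for the policy/value network (hypothetical)."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.policy = {}  # state -> list of action probabilities
        self.value = {}   # state -> estimated value

    def predict(self, state):
        uniform = [1.0 / self.n_actions] * self.n_actions
        return self.policy.get(state, uniform), self.value.get(state, 0.0)

    def train(self, batch, lr=0.5):
        # Move stored estimates toward the search policy and the game outcome.
        for state, pi, z in batch:
            p, v = self.predict(state)
            self.policy[state] = [(1 - lr) * a + lr * b for a, b in zip(p, pi)]
            self.value[state] = (1 - lr) * v + lr * z


def step(state, action):
    return state + action + 1  # action 0 adds 1, action 1 adds 2


def outcome(state):
    if state < 3:
        return None  # game still running
    return 1.0 if state == 3 else -1.0


def search_policy(net, state, temperature=0.5):
    # Placeholder for MCTS: one-step lookahead that uses the net's value
    # at non-terminal leaves, then a softmax over the resulting scores.
    scores = []
    for a in range(net.n_actions):
        nxt = step(state, a)
        z = outcome(nxt)
        scores.append(z if z is not None else net.predict(nxt)[1])
    weights = [math.exp(s / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def self_play_game(net, rng):
    # Store (state, search policy) at each timestep, then label every
    # timestep with the final outcome of the game.
    trajectory, state = [], 0
    while outcome(state) is None:
        pi = search_policy(net, state)
        trajectory.append((state, pi))
        action = rng.choices(range(net.n_actions), weights=pi)[0]
        state = step(state, action)
    z = outcome(state)
    return [(s, pi, z) for s, pi in trajectory]


def run(iterations=20, games_per_iter=20, seed=0):
    rng = random.Random(seed)
    net = TinyNet(n_actions=2)
    buffer = deque(maxlen=500)
    for _ in range(iterations):
        for _ in range(games_per_iter):      # 1) generate data by self-play
            buffer.extend(self_play_game(net, rng))
        batch = rng.sample(list(buffer), min(32, len(buffer)))
        net.train(batch)                     # 2) train on the stored statistics
    return net
```

Calling `run()` returns the trained table. Because each round of self-play uses the latest estimates inside the search, the stored targets get sharper as training proceeds, which is the effect described above.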

rvteja24 / reconChessAgent
