rvteja24 / reconChessAgent


about the value network #1

Open daihuiao opened 2 years ago

daihuiao commented 2 years ago

Thank you for your work. I want to know how you train your value network. Do you use reinforcement learning?

rvteja24 commented 2 years ago

Yes, that is correct. I use model-free reinforcement learning to train the value network. Self-play games are run, and at each timestep of the game trajectory I store the statistics produced by MCTS (a value and a policy). During the search, the current neural net provides the prior over actions and the value estimate at leaf nodes. These stored statistics are then used to train the network. Self-play data generation and training are alternated, so the network keeps improving and the data it generates becomes more and more useful for training over time. Hope this helps.
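The self-play loop described in the reply can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code from reconChessAgent. It makes several assumptions. It swaps Reconnaissance Blind Chess for a toy perfect-information game: a pile of stones, each player removes 1 or 2, and whoever takes the last stone wins. It replaces the neural network with a lookup table (`TabularNet`, a made-up name). All function names and hyperparameters (`c_puct`, `num_sims`, `lr`) are illustrative. It also uses the final game outcome as the value target, AlphaZero-style; the MCTS root value would be another option. The structure follows the reply: MCTS uses the current "network" for priors and leaf values, every self-play position stores the visit-count policy, and data generation alternates with training.

```python
import math
import random


class TabularNet:
    """Hypothetical stand-in for the policy/value network: one prior and one
    value per pile size, moved toward the self-play targets by a fixed step."""

    def __init__(self, max_stones, lr=0.1):
        self.lr = lr
        self.prior = {n: {1: 0.5, 2: 0.5} for n in range(max_stones + 1)}
        self.value = {n: 0.0 for n in range(max_stones + 1)}

    def predict(self, n):
        return self.prior[n], self.value[n]

    def train(self, examples):
        # Each example is (state, MCTS policy target, outcome target).
        for n, pi, z in examples:
            self.value[n] += self.lr * (z - self.value[n])
            for a in (1, 2):
                self.prior[n][a] += self.lr * (pi[a] - self.prior[n][a])


class Node:
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0  # from the view of the player who moved into this node
        self.children = {}

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def expand(node, n, net):
    """Attach children with network priors (renormalized over legal moves);
    return the network's value for the player to move at n."""
    priors, value = net.predict(n)
    legal = [a for a in (1, 2) if a <= n]
    total = sum(priors[a] for a in legal)
    for a in legal:
        node.children[a] = Node(priors[a] / total)
    return value


def puct_score(parent_visits, child, c_puct):
    # AlphaZero-style PUCT: exploit Q, explore in proportion to the prior.
    return child.q() + c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)


def run_mcts(n, net, num_sims=25, c_puct=1.5):
    """Search from pile size n > 0 and return the visit-count policy over {1, 2}."""
    root = Node(1.0)
    expand(root, n, net)
    for _ in range(num_sims):
        node, state, path = root, n, [root]
        while node.children:  # selection down to an unexpanded or terminal node
            parent = node
            action, node = max(parent.children.items(),
                               key=lambda kv: puct_score(parent.visits, kv[1], c_puct))
            state -= action
            path.append(node)
        # Evaluation: an empty pile is a loss for the player to move; otherwise
        # the current network supplies the leaf value (and priors via expand).
        value = -1.0 if state == 0 else expand(node, state, net)
        for visited in reversed(path):  # backup, flipping the sign every ply
            value = -value
            visited.visits += 1
            visited.value_sum += value
    total = sum(child.visits for child in root.children.values())
    return {a: root.children[a].visits / total if a in root.children else 0.0
            for a in (1, 2)}


def self_play_game(net, start, rng, num_sims):
    """Play one game and return (state, pi, z) examples, z from the mover's view."""
    history, n = [], start
    while n > 0:
        pi = run_mcts(n, net, num_sims)
        history.append((n, pi))
        n -= rng.choices((1, 2), weights=(pi[1], pi[2]))[0]
    winner = (len(history) - 1) % 2  # parity of the player who took the last stone
    return [(s, pi, 1.0 if t % 2 == winner else -1.0)
            for t, (s, pi) in enumerate(history)]


def train_loop(start=7, iterations=30, games_per_iter=10, num_sims=25, seed=0):
    rng = random.Random(seed)
    net = TabularNet(start)
    for _ in range(iterations):
        # Alternate: generate data with the current net, then train on it.
        data = []
        for _ in range(games_per_iter):
            data.extend(self_play_game(net, start, rng, num_sims))
        net.train(data)
    return net
```

Calling `net = train_loop()` and inspecting `net.value` for pile sizes 1 to 7 should, if the sketch behaves as intended, show values drifting negative at 3 and 6. Those positions are lost for the player to move under optimal play. Exact numbers depend on the seed and sampling noise. The lookup table only keeps the example short; in the actual agent this role is played by the neural network.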