PascalCremer opened 5 years ago
I implemented a first MCTS version, which can already be used in interactive mode. I thought about implementing it as a model wrapper, but the interface doesn't quite match. Maybe we can find a way to merge the two interfaces so that MCTS can be compared against models without MCTS without too much extra code.
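One possible way to merge the interfaces is a shared "pick a move" protocol that both a plain model and an MCTS search implement. This is a minimal sketch; `Player`, `GreedyModelPlayer`, `SearchPlayer`, `choose_all`, and the `State`/`Move` aliases are hypothetical names, not part of the existing code.

```python
from typing import Callable, Hashable, List, Protocol, Sequence

# Hypothetical aliases: the project's real state and move types are not known here.
State = Hashable
Move = Hashable


class Player(Protocol):
    """Common interface: anything that can pick a move for a given state."""

    def choose_move(self, state: State, legal_moves: Sequence[Move]) -> Move: ...


class GreedyModelPlayer:
    """Plays the move the model rates highest, without any search."""

    def __init__(self, evaluate: Callable[[State, Move], float]) -> None:
        self.evaluate = evaluate

    def choose_move(self, state: State, legal_moves: Sequence[Move]) -> Move:
        return max(legal_moves, key=lambda move: self.evaluate(state, move))


class SearchPlayer:
    """Adapts a search routine (e.g. an MCTS) to the same interface."""

    def __init__(self, search: Callable[[State, Sequence[Move]], Move]) -> None:
        self.search = search

    def choose_move(self, state: State, legal_moves: Sequence[Move]) -> Move:
        return self.search(state, legal_moves)


def choose_all(players: Sequence[Player], state: State, legal_moves: Sequence[Move]) -> List[Move]:
    """Ask every player for its move in the same position, e.g. for a comparison run."""
    return [player.choose_move(state, legal_moves) for player in players]
```

With this shape, comparison code only ever calls `choose_move`, so whether a player searches or not stays an implementation detail of that player.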
The `c_puct` parameter certainly needs to be tuned. Also, the prior policy is currently set to 1 uniformly for all moves. A softmax over the move evaluations might make more sense and should certainly be tried.
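To make the two prior options concrete, here is a sketch of the standard PUCT selection score Q + U from AlphaZero-style MCTS, with either the current uniform prior of 1 or a softmax over move evaluations. The function names, the `temperature` parameter, the default `c_puct=1.5`, and the `+1` under the square root are assumptions for illustration, not taken from the existing implementation.

```python
import math
from typing import List, Sequence


def uniform_prior(num_moves: int) -> List[float]:
    """Current setting described above: every move gets prior 1."""
    return [1.0] * num_moves


def softmax_prior(evaluations: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over move evaluations; subtracting the max keeps exp() from overflowing."""
    top = max(evaluations)
    exps = [math.exp((e - top) / temperature) for e in evaluations]
    total = sum(exps)
    return [x / total for x in exps]


def puct_scores(
    q: Sequence[float], visits: Sequence[int], prior: Sequence[float], c_puct: float
) -> List[float]:
    """Q(s, a) + U(s, a) with U = c_puct * P(s, a) * sqrt(N(s) + 1) / (1 + N(s, a)).

    N(s) is the sum of the child visit counts. The +1 under the square root is
    an assumption so that U is non-zero before any child has been visited.
    """
    sqrt_parent = math.sqrt(sum(visits) + 1)
    return [q_a + c_puct * p_a * sqrt_parent / (1 + n_a) for q_a, n_a, p_a in zip(q, visits, prior)]


def select_move(
    q: Sequence[float], visits: Sequence[int], prior: Sequence[float], c_puct: float = 1.5
) -> int:
    """Index of the move with the highest PUCT score (ties go to the lowest index)."""
    scores = puct_scores(q, visits, prior, c_puct)
    return max(range(len(scores)), key=scores.__getitem__)
```

Note that softmax priors sum to 1 while the current uniform priors sum to the number of moves, so switching between them rescales U and `c_puct` would likely need to be retuned alongside.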
I just added a `sigmoid(model_output)` factor to `U`. Otherwise the first move would be entirely random, since with a uniform prior every unvisited move gets the same exploration term `U`. Now it looks much better.
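A sketch of the change described above: the exploration term is multiplied by `sigmoid(model_output)`, so unvisited moves are no longer tied and the model's opinion breaks the tie. It assumes `model_output` is one raw (pre-sigmoid) model score per move where higher means better for the player choosing; the function names and defaults are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in exp for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def puct_scores_with_model_factor(
    q: Sequence[float],
    visits: Sequence[int],
    model_output: Sequence[float],
    c_puct: float = 1.5,
    prior: Optional[Sequence[float]] = None,
) -> List[float]:
    """Q + sigmoid(model_output) * U, with U = c_puct * P * sqrt(N + 1) / (1 + n)."""
    if prior is None:
        prior = [1.0] * len(q)  # uniform prior of 1, as in the current implementation
    sqrt_parent = math.sqrt(sum(visits) + 1)
    return [
        q_a + sigmoid(out_a) * c_puct * p_a * sqrt_parent / (1 + n_a)
        for q_a, n_a, out_a, p_a in zip(q, visits, model_output, prior)
    ]
```

With a uniform prior this factor effectively acts as an unnormalized prior, which is close in spirit to the softmax option mentioned earlier.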
MCTS can be implemented on top of the value model to improve on the raw model's performance. It is not intended for use during data generation because of its high computational cost.
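A minimal sketch of MCTS where leaf positions are scored by a value model instead of random rollouts. The actual game is not specified here, so a toy stand-in is used (a Nim variant: take 1 or 2 stones, whoever takes the last stone wins); `Node`, `puct`, `mcts`, `value_fn`, and the defaults are all hypothetical. Values are stored from the perspective of the player to move at each node and flipped on backup.

```python
import math
from typing import Callable, Dict, List


def legal_moves(pile: int) -> List[int]:
    """Toy stand-in game: take 1 or 2 stones; whoever takes the last stone wins."""
    return [m for m in (1, 2) if m <= pile]


class Node:
    def __init__(self, prior: float) -> None:
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0  # from the perspective of the player to move at this node
        self.children: Dict[int, "Node"] = {}

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct(child: Node, sqrt_parent: float, c_puct: float) -> float:
    # The child's Q is from the opponent's point of view, so the parent negates it.
    return -child.q() + c_puct * child.prior * sqrt_parent / (1 + child.visits)


def mcts(pile: int, value_fn: Callable[[int], float], simulations: int = 500, c_puct: float = 1.5) -> int:
    """Return the most visited root move.

    value_fn(state) scores a position in [-1, 1] for the player to move.
    """
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, state, path = root, pile, [root]
        # 1. Selection: descend through expanded nodes using PUCT.
        while node.children:
            sqrt_parent = math.sqrt(sum(c.visits for c in node.children.values()) + 1)
            move, node = max(node.children.items(), key=lambda item: puct(item[1], sqrt_parent, c_puct))
            state -= move
            path.append(node)
        # 2. Evaluation and expansion: terminal positions are exact, others use the value model.
        if state == 0:
            value = -1.0  # no stones left: the player to move has already lost
        else:
            for m in legal_moves(state):
                node.children[m] = Node(prior=1.0)  # uniform prior of 1
            value = value_fn(state)
        # 3. Backup: players alternate, so the sign flips at every level.
        for visited in reversed(path):
            visited.visits += 1
            visited.value_sum += value
            value = -value
    return max(root.children, key=lambda m: root.children[m].visits)
```

Each simulation calls the value model at most once, so a single move costs up to `simulations` model evaluations instead of one, which is why this fits interactive play better than data generation.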