harbecke / HexHex

AlphaGo Zero adaptation for Hex
GNU General Public License v3.0

Implement Monte Carlo Tree Search #12

Open PascalCremer opened 5 years ago

PascalCremer commented 5 years ago

MCTS can be implemented on top of the value model to improve playing strength. It is not intended for use during data generation because of its high computational cost.

cleeff commented 5 years ago

I implemented a first MCTS version, which can already be used in interactive mode. I thought about implementing it as a model wrapper, but the interface doesn't quite match. Maybe we can find a way to merge the two interfaces so that MCTS can be compared against models without MCTS without too much extra code.

The c_puct parameter certainly needs to be tuned. Also, the prior policy is currently set to 1 uniformly. A softmax over the move evaluations might make more sense and should certainly be tried.
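To illustrate the difference between the two priors, here is a minimal sketch, not the code from the repository. It assumes the usual AlphaGo Zero style selection score Q + U with U = c_puct * P * sqrt(N_parent) / (1 + N_child). The names `softmax`, `puct_scores`, and `parent_visits` and the example evaluations are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of move evaluations."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def puct_scores(q_values, visit_counts, priors, parent_visits, c_puct=1.0):
    """Q + U per move, with U = c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    root = math.sqrt(parent_visits)
    return [
        q + c_puct * p * root / (1 + n)
        for q, n, p in zip(q_values, visit_counts, priors)
    ]

# Three unvisited moves with hypothetical model evaluations.
evaluations = [0.0, 2.0, -1.0]
q = [0.0, 0.0, 0.0]
n = [0, 0, 0]

uniform = puct_scores(q, n, [1.0, 1.0, 1.0], parent_visits=1)
informed = puct_scores(q, n, softmax(evaluations), parent_visits=1)
print(uniform)   # all moves tie
print(informed)  # the move with the highest evaluation scores highest
```

With a uniform prior, unvisited moves are indistinguishable until the search has gathered statistics. With the softmax prior, the model's evaluation steers the very first expansions.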

cleeff commented 5 years ago

I just added a sigmoid(model_output) factor to U. Without it, the first move is entirely random. Now it looks much better.
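A rough sketch of what such a factor could look like, assuming U has the usual PUCT form and the parent visit count is at least 1 when children are scored. The exact formula in the repository is not shown in this thread, so `exploration_bonus` and its arguments are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def exploration_bonus(model_output, parent_visits, child_visits, c_puct=1.0):
    """Assumed form: U = c_puct * sigmoid(model_output) * sqrt(N_parent) / (1 + N_child)."""
    return c_puct * sigmoid(model_output) * math.sqrt(parent_visits) / (1 + child_visits)

# Hypothetical raw model outputs for three unvisited moves at the root.
model_outputs = [-2.0, 0.0, 3.0]
bonuses = [exploration_bonus(out, parent_visits=1, child_visits=0) for out in model_outputs]
best_move = bonuses.index(max(bonuses))
print(bonuses, best_move)
```

Because the sigmoid is monotonic, the unvisited move with the highest model output gets the largest bonus, so the first selection follows the model instead of breaking a tie at random.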