mokemokechicken / reversi-alpha-zero

Reversi reinforcement learning by AlphaGo Zero methods.
MIT License

What's different between Challenge 2 & 3? #32

Closed gooooloo closed 6 years ago

gooooloo commented 6 years ago

@mokemokechicken I see you started Challenge 3 (AlphaZero Method) in the README. What is the difference between it and Challenge 2?

mokemokechicken commented 6 years ago

@gooooloo Challenge 3 is based on Challenge 2. I am mainly trying smaller simulation_num_per_move and max_file_num values to get faster improvement (up to the level of GRhino LV3). Also, because most moves are illegal in any given Reversi position, I think it is better to apply the Dirichlet noise at the MCTS root node only to legal moves, so I am trying that as well.
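To illustrate the idea of restricting root noise to legal moves, here is a minimal sketch. It is not the repository's code. The function name add_dirichlet_noise_legal_only and the default values alpha=0.5 and epsilon=0.25 are illustrative assumptions. It also assumes the priors are already masked, so they are 0 on illegal moves and sum to 1 over legal ones. The Dirichlet sample is drawn only over the legal moves, by normalizing independent Gamma(alpha, 1) draws, and then mixed in as (1 - epsilon) * p + epsilon * noise.

```python
import random
from typing import List, Optional


def add_dirichlet_noise_legal_only(
    priors: List[float],
    legal: List[bool],
    alpha: float = 0.5,      # assumed concentration, not the repo's setting
    epsilon: float = 0.25,   # assumed mixing weight, not the repo's setting
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Mix Dirichlet noise into root priors, over legal moves only.

    Assumes priors are already masked (0 on illegal moves, summing to 1
    over legal moves), so the result also sums to 1. Illegal moves keep
    their original prior unchanged.
    """
    if len(priors) != len(legal):
        raise ValueError("priors and legal must have the same length")
    rng = rng or random.Random()
    legal_idx = [i for i, ok in enumerate(legal) if ok]
    if not legal_idx:
        # No legal moves (e.g. a forced pass): nothing to perturb.
        return list(priors)

    # Sample Dir(alpha) over the legal moves by normalizing Gamma draws.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in legal_idx]
    total = sum(gammas)
    if total > 0.0:
        noise = [g / total for g in gammas]
    else:
        # Guard against underflow for very small alpha: use uniform noise.
        noise = [1.0 / len(legal_idx)] * len(legal_idx)

    mixed = list(priors)
    for i, n in zip(legal_idx, noise):
        mixed[i] = (1.0 - epsilon) * priors[i] + epsilon * n
    return mixed


if __name__ == "__main__":
    # Hypothetical 8x8 board with four legal squares (row-major indices).
    priors = [0.0] * 64
    legal = [False] * 64
    for idx in (19, 26, 37, 44):
        priors[idx] = 0.25
        legal[idx] = True
    noisy = add_dirichlet_noise_legal_only(priors, legal, rng=random.Random(0))
    print([round(noisy[i], 3) for i in (19, 26, 37, 44)])
```

Drawing the noise over the whole 64-square action space would let a large share of the epsilon mass land on illegal squares, where it is discarded once illegal moves are masked. Restricting the Dirichlet sample to legal moves keeps all of that exploration mass on moves the search can actually play.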

gooooloo commented 6 years ago

I see. Thanks.