mokemokechicken / reversi-alpha-zero

Reversi reinforcement learning by AlphaGo Zero methods.
MIT License
677 stars 170 forks source link

maybe a bug here #51

Closed gooooloo closed 6 years ago

gooooloo commented 6 years ago

https://github.com/mokemokechicken/reversi-alpha-zero/blob/4ac6e064d653ddc2153a1bcd9b29b3591b85d9b7/src/reversi_zero/agent/player.py#L406-L414

https://github.com/mokemokechicken/reversi-alpha-zero/blob/4ac6e064d653ddc2153a1bcd9b29b3591b85d9b7/src/reversi_zero/agent/player.py#L420

Maybe a bug here: p_ here is NOT a probability distribution over legal moves, until you do normalization in codes after. But in the dirichlet_noise_only_for_legal_moves == True case, dirichlet noise is already a probability distribution over legal moves. Saying, you are adding dirichlet noise on a non-probability-distribution, which I believe not consistent with AlphaGoZero paper.

I happen to find my implementation had this bug too, and after I fixed this bug, my AI's strength improves significantly.

mokemokechicken commented 6 years ago

@gooooloo

Thank you for your pointed out. It certainly may be better that in the dirichlet_noise_only_for_legal_moves == True case, the noise is added after normalization.

And the dirichlet_noise_only_for_legal_moves == False case may be not necessary.

I'll fix it.