mokemokechicken / reversi-alpha-zero

Reversi reinforcement learning by AlphaGo Zero methods.
MIT License

About the time of self-play #24

Open rtz19970824 opened 6 years ago

rtz19970824 commented 6 years ago

@mokemokechicken With the default hyper-parameters, one self-play game takes about 108 seconds for me. My CPU is also an 8-core i7-7700K @ 4.20GHz, and my GPU is a GeForce GTX 1080 Ti. Were any hyper-parameters changed without being noted in the README? Thanks!

mokemokechicken commented 6 years ago

@rtz19970824

Oh, sorry. I changed the default hyper-parameters after I wrote the README. One self-play game now takes about 100~200 seconds in my environment.

JialianLee commented 6 years ago

@mokemokechicken The README says that a self-play game needs only 10 to 20 seconds. That's very impressive. Does this speedup come from setting simulation_num_per_move to 100? I haven't tested it yet.

mokemokechicken commented 6 years ago

@JialianLee

Yes. After fixing the virtual loss bug, a self-play game takes 10~20 seconds when simulation_num_per_move=100.
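
For readers unfamiliar with the term: virtual loss is a common trick in parallel MCTS where an edge that is currently being explored is temporarily scored as if it had lost, so that concurrent simulations spread out over other moves. The sketch below only illustrates that general idea. It is not the code from this repository and says nothing about what the actual bug was; `Edge`, `select`, `apply_virtual_loss`, `backup`, and the values of `VIRTUAL_LOSS` and `c_puct` are all hypothetical names and numbers chosen for the example.

```python
import math
from dataclasses import dataclass

VIRTUAL_LOSS = 3  # hypothetical value, not taken from the repository


@dataclass
class Edge:
    prior: float               # policy prior P(s, a)
    visits: int = 0            # visit count N(s, a)
    total_value: float = 0.0   # accumulated value W(s, a)

    @property
    def q(self) -> float:
        # Mean value Q(s, a); 0 for an unvisited edge.
        return self.total_value / self.visits if self.visits else 0.0


def select(edges, c_puct=1.5):
    """Return the action with the highest PUCT score Q + U."""
    sqrt_total = math.sqrt(sum(e.visits for e in edges.values()) + 1)

    def score(action):
        e = edges[action]
        return e.q + c_puct * e.prior * sqrt_total / (1 + e.visits)

    return max(edges, key=score)


def apply_virtual_loss(edge):
    # Pretend the in-flight simulation already lost VIRTUAL_LOSS times.
    edge.visits += VIRTUAL_LOSS
    edge.total_value -= VIRTUAL_LOSS


def backup(edge, value):
    # Undo the virtual loss, then record the one real result.
    edge.visits += 1 - VIRTUAL_LOSS
    edge.total_value += VIRTUAL_LOSS + value
```

With two edges whose priors are 0.6 and 0.4, `select` first picks the 0.6 edge; after `apply_virtual_loss` on that edge, a second concurrent `select` picks the other one. If the virtual loss is never undone in `backup`, the statistics stay distorted, which is one way such a mechanism can go wrong in general.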