pytorch / ELF

ELF: a platform for game research with AlphaGoZero/AlphaZero reimplementation
Other

--mcts_puct tuning #53

Closed killerducky closed 6 years ago

killerducky commented 6 years ago

How did you tune the --mcts_puct values? Is it true different values are used for generating self-play games for training vs match play?

I think self-play for training uses --mcts_puct 0.85 https://github.com/pytorch/ELF/blob/113aba73ec0bc9d60bdb00b3c439bc60fecabc89/scripts/elfgames/go/start_client.sh#L17

And match play uses --mcts_puct 1.50 https://github.com/pytorch/ELF/blob/a4edc96e8bf94aa1a84134431ce3758a6ade27c7/README.rst#running-a-go-bot

Edit: BTW I think this is the relevant part of the AGZ paper:

"AlphaGo Zero tuned the hyper-parameter of its search by Bayesian optimisation. In AlphaZero we reuse the same hyper-parameters for all games without game-specific tuning."

It doesn't really clarify whether this tuning was done on self-play alone, or via something more expensive involving the entire training feedback loop.
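For context, `--mcts_puct` presumably corresponds to the c_puct constant in the AlphaGo Zero PUCT selection rule, U(s,a) = Q(s,a) + c_puct · P(s,a) · sqrt(N(s)) / (1 + N(s,a)). A minimal sketch of that rule (not ELF's actual implementation; the function and dict-based node layout here are illustrative):

```python
import math

def puct_score(q, prior, child_visits, parent_visits, c_puct=1.5):
    # AlphaGo Zero PUCT term:
    #   Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
    # A larger c_puct weights the network prior (exploration)
    # more heavily relative to the observed value Q.
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, c_puct=1.5):
    # children: list of dicts with 'q', 'prior', 'visits' (hypothetical layout)
    parent_visits = sum(c['visits'] for c in children)
    return max(
        children,
        key=lambda c: puct_score(c['q'], c['prior'], c['visits'],
                                 parent_visits, c_puct),
    )
```

So changing `--mcts_puct` from 0.85 to 1.50 roughly doubles the weight of the prior-driven exploration term relative to Q during tree search.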

alreadydone commented 6 years ago

https://github.com/pytorch/ELF/blob/e3f407226056da9c8a1861cd25e9dbf9dac0d62e/scripts/elfgames/go/start_selfplay.sh#L39 Maybe selfplay uses 1.5 as well.

Some comparisons here: https://www.reddit.com/r/cbaduk/comments/8j5x3w/first_play_urgency_fpu_parameter_in_alpha_zero/dz1ipi7/ 1.5/2=0.75 is close to what LZ uses (0.8).

jma127 commented 6 years ago

Hi @killerducky, 1.50 is used everywhere. (The script included in this repository has some of our older training parameter values.)