Closed kblomdahl closed 6 years ago
Updated with the following command-line options:
Usage: ./dream-go [options]
--extract <files...> Extract a dataset for training from the given SGF files
--ex-it When combined with --dataset perform search on any partial policies
--self-play <n> Extract a dataset from self-play containing n examples
--policy-play <n> Extract a dataset from self-play using only the policy network
--gtp Run GTP client (default)
Advanced options:
--num-rollout <n> The number of rollouts to add to the search tree for every move
--num-games <n> The number of games to play or extract in parallel
--num-threads <n> The number of search threads to use in total
--num-samples <n> The number of games to extract from each game record
--batch-size <n> The number of parallel rollouts to perform on the GPU
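As an illustration of how the numeric options above could be read from the command line, here is a minimal sketch using only the standard library. The `Options` struct, its field names, the default values, and the `parse_options` and `parse_value` helpers are all hypothetical and not taken from the dream-go source; only the flag names come from the usage text.

```rust
/// Hypothetical option set mirroring part of the usage text above.
/// Field names and defaults are assumptions for illustration only.
#[derive(Debug, PartialEq)]
pub struct Options {
    pub num_rollout: usize,
    pub num_threads: usize,
    pub batch_size: usize,
    pub gtp: bool,
}

impl Default for Options {
    fn default() -> Self {
        // Placeholder defaults, not the values dream-go actually uses.
        Options { num_rollout: 1600, num_threads: 4, batch_size: 16, gtp: true }
    }
}

/// Parse `--flag <n>` style arguments (program name already removed).
/// Returns a human-readable message when an argument is malformed.
pub fn parse_options<I>(args: I) -> Result<Options, String>
where
    I: IntoIterator<Item = String>,
{
    let mut opts = Options::default();
    let mut iter = args.into_iter();

    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "--gtp" => opts.gtp = true,
            "--num-rollout" => opts.num_rollout = parse_value(&arg, iter.next())?,
            "--num-threads" => opts.num_threads = parse_value(&arg, iter.next())?,
            "--batch-size" => opts.batch_size = parse_value(&arg, iter.next())?,
            other => return Err(format!("unknown option: {}", other)),
        }
    }

    Ok(opts)
}

/// Parse the value that follows a flag, reporting which flag was at fault.
fn parse_value(flag: &str, value: Option<String>) -> Result<usize, String> {
    let value = value.ok_or_else(|| format!("{} requires a value", flag))?;

    value
        .parse::<usize>()
        .map_err(|_| format!("{}: invalid number '{}'", flag, value))
}
```

A caller would typically pass `std::env::args().skip(1)` to `parse_options` and print the error message on failure. Options that are not given keep their defaults.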
There are also four hidden environment variables for controlling internal constants. I could inline them into the code, but it is useful to keep them controllable from the outside for parameter-tuning tools such as CLOP:
DIRICHLET_NOISE - the mixing constant for how much Dirichlet noise should be added to the root node.
TEMPERATURE - the temperature used when determining the first eight moves of the game.
UCT_EXP - the exploration rate constant used in UCT.
RAVE_BIAS - the RAVE bias.
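A minimal sketch of how such constants could be read from the environment with a compiled-in fallback is shown below. The `Constants` struct and the `env_or` and `parse_or` helpers are hypothetical names, and the fallback values are placeholders rather than dream-go's real defaults; only the four variable names come from the list above.

```rust
use std::env;
use std::str::FromStr;

/// Read `name` from the environment and parse it, falling back to
/// `default` when the variable is unset or cannot be parsed.
pub fn env_or<T: FromStr>(name: &str, default: T) -> T {
    parse_or(env::var(name).ok(), default)
}

/// Pure helper holding the parsing rule, so it can be exercised
/// without modifying the process environment.
pub fn parse_or<T: FromStr>(raw: Option<String>, default: T) -> T {
    raw.and_then(|s| s.trim().parse::<T>().ok())
        .unwrap_or(default)
}

/// Hypothetical container for the four tunable constants.
#[derive(Debug, Clone, PartialEq)]
pub struct Constants {
    pub dirichlet_noise: f32,
    pub temperature: f32,
    pub uct_exp: f32,
    pub rave_bias: f32,
}

impl Constants {
    /// Build the constants from the environment. The fallback numbers
    /// are placeholders chosen for illustration only.
    pub fn from_env() -> Constants {
        Constants {
            dirichlet_noise: env_or("DIRICHLET_NOISE", 0.25),
            temperature: env_or("TEMPERATURE", 1.0),
            uct_exp: env_or("UCT_EXP", 1.0),
            rave_bias: env_or("RAVE_BIAS", 0.0),
        }
    }
}
```

With this shape an external tuner only needs to set, for example, `UCT_EXP=0.8` before launching the binary; unset or malformed variables silently keep the fallback, which may or may not be the behaviour you want.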
Currently all input to dream_go is controlled from environment variables. This is not terrible, but to behave more as people would expect we should read configuration from the command line instead. To accomplish this I suggest we move the Param structure that is currently in the mcts package to util and then generalise some of the parameters to read from the command line instead of the environment variables.

It is also unclear whether we need to keep the Param object as a trait, since if we are reading from the command line then we can branch directly on whether we are in tournament or self-play mode (which currently decides how much Dirichlet noise we add, as well as some other factors).
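One way the trait could be replaced by a direct branch on the mode is sketched below. Everything here is hypothetical: the `Mode` enum, the concrete `Param` struct, the rule that `--self-play` or `--policy-play` selects self-play mode, and the noise values are assumptions made for illustration, not the existing dream-go design.

```rust
/// Hypothetical run mode derived from the command line.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Tournament,
    SelfPlay,
}

impl Mode {
    /// Assumed rule: either self-play flag selects `SelfPlay`,
    /// anything else (including the default `--gtp`) is `Tournament`.
    pub fn from_args(args: &[&str]) -> Mode {
        let self_play = args
            .iter()
            .any(|a| *a == "--self-play" || *a == "--policy-play");

        if self_play { Mode::SelfPlay } else { Mode::Tournament }
    }
}

/// Hypothetical concrete struct standing in for the `Param` trait.
#[derive(Debug, Clone, PartialEq)]
pub struct Param {
    pub mode: Mode,
    pub dirichlet_noise: f32,
}

impl Param {
    /// Pick mode-dependent constants with a plain `match` instead of
    /// one trait implementation per mode. The numbers are placeholders.
    pub fn for_mode(mode: Mode) -> Param {
        let dirichlet_noise = match mode {
            Mode::Tournament => 0.0, // assumed: no exploration noise
            Mode::SelfPlay => 0.25,  // assumed: noise mixed in at the root
        };

        Param { mode, dirichlet_noise }
    }
}
```

The trade-off is that a struct is simpler to construct from parsed arguments, while a trait lets the compiler specialise the search code per mode; which matters more here is not settled by this sketch.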