kblomdahl / dream-go

Artificial go player based on reinforcement and supervised learning
Apache License 2.0
47 stars 8 forks

Move the `Param` object out of the `mcts` package and make it read from the command line #18

Closed kblomdahl closed 6 years ago

kblomdahl commented 6 years ago

Currently all input to dream_go is controlled through environment variables. This is not terrible, but to behave as people would expect we should read configuration from the command line instead. To accomplish this, I suggest we move the `Param` structure that is currently in the `mcts` package to `util`, and then generalise some of the parameters so that they are read from the command line instead of from environment variables.

It is also unclear whether we need to keep the `Param` object as a trait. If we are reading from the command line, we can branch directly on whether we are in tournament or self-play mode (which currently decides how much Dirichlet noise we add, as well as some other factors).
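To make the second point concrete, here is a minimal sketch of branching on the mode instead of going through a trait. The `Mode` enum, the `dirichlet_noise` method, and the numeric values are all hypothetical placeholders for illustration, not what dream-go currently uses:

```rust
/// Hypothetical run modes selected on the command line.
/// The names are illustrative and not taken from the dream-go source.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    Tournament,
    SelfPlay,
}

impl Mode {
    /// Fraction of the root prior that is replaced by Dirichlet noise.
    /// Both values are placeholders (assumptions), chosen only so that
    /// self-play explores and tournament play does not.
    pub fn dirichlet_noise(self) -> f32 {
        match self {
            Mode::Tournament => 0.0,
            Mode::SelfPlay => 0.25,
        }
    }
}
```

With something like this, the search code would take a plain `Mode` value rather than a generic parameter bounded by a `Param` trait, at the cost of a run-time branch instead of compile-time specialisation.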

kblomdahl commented 6 years ago

Updated with the following command-line options:

Usage: ./dream-go [options]

  --extract <files...>  Extract a dataset for training from the given SGF files
  --ex-it               When combined with --dataset perform search on any partial policies
  --self-play <n>       Extract a dataset from self-play containing n examples
  --policy-play <n>     Extract a dataset from self-play using only the policy network
  --gtp                 Run GTP client (default)

Advanced options:
  --num-rollout <n>     The number of rollouts to add to the search tree for every move
  --num-games <n>       The number of games to play or extract in parallel
  --num-threads <n>     The number of search threads to use in total
  --num-samples <n>     The number of games to extract from each game record
  --batch-size <n>      The number of parallel rollouts to perform on the GPU
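
For reference, a rough sketch of how a few of these flags could be parsed with only the standard library. The `Options` struct, `parse_args`, and `take_number` are hypothetical names for illustration and only a subset of the flags is handled; this is not the actual dream-go implementation:

```rust
/// Hypothetical option bag. The field names mirror some of the flags
/// above, but the struct itself is an assumption.
#[derive(Debug, Default, PartialEq)]
pub struct Options {
    pub self_play: Option<usize>,
    pub num_rollout: Option<usize>,
    pub num_threads: Option<usize>,
    pub batch_size: Option<usize>,
}

/// Parse flags from the given arguments (program name already removed).
/// Returns an error message for unknown flags or malformed values.
pub fn parse_args<I>(args: I) -> Result<Options, String>
where
    I: IntoIterator<Item = String>,
{
    let mut opts = Options::default();
    let mut iter = args.into_iter();

    while let Some(flag) = iter.next() {
        match flag.as_str() {
            "--self-play" => opts.self_play = Some(take_number(&flag, iter.next())?),
            "--num-rollout" => opts.num_rollout = Some(take_number(&flag, iter.next())?),
            "--num-threads" => opts.num_threads = Some(take_number(&flag, iter.next())?),
            "--batch-size" => opts.batch_size = Some(take_number(&flag, iter.next())?),
            other => return Err(format!("unknown option: {}", other)),
        }
    }

    Ok(opts)
}

/// Interpret the argument following `flag` as an unsigned number.
fn take_number(flag: &str, value: Option<String>) -> Result<usize, String> {
    let value = value.ok_or_else(|| format!("{} requires a value", flag))?;

    value
        .parse::<usize>()
        .map_err(|_| format!("{} expects a number, got {:?}", flag, value))
}
```

It would be called as `parse_args(std::env::args().skip(1))`. Options that are left as `None` can then fall back to a default, or to an environment variable if we want to keep those around.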

There are also four hidden environment variables for controlling internal constants. I could inline them into the code, but it is useful to have them controllable from the outside because of stuff like CLOP: