tensorflow / minigo

An open-source implementation of the AlphaGoZero algorithm
Apache License 2.0

"Rough Gating" work required #459

Open amj opened 6 years ago

amj commented 6 years ago

Gating work checklist

  1. New "schema" for location of data in bucket
  2. Changes to example_buffer / data assembly/preprocessing
  3. Changes to C++ "new model" code.
  4. Changes to C++ flag reloading w.r.t. resign threshold.
  5. Change name generation code to not make the same name twice.

Biggest design question: stick with the GCS-flags method, or bite the bullet and build a proper server to serve out selfplay configs?

Any GCS solution will have "spillover", where some set of selfplay workers get flags that are obsolete. For example, say we want 25k games with Model X and 24.9k have finished. We'll have about N

Overall, having the resign-disabled games played upfront (to compute the proper threshold) has a couple of consequences. These games are expensive, so we want to play as few of them as possible, and we don't want them to make up more than 10% of the games.

Any server-based solution will also have to deal with the fact that our TPU workers are not currently able to split games between different models. But the slop will be less.

New data arrangement:

Old:

-   /data/selfplay/YYYY-MM-DD-HH
-   /sgf/{clean,full}/YYYY-MM-DD-HH
-   /sgf/eval/YYYY-MM-DD-HH

New:

-   /data/selfplay/$MODEL/resign_enabled
-   /data/selfplay/$MODEL/resign_disabled
-   /sgf/{clean,full,resign_disabled}/$MODEL
-   /sgf/eval/$MODEL  <-- games used to determine promo/no-promo
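
A minimal sketch of how the new per-model layout could be built in one place, so callers never assemble paths by hand. The function names, the `SGF_KINDS` tuple, and the model name used in the example are hypothetical, not existing minigo code.

```python
from pathlib import PurePosixPath

# Directory kinds under /sgf in the proposed layout.
SGF_KINDS = ("clean", "full", "resign_disabled", "eval")


def selfplay_dir(model: str, resign_enabled: bool) -> PurePosixPath:
    """Return /data/selfplay/$MODEL/{resign_enabled,resign_disabled}."""
    sub = "resign_enabled" if resign_enabled else "resign_disabled"
    return PurePosixPath("/data/selfplay") / model / sub


def sgf_dir(model: str, kind: str) -> PurePosixPath:
    """Return /sgf/$KIND/$MODEL for one of the known SGF kinds."""
    if kind not in SGF_KINDS:
        raise ValueError(f"unknown sgf kind: {kind!r}")
    return PurePosixPath("/sgf") / kind / model


if __name__ == "__main__":
    model = "000123-example"  # hypothetical model name
    print(selfplay_dir(model, resign_enabled=False))
    print(sgf_dir(model, "eval"))
```

Keeping the schema in one helper also makes a later rename of a directory a one-line change.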

Changes to data marshalling

Obviously, we now need to look in /data/selfplay/$MODEL rather than the date-based directories. Otherwise, we keep the same deque design to hold the most recent 500k, and our rsync logic gets simpler. Depending on how we do the resign-disabled and resign-threshold computation, we could potentially limit the resign-threshold games to 2500 (because some resign-disabled games could come in after the fact).
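
A rough sketch of the deque idea with per-model input, assuming items arrive grouped by model and that the oldest items should fall off first. `RecentBuffer` and `add_model` are made-up names for illustration; this is not the actual example_buffer code.

```python
from collections import deque


class RecentBuffer:
    """Keeps only the most recent `max_size` items across models."""

    def __init__(self, max_size: int = 500_000):
        # A deque with maxlen silently drops the oldest entry on overflow.
        self.items = deque(maxlen=max_size)

    def add_model(self, model: str, items) -> None:
        """Append all items read from one model's directory, oldest first."""
        for item in items:
            self.items.append((model, item))

    def __len__(self) -> int:
        return len(self.items)


if __name__ == "__main__":
    buf = RecentBuffer(max_size=3)
    buf.add_model("model-a", ["g1", "g2"])
    buf.add_model("model-b", ["g3", "g4"])
    print(list(buf.items))  # the oldest item from model-a has been dropped
```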

Changes to "new model" code

When a new model comes in, we don't want to reload it. So, how to spin down? Tom's first idea is just: kill the games in progress when a new model comes up. Assuming each in-progress game is about half finished on average, you lose roughly 0.5 * parallel_games * num_workers games' worth of work, which is kind of painful.
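
To put a number on that loss, here is the same arithmetic as a tiny function. The 0.5 factor is the "half finished on average" assumption, and the worker counts in the example are invented purely for illustration.

```python
def expected_games_lost(parallel_games: int, num_workers: int,
                        mean_progress: float = 0.5) -> float:
    """Games' worth of work thrown away if every in-progress game is killed.

    mean_progress is the assumed average fraction of a game already played.
    """
    return mean_progress * parallel_games * num_workers


if __name__ == "__main__":
    # Hypothetical cluster: 100 workers, each playing 8 games in parallel.
    print(expected_games_lost(parallel_games=8, num_workers=100))
```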

Changes to flag reloading

Flags can be updated with the new resign-threshold. Similar to new_model, we can either a) kill the resign_disabled games (easy) or b) retroactively apply the newly computed resign threshold.

Pseudocode for "commander"/"dispatcher"

while true {
  if in SELFPLAY state:
    Check the number of resign-disabled games played
      - if it has reached 10%: rsync, parse, and set the resign threshold

    Check for new models
      - if there is one: send it for evaluation, enter PENDING_EVAL state

  if in PENDING_EVAL state:
    Check for pending evaluation jobs
      - if sufficient completed eval games AND sufficient selfplay games: PROMOTE accordingly
      - after promotion, reset flags for resign-disabled games as needed
}
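
One way to make the loop above testable is to write a single iteration as a pure function that takes the current state plus a snapshot of counters and returns the next state and a list of actions to perform. This is only a sketch: the `Status` fields, the action strings, `EVAL_TARGET`, and the reading of "10%" as 10% of the 25k selfplay target are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SELFPLAY = "selfplay"
    PENDING_EVAL = "pending_eval"


@dataclass
class Status:
    resign_disabled_games: int   # resign-disabled games finished so far
    selfplay_games: int          # all selfplay games finished so far
    eval_games: int              # eval games finished for the candidate
    new_model_available: bool    # a freshly trained candidate exists
    threshold_set: bool          # resign threshold already computed


SELFPLAY_TARGET = 25_000                        # "25k games with Model X"
RESIGN_DISABLED_TARGET = SELFPLAY_TARGET // 10  # assumed meaning of "10%"
EVAL_TARGET = 100                               # hypothetical value


def step(state: State, status: Status):
    """Run one loop iteration; return (next_state, actions)."""
    actions = []
    if state is State.SELFPLAY:
        if (not status.threshold_set
                and status.resign_disabled_games >= RESIGN_DISABLED_TARGET):
            actions.append("set_resign_threshold")  # rsync, parse, set
        if status.new_model_available:
            actions.append("start_eval")
            state = State.PENDING_EVAL
    if state is State.PENDING_EVAL:
        if (status.eval_games >= EVAL_TARGET
                and status.selfplay_games >= SELFPLAY_TARGET):
            actions.append("promote_or_reject")
            actions.append("reset_resign_disabled_flags")
            state = State.SELFPLAY
    return state, actions
```

The real commander would wrap `step` in a polling loop and translate each action string into the actual rsync, eval, or flag-writing call.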

tommadams commented 6 years ago

WRT flag files & config:

We should probably add some kind of proper generic Options object that encapsulates the flags, replacing MctsPlayer::Options:


struct Options {
  float resign_threshold;
  bool inject_noise;
  bool soft_pick;
  bool random_symmetry;
  // etc...
};

// Provides threadsafe access to the options.
// Runs a background thread that periodically loads new options from the backend source,
// e.g. GCS flagfile, AppEngine app. The refresh rate would be fairly low, e.g. 30 seconds.
class OptionsProvider {
 public:
  virtual ~OptionsProvider() = default;

  // Threadsafe. Create a new options struct using the current configuration.
  virtual Options NewOptions() = 0;

  // Threadsafe. Updates the given options with the latest configuration.
  // Called periodically from the player, potentially at a higher rate than the refresh rate,
  // e.g. before each SuggestMove call.
  // No-op if the options haven't been refreshed since the last call.
  virtual void RefreshOptions(Options* options) = 0;
};

sethtroisi commented 6 years ago

WRT new data: /data/selfplay/$MODEL and /data/selfplay/$MODEL/resign_disabled

I'd suggest /data/selfplay/$MODEL/resign_enabled and /data/selfplay/$MODEL/resign_disabled, so that all files in a directory are the same type (e.g. all SGFs with no folders) and so that it's easy to find the folder.

amj commented 6 years ago

@sethtroisi ok, sounds good. Editing.

sethtroisi commented 6 years ago

The pseudocode for the "dispatcher" isn't complete:

if in PENDING_EVAL state
  Check for pending evaluation jobs
   - if sufficient completed eval games AND sufficient selfplay games:
     - if won > X% (40%? 45%? 50%?): PROMOTE
       - move to model folder
       - reset resign disabled flag
     - else:
       - Copy current model with model num incremented (so we can track promotion rate easily)
       - SOMETHING COMPLICATED SO THAT TPU WORKERS WRITE TO NEW DIR BUT DON'T RELOAD OR RESET GAMES
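
A small sketch of the promote / no-promo branch above, leaving the "something complicated" TPU part out entirely. The function names are hypothetical, the 0.5 default is just one of the candidate values for X, and the weights are stand-in strings.

```python
def should_promote(candidate_wins: int, eval_games: int,
                   min_win_rate: float = 0.5) -> bool:
    """True if the candidate won strictly more than min_win_rate of eval games."""
    if eval_games <= 0:
        raise ValueError("need at least one completed eval game")
    return candidate_wins / eval_games > min_win_rate


def next_generation(current_num: int, current_weights: str,
                    candidate_weights: str, promoted: bool):
    """Return (model number, weights) for the next selfplay generation.

    The number is incremented either way, so the promotion rate can be read
    off the sequence of generations; on rejection the current weights are
    simply carried forward under the new number.
    """
    weights = candidate_weights if promoted else current_weights
    return current_num + 1, weights


if __name__ == "__main__":
    promoted = should_promote(candidate_wins=55, eval_games=100)
    print(next_generation(12, "incumbent", "candidate", promoted))
```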

amj commented 6 years ago

@sethtroisi I hear what you're saying. At the moment, I'm leaning towards simplicity of implementation over maximizing resource utilization. What do you think?

amj commented 6 years ago

Current workflow plan:

End of "eval" triggers:

End of "training" triggers:

End of 'selfplay, resign_disabled' triggers:

End of 'selfplay, resign_enabled' triggers:

End of 'gather' triggers:

sethtroisi commented 6 years ago

Couple of comments

On the new data arrangement: I think it would help me on cloudygo (which has to support older runs) to use

-   /sgf/promo/$MODEL  <-- games used to determine promo/no-promo

instead of

-   /sgf/eval/$MODEL  <-- games used to determine promo/no-promo

"Commander" should start rsync at around 8% so that it can quickly calculate correct resign disabled threshold.