Open amj opened 6 years ago
WRT flag files & config:
We should probably add some kind of proper generic Options object that encapsulates the flags, replacing MctsPlayer::Options:
struct Options {
  float resign_threshold;
  bool inject_noise;
  bool soft_pick;
  bool random_symmetry;
  // etc...
};
// Provides threadsafe access to the options.
// Runs a background thread that periodically loads new options from the backend source,
// e.g. GCS flagfile, AppEngine app. The refresh rate would be fairly low, e.g. 30 seconds.
class OptionsProvider {
 public:
  virtual ~OptionsProvider() = default;

  // Threadsafe. Creates a new options struct using the current configuration.
  virtual Options NewOptions() = 0;

  // Threadsafe. Updates the given options with the latest configuration.
  // Called periodically from the player, potentially at a higher rate than the refresh rate,
  // e.g. before each SuggestMove call.
  // No-op if the options haven't been refreshed since the last call.
  virtual void RefreshOptions(Options* options) = 0;
};
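For illustration, here is a minimal sketch of how the "no-op if not refreshed" behaviour could work. It assumes a `version` field on `Options` and an in-memory provider with a `Publish()` method; neither is part of the proposal above, and the default values are placeholders. A real provider would call something like `Publish()` from its background thread after re-reading the flagfile.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Mirrors the proposed struct, plus a hypothetical `version` field used to
// detect stale copies. Default values are placeholders.
struct Options {
  float resign_threshold = 0.0f;
  bool inject_noise = false;
  bool soft_pick = false;
  bool random_symmetry = false;
  uint64_t version = 0;
};

class OptionsProvider {
 public:
  virtual ~OptionsProvider() = default;
  virtual Options NewOptions() = 0;
  virtual void RefreshOptions(Options* options) = 0;
};

// Hypothetical in-memory provider; the loading side is left out.
class InMemoryOptionsProvider : public OptionsProvider {
 public:
  Options NewOptions() override {
    std::lock_guard<std::mutex> lock(mu_);
    return current_;
  }

  void RefreshOptions(Options* options) override {
    assert(options != nullptr);
    std::lock_guard<std::mutex> lock(mu_);
    // No-op when the caller already holds the latest version.
    if (options->version != current_.version) *options = current_;
  }

  // Installs freshly loaded options and bumps the version.
  void Publish(Options loaded) {
    std::lock_guard<std::mutex> lock(mu_);
    loaded.version = current_.version + 1;
    current_ = loaded;
  }

 private:
  std::mutex mu_;
  Options current_;
};
```

With this shape the player can call `RefreshOptions` before every `SuggestMove`; the cost is one mutex acquisition and an integer compare when nothing has changed.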
WRT new data: /data/selfplay/$MODEL /data/selfplay/$MODEL/resign_disabled
I'd suggest /data/selfplay/$MODEL/resign_enabled and /data/selfplay/$MODEL/resign_disabled, so that all files in a directory are the same type (e.g. all sgfs with no folders) and so that it's easy to find the folder
@sethtroisi ok, sounds good. Editing.
pseudo code isn't complete for "dispatcher"
if in PENDING_EVAL state
  Check for pending evaluation jobs
  - if sufficient completed eval games AND sufficient selfplay games:
    - if won > X% (40%? 45%? 50%?): PROMOTE
      - move to model folder
      - reset resign disabled flag
    - else:
      Copy current model with model num incremented (so we can track promotion rate easily):
      SOMETHING COMPLICATED SO THAT TPU WORKERS WRITE TO NEW DIR BUT DON'T RELOAD OR RESET GAMES
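The decision part of the PENDING_EVAL branch could be sketched roughly as below. `GateConfig`, `GateDecision`, and `Decide` are hypothetical names, and the thresholds are inputs because the thread has not settled on X. The "something complicated" TPU handling is deliberately not modelled.

```cpp
#include <cassert>

enum class GateDecision { kWait, kPromote, kReject };

// Hypothetical knobs; the actual values are still under discussion.
struct GateConfig {
  int min_eval_games;
  int min_selfplay_games;
  double promote_win_rate;  // the "X%" above, as a fraction in [0, 1]
};

// Returns kWait until enough eval and selfplay games have completed, then
// promotes only if the candidate's win rate strictly exceeds the threshold.
GateDecision Decide(const GateConfig& cfg, int eval_games, int eval_wins,
                    int selfplay_games) {
  assert(eval_wins >= 0 && eval_wins <= eval_games);
  if (eval_games == 0 || eval_games < cfg.min_eval_games ||
      selfplay_games < cfg.min_selfplay_games) {
    return GateDecision::kWait;
  }
  const double win_rate = static_cast<double>(eval_wins) / eval_games;
  return win_rate > cfg.promote_win_rate ? GateDecision::kPromote
                                         : GateDecision::kReject;
}
```

On `kPromote` the dispatcher would move the model and reset the resign-disabled flag; on `kReject` it would copy the current model under an incremented number.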
@sethtroisi I hear what you're saying. At the moment, I'm leaning towards simplicity of implementation over optimizing resource usage. What do you think?
Current workflow plan:
End of "eval" triggers:
End of "training" triggers:
End of 'selfplay, resign_disabled' triggers:
End of 'selfplay, resign_enabled' triggers:
End of 'gather' triggers:
Couple of comments
on new data arrangement I think it will help me on cloudygo (which has to support older runs) to use
- /sgf/promo/$MODEL <-- games used to determine promo/no-promo
Instead of
- /sgf/eval/$MODEL <-- games used to determine promo/no-promo
"Commander" should start rsync at around 8% so that it can quickly calculate correct resign disabled threshold.
Gating work checklist
Biggest design question: Keep with the GCS-flags method, or bite the bullet and do a proper server to serve out selfplay configs.
Any GCS solution will have "spillover", where some set of selfplay workers get flags that are obsolete. I.e., we want 25k games with Model X, and we have 24.9k finished. We'll have about N

Overall, the process of having the resign disabled games played upfront (to compute the proper threshold) has a couple of consequences. These games are expensive and we want to play as few of them as possible, and we don't want to have more than 10% of them.
Any server-based solution will also have to deal with the fact that our TPU workers are not currently able to split games between different models. But the slop will be less.
New data arrangement:
Old:
New:
Changes to data marshalling
Obviously, we now need to look in /data/selfplay/$MODEL rather than the other way around. Otherwise, it's the same deque design to keep the most recent 500k, and our rsync logic gets simpler. Depending on how we do resign-disable & resign-threshold computation, we could potentially limit the resign-threshold games to 2500 (because some resign-disable games could come in after the fact).
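As a rough illustration of the "most recent 500k" window, a bounded `std::deque` is enough. `RecentGames` is a hypothetical name, and storing file paths as strings is an assumption; the real marshalling code may hold parsed examples instead.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Keeps only the `capacity` most recently added games, dropping the oldest.
class RecentGames {
 public:
  explicit RecentGames(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Appends a game and evicts from the front once over capacity.
  void Add(std::string path) {
    games_.push_back(std::move(path));
    while (games_.size() > capacity_) games_.pop_front();
  }

  std::size_t size() const { return games_.size(); }
  const std::string& oldest() const { return games_.front(); }
  const std::string& newest() const { return games_.back(); }

 private:
  std::size_t capacity_;
  std::deque<std::string> games_;
};
```

In this thread's setting the capacity would be 500000, with games added in model order so that eviction drops the oldest model's games first.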
Changes to "new model" code
When a new model comes in, we don't want to reload it. So, how to spin down? Tom's first idea is just: kill the games in progress when a new model comes up. On average, you lose 0.5 * parallel_games * num_workers, which is kind of painful.
Changes to flag reloading
Flags can be updated with the new resign-threshold. Similar to new_model, we can either a) kill the resign_disabled games (easy) or b) retroactively apply the newly computed resign threshold.
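Option (b) could look something like the sketch below. It assumes each resign-disabled game records a per-move value estimate in [-1, 1] from the perspective of the player to move, and that a player resigns when that value drops below the threshold; both are assumptions about the data, and `FirstResignMove` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first move at which the player to move would have
// resigned under `resign_threshold`, or -1 if the game would have played out.
// Assumption: values[i] is the evaluation from the perspective of the player
// to move at move i, and resignation triggers when it is below the threshold.
int FirstResignMove(const std::vector<float>& values, float resign_threshold) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] < resign_threshold) return static_cast<int>(i);
  }
  return -1;
}
```

A game in progress could then be truncated at the returned index instead of being killed outright.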
Pseudocode for "commander"/"dispatcher"