[Closed] RobertTLange closed this 2 years ago
Once we have addressed the async scheduling of jobs #8 I would love to implement population-based training for hyperparameter optimization (Jaderberg et al., 2017 - https://arxiv.org/abs/1711.09846). It appears to be the most efficient parallel + non-sequential tuning algorithm even for small population sizes (ca. 15 runs) and across multiple domains.

This will require a different pipeline compared to the standard `base_hyperopt` formulation. We will have to use a pretty rigid training script which implements the `step` and `evaluate` functions. Below you find a little mental dump of how this could work with the following setup:
The general API looks as follows:
`Step` - `Eval` - `Ready?` - If yes: `Exploit` - If params changed: `Explore`

The steps are performed asynchronously and in parallel. More details on each step:

- `Step`: Optimisation of the network given the fixed current hyperparams.
- `Eval`: Compute fitness/performance after the optimization step.
- `Ready`: A population member undergoes `explore`/`exploit` only when a fixed number of `Step` updates has been done since the last time that member was ready. E.g. this could be 10k SGD updates.
- `Exploit`: Different exploitation strategies.
- `Explore`: Different exploration strategies.

Important detail: PBT is not only a hyperparameter optimizer but also a model selection mechanism, since we also copy the weight parameters over!
The different mutations/steps/exploitation rankings themselves don't appear to be hard to implement. But we do need a smooth logging setup as well as a standardized way of reloading network checkpoints. We probably have to differentiate between torch, tf and jax network checkpoint reloading.
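The framework differentiation could be a single dispatch function along these lines. A hypothetical sketch only: the function name and `framework` strings are placeholders, and the jax branch assumes params are stored as a plain pickled pytree:

```python
def load_checkpoint(path, model, framework):
    """Reload network parameters from `path` into `model` for the given framework."""
    if framework == "torch":
        import torch
        # torch checkpoints are state dicts saved via torch.save(model.state_dict(), path)
        model.load_state_dict(torch.load(path, map_location="cpu"))
        return model
    elif framework == "tensorflow":
        # tf.keras models saved via model.save_weights(path)
        model.load_weights(path)
        return model
    elif framework == "jax":
        import pickle
        # jax "models" are pytrees of arrays; a plain pickle is one simple option
        with open(path, "rb") as f:
            return pickle.load(f)
    raise ValueError(f"Unknown framework: {framework}")
```

Note the asymmetry: torch/tf mutate a model object in place, while jax just returns a fresh pytree, so the PBT exploit step would need to handle both conventions.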