Supervising the Multi-Fidelity Race of Hyperparameter Configurations
DyHPO paper
Main points
Train a GP surrogate on inputs that combine the hyperparameters with learning-curve features extracted by a CNN (which was not the case in the Kandasamy paper).
The acquisition function is a budget-wise expected improvement: a configuration is chosen if it is expected to improve over the current best when given one more unit of budget (a rough sketch follows below).
Each iteration trains the chosen configuration for one unit of budget (and thus we need to store all the model states somewhere).
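A minimal sketch of the expected-improvement computation behind this, assuming `mu`, `sigma` come from the surrogate posterior at a candidate's next budget level and `y_best` is the best score observed so far. This is generic EI, not the paper's exact acquisition.

```python
import numpy as np
from scipy.stats import norm

def budget_wise_ei(mu, sigma, y_best):
    """Expected improvement of a candidate's predicted score at its next
    budget level over the best observed score (maximization setting).
    mu, sigma: surrogate posterior mean/std at (config, current budget + 1).
    Generic EI sketch, not the exact formula from the DyHPO paper."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive std
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)
```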
Confusing points
1. Algorithm 1
Line 3: I guess the optimization is performed budget-wise (i.e., the acquisition function is evaluated for each observed configuration at budgets less than or equal to its observed budget + 1).
==> Probably we only need to optimize over configurations that have not been trained yet (budget 0); for the others we can just score the configurations we already have (see the sketch below).
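A minimal sketch of this reading of line 3. The interfaces (`surrogate.predict`, `acquisition`) are placeholders, not the paper's actual API, and the sketch only scores each configuration at its next budget.

```python
import numpy as np

def select_next(configs, observed_budget, surrogate, acquisition, y_best):
    """Score every configuration at one budget unit above what it has been
    trained for so far (untried configs are scored at budget 1) and return
    the maximizer together with its target budget. One reading of
    Algorithm 1, not the paper's implementation."""
    scores = []
    for cfg in configs:
        next_budget = observed_budget.get(cfg, 0) + 1
        mu, sigma = surrogate.predict(cfg, next_budget)  # posterior at the next budget
        scores.append(acquisition(mu, sigma, y_best))
    best_idx = int(np.argmax(scores))
    best_cfg = configs[best_idx]
    return best_cfg, observed_budget.get(best_cfg, 0) + 1
```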
Experiments
Baselines
Note that MF-DNN refers to "Multi-Fidelity Bayesian Optimization via Deep Neural Networks".
Benchmarks
LCBench
TaskSet
NB201
Performance over time
x-axis: number of epochs
y-axis: loss
10 repetitions. Training the deep kernel takes at least 10 seconds per iteration.
They also recommend using random search if this overhead becomes a significant bottleneck.
Statistical test (Wilcoxon signed-rank test)
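As a side note, this kind of paired comparison can be run with scipy; a toy example (the numbers are made up, not from the paper):

```python
from scipy.stats import wilcoxon

# Paired final regrets of two methods over the same benchmarks/seeds
# (toy numbers, purely illustrative).
method_a = [0.012, 0.034, 0.008, 0.021, 0.015, 0.027, 0.019]
method_b = [0.020, 0.031, 0.015, 0.040, 0.022, 0.035, 0.024]

stat, p_value = wilcoxon(method_a, method_b)
print(f"Wilcoxon signed-rank statistic = {stat:.3f}, p-value = {p_value:.3f}")
```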
The fraction of top-performing candidates
The precision at the $i$-th epoch is defined as the number of top $1\%$ configs that are trained at least until the $i$-th epoch divided by the number of all configs that are trained at least until the $i$-th epoch.
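Restating that definition as a formula (my notation, with $\mathcal{C}$ the set of all configurations and $\mathcal{C}_{1\%}$ the top $1\%$):

$$
\mathrm{precision}(i) = \frac{\bigl|\{\, c \in \mathcal{C}_{1\%} : c \text{ trained for at least } i \text{ epochs} \,\}\bigr|}{\bigl|\{\, c \in \mathcal{C} : c \text{ trained for at least } i \text{ epochs} \,\}\bigr|}
$$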
Average regret of the training at a specific budget
Honestly, I have not fully understood it.
The percentage of flipped configs
The percentage of good configurations that were judged to be bad at a previous budget.
More specifically, the percentage of configurations that belong to the top $33\%$ at a given budget but were in the bottom $67\%$ at a previous budget.
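One way to write this down (my reading, taking "a previous budget" to mean any earlier budget $b' < b$):

$$
\mathrm{flipped}(b) = 100 \cdot \frac{\bigl|\{\, c : c \in \text{top } 33\% \text{ at } b \ \text{and}\ \exists\, b' < b \text{ with } c \in \text{bottom } 67\% \text{ at } b' \,\}\bigr|}{\bigl|\{\, c : c \in \text{top } 33\% \text{ at } b \,\}\bigr|}
$$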
It is also hard to understand how they defined "a previous budget": is it any previous budget, or only the immediately preceding one? Anyway, this part was not clear.
Ablation study with and without the CNN learning-curve features
Apparently, the learning curve features are important.