Supervising the Multi-Fidelity Race of Hyperparameter Configurations
DyHPO paper
Main points
Train a GP surrogate on inputs that combine the hyperparameters with learning-curve features extracted by a CNN (which was not the case in the Kandasamy paper).
The acquisition function is a budget-wise expected improvement: a configuration is chosen if it is expected to improve over the current best when given one more unit of budget (a rough sketch follows below).
Each iteration trains the chosen configuration for one unit of budget (and thus we need to store all the model states somewhere).
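A minimal sketch of the expected-improvement computation behind this, assuming `mu`, `sigma` come from the surrogate posterior at a candidate's next budget level and `y_best` is the best score observed so far. This is generic EI, not the paper's exact acquisition.

```python
import numpy as np
from scipy.stats import norm

def budget_wise_ei(mu, sigma, y_best):
    """Expected improvement of a candidate's predicted score at its next
    budget level over the best observed score (maximization setting).
    mu, sigma: surrogate posterior mean/std at (config, current budget + 1).
    Generic EI sketch, not the exact formula from the DyHPO paper."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive std
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)
```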
Confusing points
1. Algorithm 1
Line 3: I guess the optimization is performed budget-wise (i.e., the acquisition function is evaluated for each observed configuration at budgets less than or equal to its observed budget + 1).
==> Probably we only need to optimize over configurations that have not been trained yet (budget 0); for the others we can just score the configurations we already have (see the sketch below).
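A minimal sketch of this reading of line 3. The interfaces (`surrogate.predict`, `acquisition`) are placeholders, not the paper's actual API, and the sketch only scores each configuration at its next budget.

```python
import numpy as np

def select_next(configs, observed_budget, surrogate, acquisition, y_best):
    """Score every configuration at one budget unit above what it has been
    trained for so far (untried configs are scored at budget 1) and return
    the maximizer together with its target budget. One reading of
    Algorithm 1, not the paper's implementation."""
    scores = []
    for cfg in configs:
        next_budget = observed_budget.get(cfg, 0) + 1
        mu, sigma = surrogate.predict(cfg, next_budget)  # posterior at the next budget
        scores.append(acquisition(mu, sigma, y_best))
    best_idx = int(np.argmax(scores))
    best_cfg = configs[best_idx]
    return best_cfg, observed_budget.get(best_cfg, 0) + 1
```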
Experiments
Baselines
Note that MF-DNN refers to "Multi-Fidelity Bayesian Optimization via Deep Neural Networks".
Benchmarks
LCBench
TaskSet
NB201
Performance over time
x-axis: number of epochs
y-axis: loss
10 repetitions. Training the deep kernel takes at least 10 seconds per iteration.
They also recommend using random search if this overhead becomes a significant bottleneck.
Statistical test (Wilcoxon signed-rank test)
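As a side note, this kind of paired comparison can be run with scipy; a toy example (the numbers are made up, not from the paper):

```python
from scipy.stats import wilcoxon

# Paired final regrets of two methods over the same benchmarks/seeds
# (toy numbers, purely illustrative).
method_a = [0.012, 0.034, 0.008, 0.021, 0.015, 0.027, 0.019]
method_b = [0.020, 0.031, 0.015, 0.040, 0.022, 0.035, 0.024]

stat, p_value = wilcoxon(method_a, method_b)
print(f"Wilcoxon signed-rank statistic = {stat:.3f}, p-value = {p_value:.3f}")
```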
The fraction of top-performing candidates
The precision at the $i$-th epoch is defined as the number of top $1\%$ configs that are trained at least until the $i$-th epoch divided by the number of all configs that are trained at least until the $i$-th epoch.
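Restating that definition as a formula (my notation, with $\mathcal{C}$ the set of all configurations and $\mathcal{C}_{1\%}$ the top $1\%$):

$$
\mathrm{precision}(i) = \frac{\bigl|\{\, c \in \mathcal{C}_{1\%} : c \text{ trained for at least } i \text{ epochs} \,\}\bigr|}{\bigl|\{\, c \in \mathcal{C} : c \text{ trained for at least } i \text{ epochs} \,\}\bigr|}
$$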
Average regret of the training at a specific budget
Honestly, I have not fully understood it.
The percentage of flipped configs
The percentage of good configurations that were judged to be bad at a previous budget.
More specifically, the percentage of configurations that belong to the top $33\%$ at a given budget but were in the bottom $67\%$ at a previous budget.
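One way to write this down (my reading, taking "a previous budget" to mean any earlier budget $b' < b$):

$$
\mathrm{flipped}(b) = 100 \cdot \frac{\bigl|\{\, c : c \in \text{top } 33\% \text{ at } b \ \text{and}\ \exists\, b' < b \text{ with } c \in \text{bottom } 67\% \text{ at } b' \,\}\bigr|}{\bigl|\{\, c : c \in \text{top } 33\% \text{ at } b \,\}\bigr|}
$$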
It is also hard to understand how they defined "a previous budget": is it any previous budget, or only the immediately preceding one? Anyway, this part was not clear.
Ablation study with and without the CNN learning-curve features
Apparently, the learning curve features are important.