Multi-fidelity zero-shot HPO

Main points

The naive approach is to pick $K$ configurations out of $N$ such that the average loss over the $D$ datasets is minimized:

$\min_{S \subseteq [N]} \sum_{n \in S} \sum_{d \in [D]} L(x_n \mid \mathcal{D}_d)$ s.t. $|S| = K$

Although the paper says this is approximated by the greedy algorithm, we can achieve optimality in $O(DN^2)$ using dynamic programming (dp[d][i][j] := the minimum loss sum over the first $d$ datasets when picking $i$ configurations from the first $j$ configurations). The dynamic programming itself could be a new algorithm, but it would probably only amount to a workshop paper.

The proposal is to speed up the algorithm above by combining it with the HyperBand algorithm, but I have not yet figured out how.
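A minimal sketch of this dynamic program, assuming the losses are available as an $N \times D$ matrix `loss` with `loss[n][d]` $= L(x_n \mid \mathcal{D}_d)$ (the matrix name and function signature are illustrative, not from the note). Since the objective is additive over datasets, the $d$ dimension of dp[d][i][j] collapses into a precomputed per-configuration total, leaving a 2-D table:

```python
import math

def optimal_portfolio(loss, K):
    """Minimum summed loss when picking K of N configurations.

    loss: N x D matrix, loss[n][d] = L(x_n | D_d).
    """
    N = len(loss)
    # c[n]: total loss of configuration n summed over all D datasets
    # (this precomputation absorbs the d dimension of dp[d][i][j]).
    c = [sum(row) for row in loss]
    # dp[i][j]: minimum loss sum when picking i configurations
    # from the first j configurations.
    dp = [[math.inf] * (N + 1) for _ in range(K + 1)]
    for j in range(N + 1):
        dp[0][j] = 0.0  # picking nothing costs nothing
    for i in range(1, K + 1):
        for j in range(1, N + 1):
            # Either skip configuration j-1, or take it.
            dp[i][j] = min(dp[i][j - 1], dp[i - 1][j - 1] + c[j - 1])
    return dp[K][N]
```

Note that because this objective is additive over the chosen configurations, sorting by $c_n$ and taking the $K$ smallest would reach the same optimum; the DP form is shown here because it mirrors the recurrence in the note.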
Experiments used self-made setups.