
Model-based Asynchronous Hyperparameter and Neural Architecture Search #63


nabenabe0928 commented 1 year ago

Model-based Asynchronous Hyperparameter and Neural Architecture Search

A-BOHB paper

Main points

Fantasizing

Fantasizing marginalizes out the unobserved objective values of pending evaluations. Let the set of completed observations be:

$$\{(x_n, y_n)\}_{n=1}^{N}$$

and the set of pending evaluations, whose objective values $y_m$ are not yet observed, be:

$$\{(x_m, y_m)\}_{m=1}^{M}.$$

Furthermore, let $u(x \mid \cdot)$ be the utility function underlying the acquisition function. Fantasizing then computes the following marginalized acquisition function:

$$\int u\left(x \,\middle|\, \{(x_n, y_n)\}_{n=1}^{N}, \{(x_m, y_m)\}_{m=1}^{M}\right) p\left(\{y_m\}_{m=1}^{M} \,\middle|\, \{x_m\}_{m=1}^{M}, \{(x_n, y_n)\}_{n=1}^{N}\right) dy_1 \, dy_2 \cdots dy_M$$
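A minimal Monte Carlo sketch of this marginalization, assuming a scikit-learn GP surrogate and Expected Improvement as the utility $u$ (the function name `fantasized_ei` and the choice of 32 fantasy samples are illustrative assumptions, not the paper's implementation):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def fantasized_ei(x, X_obs, y_obs, X_pend, n_fantasies=32, seed=0):
    """Monte Carlo estimate of the fantasized Expected Improvement (minimization).

    The integral over the pending outcomes y_m is approximated by sampling
    fantasies from the GP posterior conditioned on the observed data, then
    averaging EI computed on each augmented dataset.
    """
    gp = GaussianProcessRegressor(normalize_y=True).fit(X_obs, y_obs)
    # Draw joint posterior samples of the pending outcomes: shape (M, n_fantasies).
    y_fant = gp.sample_y(X_pend, n_samples=n_fantasies, random_state=seed)
    acq = []
    for k in range(n_fantasies):
        # Condition a fresh GP on observed + fantasized data (one fit per fantasy).
        X_aug = np.vstack([X_obs, X_pend])
        y_aug = np.concatenate([y_obs, y_fant[:, k]])
        gp_k = GaussianProcessRegressor(normalize_y=True).fit(X_aug, y_aug)
        mu, sigma = gp_k.predict(np.atleast_2d(x), return_std=True)
        best = y_aug.min()
        z = (best - mu) / np.maximum(sigma, 1e-12)
        acq.append(((best - mu) * norm.cdf(z) + sigma * norm.pdf(z)).item())
    return float(np.mean(acq))
```

Refitting one GP per fantasy keeps the correspondence with the integral explicit; in practice the posterior would be updated incrementally (e.g., rank-one Cholesky updates) instead of refit from scratch.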
nabenabe0928 commented 1 year ago

Experiments

Performance over time with 4 and 8 workers

  1. A-BOHB stopping
  2. A-BOHB promotion
  3. A-HB stopping
  4. A-HB promotion

Targets are several benchmarks built by the authors, plus NAS-Bench-201 (NB201).

Stopping is based on median pruning (the median stopping rule) and promotion is based on ASHA. Apparently, stopping performs better; after that comparison, most experiments use 8 workers.
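For reference, a minimal sketch of the two scheduling rules (my paraphrase, not the paper's code; `eta=3`, minimization, and all names are assumptions for illustration):

```python
import numpy as np

def median_stopping(curve, peer_curves):
    """Stop a trial if its latest value is worse than the median of its
    peers' values at the same step (minimization)."""
    step = len(curve) - 1
    peers = [c[step] for c in peer_curves if len(c) > step]
    return bool(peers) and curve[-1] > np.median(peers)

def asha_promotable(rung_results, trial_id, eta=3):
    """Promote a trial to the next rung if it ranks within the top 1/eta
    of all results recorded at its current rung (minimization)."""
    scores = sorted(rung_results.values())
    k = len(scores) // eta  # number of trials eligible for promotion
    return k > 0 and rung_results[trial_id] <= scores[k - 1]
```

The key difference: stopping terminates bad trials against a running median, while promotion lets trials advance asynchronously as soon as they rank in the top $1/\eta$ of their rung.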

Scalability test

The number of workers is varied over $n \in \{1, 2, 4, 8, 16\}$.