Introducing learning-to-rank (L2R) to the Bayesian optimization framework
The L2R framework itself builds on existing work:
Train a meta-feature network that takes a (configuration x, performance y) pair and outputs a latent representation z
Train regression models that take a configuration x together with the latent representation vector z
Ensemble those regression models to compute the mean and the variance of the predicted rank
The experiments demonstrate that the proposed method performs well
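To make sure I understood the pipeline, here is a minimal sketch of it as I read it. The mean pooling and the linear scorers are toy stand-ins for the learned networks in the paper; all names and shapes are my own assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def meta_feature_network(history_x, history_y):
    # Toy stand-in for the meta-feature network: maps the observed
    # (x, y) pairs to a fixed-size latent representation z.
    pairs = np.hstack([history_x, history_y[:, None]])
    return pairs.mean(axis=0)  # toy pooling; the paper learns this mapping

def rank_score(model_w, x, z):
    # One ensemble member: a toy regression model that scores a
    # configuration x conditioned on the latent vector z.
    return float(np.dot(model_w, np.concatenate([x, z])))

# Toy history of 5 observed configurations and their performances.
history_x = rng.normal(size=(5, 3))
history_y = rng.normal(size=5)
z = meta_feature_network(history_x, history_y)

# Ensemble of M members (random weights stand in for trained networks).
M = 10
ensemble = [rng.normal(size=3 + z.size) for _ in range(M)]

# For a candidate configuration, collect one score per member; the
# ensemble mean and variance of the predicted rank can then feed an
# acquisition function (e.g. EI or UCB).
x_cand = rng.normal(size=3)
scores = np.array([rank_score(w, x_cand, z) for w in ensemble])
mean, var = scores.mean(), scores.var()
print(mean, var)
```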
Strengths
Introducing rank regression in Bayesian optimization
Although rank-based optimization methods such as TPE and CMA-ES are heavily used in practice, the BO framework has not yet incorporated rank regression
The experiments compare the proposed method with many other BO methods
It works better than other methods when we have a diverse set of meta-datasets with a decent amount of observations for each.
The paper identifies the best choice for each component, such as the ranking loss, whether to use meta-features, and the acquisition function
Weaknesses
The details of the search space are missing (I looked into the HPO-B paper, but it contains too many search spaces, so I could not determine which ones the authors used; this matters because we do not know how difficult each task was)
Based on Figure 5, the proposed method seems to have better initial designs than the other methods, but the paper does not explain why this happens.
The proposed method requires training at every iteration, and the paper does not report how long this takes
Average rank alone does not tell us how much better the results were; in other words, we need information about the absolute improvement in the performance metric, and this is not provided even in the Appendix
In my experience, most GP- or NN-based BO methods do not outperform TPE and CMA-ES in most cases, so I wonder whether this method is really strong in practice in non-transfer settings.
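The point above about average rank hiding effect size can be illustrated with a toy example (the numbers are made up for illustration, not taken from the paper):

```python
# Hypothetical final scores (higher is better) of two methods on three tasks.
method_a = [0.901, 0.850, 0.774]
method_b = [0.900, 0.849, 0.773]

ranks_a, ranks_b, gaps = [], [], []
for a, b in zip(method_a, method_b):
    # Method A wins every task, so it gets rank 1 each time.
    ranks_a.append(1 if a > b else 2)
    ranks_b.append(1 if b > a else 2)
    gaps.append(abs(a - b))

print(sum(ranks_a) / 3, sum(ranks_b) / 3)  # average ranks look decisive
print(max(gaps))                           # yet every metric gap is ~0.001
```

Average rank reports a clear 1.0 vs 2.0 separation even though the underlying improvement is negligible, which is why the absolute metric values matter.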
Deep Ranking Ensembles for Hyperparameter Optimization