I'm familiar with the topic and happy to continue developing this.
Hi @xadrianzetx. Thank you for your continued contributions to Optuna. Sorry, but we adopted this task as an item for a development sprint being held today in Japan, so it may be implemented by a participant interested in it. If so, would you mind lending your deep insight into this topic to review it?
@shu65 will work on this task.
Ah, sorry, didn't notice this task on sprint board. Sure, happy to tag along for review if needed.
I found that some values required for the kurobako report are not output by bayesmark. What is the best way to deal with this problem? The options I can think of are:
1. Fill dummy values into the bayesmark results converted to the kurobako results format. kurobako could then output a plot from the results; however, `kurobako report` would also include the dummy values in its output.
2. Output the data needed for the kurobako results format in https://github.com/optuna/optuna/blob/master/benchmarks/bayesmark/optuna_optimizer.py (a rough sketch of what this could look like follows this list).
3. Add new report and plot functions that support both the kurobako and bayesmark results.
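A minimal sketch of what option 2 might involve, assuming we wrap the optimizer used in the bayesmark benchmark and record per-trial values and timings ourselves. The class name, method names, and output file below (`RecordingOptimizer`, `trials.json`) are hypothetical, and the actual kurobako results schema would still need to be matched when converting:

```python
import json
import time


class RecordingOptimizer:
    """Hypothetical wrapper: while bayesmark drives suggest()/observe(),
    record per-trial values and wall-clock times so they can later be
    converted into kurobako's results format."""

    def __init__(self, optimizer, out_path="trials.json"):
        self._optimizer = optimizer  # e.g. the OptunaOptimizer instance
        self._out_path = out_path
        self._records = []
        self._started = None

    def suggest(self, n_suggestions=1):
        self._started = time.time()
        return self._optimizer.suggest(n_suggestions)

    def observe(self, params, values):
        elapsed = time.time() - self._started
        for p, v in zip(params, values):
            self._records.append({"params": p, "value": v, "elapsed": elapsed})
        self._optimizer.observe(params, values)

    def dump(self):
        # The real kurobako study record schema would still have to be matched here.
        with open(self._out_path, "w") as f:
            json.dump(self._records, f)
```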
You mean the metrics and ranking from this paper? These would need to be re-implemented using the data bayesmark stores in the `eval` and `time` directories (cross-validation or validation-set scores and execution time) of each run. So I'd say option 3 seems correct.
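For reference, a rough sketch (not bayesmark's own API) of how best-so-far curves and a simple ranking could be recomputed from per-trial scores of the kind bayesmark keeps under `eval` and `time`. The array shapes and solver names are assumptions for illustration:

```python
import numpy as np
import pandas as pd


def best_so_far(values: np.ndarray) -> np.ndarray:
    """Cumulative best (minimum) value after each trial, per run.

    `values` is assumed to have shape (n_runs, n_trials)."""
    return np.minimum.accumulate(values, axis=1)


def rank_solvers(scores: dict) -> pd.Series:
    """Rank solvers by their mean final best value (lower is better)."""
    final_best = {name: best_so_far(v)[:, -1].mean() for name, v in scores.items()}
    return pd.Series(final_best).rank().sort_values()


# Toy data standing in for scores loaded from bayesmark's output.
rng = np.random.default_rng(0)
scores = {
    "optuna-tpe": rng.random((10, 80)),
    "random-search": rng.random((10, 80)) + 0.1,
}
print(rank_solvers(scores))
```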
@xadrianzetx Thanks for the comment. I discussed the options privately with @shu65, and it might be good to choose option 2, since we are most interested in the kurobako-based evaluation method and think that option 2, for which rewriting the bayesmark benchmark output would be sufficient, is the easiest to implement. The other evaluation method for building the report and the plot, as in the paper you suggested, is future work.
Unfortunately, @shu65 will not have enough time to implement option 2. If you don't mind, I would like to ask for your help, since you are familiar with the bayesmark benchmark, to implement option 2 based on @shu65's findings.
@HideakiImamura What would be the approach in that case? I wasn't really considering that option, as there is not much we can do with the `OptunaOptimizer` instance while the benchmark is running. In any case, I can implement it next week or so once I know what the plan is.
@xadrianzetx I am really sorry for the delay in replying. I had completely lost track of this issue.
I have reconsidered how to proceed with this task: since the hyperparameter optimization problem we are actually running in bayesmark uses sklearn, why not use it directly to run the optimization based on kurobako? As you pointed out before, we cannot use bayesmark's own performance metrics, but we want to make a fair comparison with other kurobako benchmarks, so a kurobako-like performance evaluation would be sufficient.
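For illustration only, a rough sketch of what exposing such an sklearn task directly as a kurobako problem could look like, following the `ProblemFactory`/`Problem`/`Evaluator` interface from the kurobako-py examples. The problem name, search space, and dataset choice below are assumptions, not a settled design:

```python
from kurobako import problem
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


class RandomForestProblemFactory(problem.ProblemFactory):
    def specification(self):
        # Hypothetical search space; the real benchmark would mirror
        # the spaces used in the bayesmark benchmark.
        params = [
            problem.Var("max_depth", problem.ContinuousRange(2, 32)),
            problem.Var("min_samples_split", problem.ContinuousRange(0.1, 1.0)),
        ]
        return problem.ProblemSpec(
            name="rf-wine",
            params=params,
            values=[problem.Var("1 - accuracy")],  # minimize misclassification rate
        )

    def create_problem(self, seed):
        return RandomForestProblem()


class RandomForestProblem(problem.Problem):
    def create_evaluator(self, params):
        return RandomForestEvaluator(params)


class RandomForestEvaluator(problem.Evaluator):
    def __init__(self, params):
        self._max_depth, self._min_samples_split = params
        self._current_step = 0

    def current_step(self):
        return self._current_step

    def evaluate(self, next_step):
        self._current_step = 1
        X, y = load_wine(return_X_y=True)
        clf = RandomForestClassifier(
            max_depth=int(self._max_depth),
            min_samples_split=self._min_samples_split,
        )
        accuracy = cross_val_score(clf, X, y, cv=3).mean()
        return [1.0 - accuracy]


if __name__ == "__main__":
    runner = problem.ProblemRunner(RandomForestProblemFactory())
    runner.run()
```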
Then PTAL at the dev repo where I've implemented the metrics from the paper that kurobako's performance evaluation is based on. This replaces bayesmark's metrics and should provide a fair comparison. The report and plots were improved as well, to match kurobako.
Let me know if this would fit the approach and I'll work on upstreaming it.
Amazing. Let me check your solutions.
Thank you for carefully working through kurobako's reporting logic and creating such a complete report builder. Frankly, I am amazed. Thank you so much. If you don't mind, could you submit the changes in https://github.com/xadrianzetx/dev-optuna-bayesmark as a PR to Optuna?
On the other hand, I am a little concerned about the large amount of code that will be added to the bayesmark benchmark in Optuna. I am concerned about what would happen if we gave kurobako the sklearn hyperparameter optimization problem I mentioned yesterday directly as a problem.
After your PR has been submitted, may I examine the volume of changes and the increase in complexity, and then consider whether to merge them into master? I am very sorry for the trouble you are going through.
> On the other hand, I am a little concerned about the large amount of code that will be added to the bayesmark benchmark in Optuna.
Yeah, I see your point. We are effectively re-implementing the reporting parts of kurobako in Python. However, it seems kurobako is planned to be rewritten in Python as part of GSoC 2022 anyway, so maybe the parts we are about to introduce could be re-used. I've tried to be reasonable about the design. The benchmark could then be refactored to drop bayesmark and just run sklearn problems with kurobako-py.
> I am concerned about what would happen if we gave kurobako the sklearn hyperparameter optimization problem I mentioned yesterday directly as a problem.
This is a valid option; however, it would require me to get familiar with the current implementation of kurobako (or more specifically, how to add new problems to it). I'm happy to do it if needed, but lately I've been a bit constrained on time, so I cannot guarantee a reasonable time frame.
> After your PR has been submitted, may I examine the volume of changes and the increase in complexity, and then consider whether to merge them into master?
I think that's the best way to approach it now. Let me polish the implementation a bit and I'll open a PR in the next few days or weeks (hopefully days).
> I am very sorry for the trouble you are going through.
No trouble at all!
> I think that's the best way to approach it now. Let me polish the implementation a bit and I'll open a PR in the next few days or weeks (hopefully days).
Thank you for your positive consideration. I look forward to getting a PR!
> However, it seems kurobako is planned to be rewritten in Python as part of GSoC 2022 anyway, so maybe the parts we are about to introduce could be re-used. I've tried to be reasonable about the design. The benchmark could then be refactored to drop bayesmark and just run sklearn problems with kurobako-py.
Yes, we have a plan to rewrite kurobako in Python. You are right, this implementation will be very useful for that future project.
### Motivation
Part of #3245.
Recently, the algorithm benchmark environment with `bayesmark` has been introduced in #3354. We can easily evaluate the performance of black-box optimization algorithms on CI (GitHub Actions) for the hyperparameter optimization task of machine learning models with `sklearn`. The benchmark result is generated by the features of `bayesmark`, so it is slightly different from that of the benchmark environment with `kurobako`. To compare the results fairly, it would be great to make the benchmark result of `bayesmark` consistent with that of `kurobako`. We would like help to do this.

### Description
#### About the `bayesmark` benchmark

The benchmark result with `bayesmark` consists of plots (.png files) and a report (.md file). The plot looks like the following.

![optuna-wine-RF-sumamry](https://user-images.githubusercontent.com/38826298/162708304-47cf711a-8151-4687-abb9-8dc61d7e5865.png)

The report looks like this.
#### About the `kurobako` benchmark

The benchmark result with `kurobako` consists of plots (.png files) and a report (.md file). The plot looks like the following.

![hpo-bench-parkinson-1342958b65feb590e4c0284f73a7373aa1006595f85f3ce9a4645ea1bfa581d6](https://user-images.githubusercontent.com/38826298/162709832-207dba9c-f248-4745-8e31-1d6308f96f63.png)

The report looks like this.
### What should we do

Change the `bayesmark` benchmark so that we can generate the same plots and report as those of the `kurobako` benchmark:

- Make the plot of the `bayesmark` benchmark consistent with that of the `kurobako` benchmark.
- Make the report of the `bayesmark` benchmark consistent with that of the `kurobako` benchmark.

### Alternatives (optional)
No response
### Additional context (optional)
No response