Brief description of your algorithm
Recently, SAASBO (sparse axis-aligned subspace Bayesian optimization) has been demonstrated to be a highly effective high-dimensional Bayesian optimization scheme. Here, we use Ax/SAASBO Bayesian adaptive design to simultaneously optimize 23 hyperparameters of CrabNet. 100 sequential design iterations were used, and parameters were chosen based on a combination of intuition and algorithm/data constraints (e.g., elemental featurizers that were missing elements contained in the dataset were removed). The first 10 iterations were based on Sobol sampling to create a rough initial model, while the remaining 90 iterations were SAASBO Bayesian adaptive design iterations.
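For concreteness, this Sobol-then-SAASBO schedule can be written as an Ax GenerationStrategy. The following is a minimal sketch, assuming the modelbridge registry API of the Ax releases contemporary with this submission (Models.FULLYBAYESIAN is Ax's SAAS-based fully Bayesian model); the NUTS settings shown are illustrative, not necessarily the values used here.

```python
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

# 10 quasi-random Sobol trials to seed a rough initial model, then 90
# SAASBO trials. Models.FULLYBAYESIAN is Ax's fully Bayesian model with
# the SAAS prior; num_samples/warmup_steps control its NUTS inference and
# are illustrative values only.
gen_strategy = GenerationStrategy(
    steps=[
        GenerationStep(model=Models.SOBOL, num_trials=10),
        GenerationStep(
            model=Models.FULLYBAYESIAN,
            num_trials=90,
            model_kwargs={"num_samples": 256, "warmup_steps": 512},
        ),
    ]
)
```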
For the inner loops (where hyperparameter optimization is performed), the average MAE across the five inner folds was used as the objective for Ax to minimize. The best parameter set was then trained on all the inner-fold data and used to predict on the test set (which is unknown during hyperparameter optimization). This is nested cross-validation (CV) and is computationally expensive. See "automatminer: running a benchmark" for more information on nested CV.
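To make the inner-loop setup concrete, the sketch below pairs a five-fold inner-CV objective with the AxClient Service API. It is illustrative only: fit_crabnet, X_inner, and y_inner are hypothetical names, the search-space excerpt stands in for the full 23-parameter space defined in search.py, and gen_strategy is the strategy sketched above.

```python
import numpy as np
from sklearn.model_selection import KFold
from ax.service.ax_client import AxClient

# Illustrative excerpt of the search space in AxClient format; names and
# bounds are placeholders, not the submission's actual 23 parameters.
search_space = [
    {"name": "lr", "type": "range", "bounds": [1e-4, 6e-3], "log_scale": True},
    {"name": "heads", "type": "range", "bounds": [1, 10]},
    {"name": "elem_prop", "type": "choice", "values": ["mat2vec", "magpie", "onehot"]},
]

def inner_cv_mae(parameters, X, y, n_splits=5):
    """Mean MAE over the inner folds for one hyperparameter set.

    fit_crabnet is a hypothetical helper that trains CrabNet with
    `parameters` and returns a fitted model; X and y are numpy arrays.
    """
    maes = []
    for train_idx, val_idx in KFold(n_splits=n_splits).split(X):
        model = fit_crabnet(parameters, X[train_idx], y[train_idx])
        maes.append(np.mean(np.abs(model.predict(X[val_idx]) - y[val_idx])))
    return float(np.mean(maes))

# One outer (Matbench) fold: optimize on the inner folds only. X_inner and
# y_inner (hypothetical names) are the outer fold's training data.
ax_client = AxClient(generation_strategy=gen_strategy)
ax_client.create_experiment(
    name="crabnet_saasbo",
    parameters=search_space,
    objective_name="mae",  # keyword style of the AxClient versions of that era
    minimize=True,
)
for _ in range(100):
    params, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(
        trial_index=trial_index, raw_data=inner_cv_mae(params, X_inner, y_inner)
    )
best_parameters, _ = ax_client.get_best_parameters()
# best_parameters is then retrained on all inner-fold data and scored on the
# held-out Matbench test fold.
```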
Included files

matbench.py: running a Matbench fold, splitting folds into separate jobs, and collecting results
metric.py: average MAE for CrabNet to supply to Ax as a "Metric" (see the sketch after this list)
parameterization.py: interfacing the CrabNet API with the Ax API
plotting.py: plotting code
search.py: Ax SearchSpace for CrabNet
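As an illustration of the metric.py piece, a custom metric can be written against the Ax Developer API. This is a minimal sketch assuming the fetch_trial_data/Data signature of earlier Ax releases (newer releases wrap the result differently); evaluate_crabnet is a hypothetical helper returning the inner-CV average MAE for one parameter set.

```python
import pandas as pd
from ax.core.data import Data
from ax.core.metric import Metric

class CrabNetMAE(Metric):
    """Average inner-CV MAE of CrabNet, exposed to Ax as a Metric.

    evaluate_crabnet is a hypothetical helper returning the mean MAE
    across the five inner folds for a given parameter set.
    """

    def fetch_trial_data(self, trial):
        records = []
        for arm_name, arm in trial.arms_by_name.items():
            records.append(
                {
                    "arm_name": arm_name,
                    "metric_name": self.name,
                    "trial_index": trial.index,
                    "mean": evaluate_crabnet(arm.parameters),
                    "sem": 0.0,  # one deterministic evaluation per arm
                }
            )
        return Data(df=pd.DataFrame.from_records(records))
```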