Closed: iXanthos closed this issue 4 years ago.
Hi @iXanthos, by "training performance estimation", do you mean training error or predictions on the training set?
To get training predictions, you may do `y_tr_pred = m.predict(x_tr)` if `m` is a fitted instance of the `AutoLearner` class; then, to get training error (under the balanced error rate metric), you may do `util.error(y_tr, y_tr_pred, 'classification')`.
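The two calls above can be sketched as follows. `StubLearner` and `error` are hypothetical stand-ins for oboe's `AutoLearner` and `util.error` (whose real signatures and metric may differ); this only illustrates the predict-then-score sequence:

```python
class StubLearner:
    """Hypothetical stand-in for a fitted oboe AutoLearner instance."""
    def predict(self, x):
        # A fitted model would return learned predictions; for
        # illustration, predict class 1 when the first feature is positive.
        return [1 if row[0] > 0 else 0 for row in x]

def error(y_true, y_pred, p_type):
    """Stand-in for util.error: plain misclassification rate here."""
    assert p_type == 'classification'
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

m = StubLearner()
x_tr = [[1.0], [-2.0], [3.0]]
y_tr = [1, 0, 1]
y_tr_pred = m.predict(x_tr)                       # training-set predictions
print(error(y_tr, y_tr_pred, 'classification'))   # → 0.0
```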
Hi @chengrunyang, I am a bit confused about what OBOE does (still reading the paper). By "training performance estimation" I mean an estimate of the performance of the final ensemble. I guess this is the overall training error.
Thanks in advance.
I am really sorry for my late reply.
What OBOE does relates to the goal of AutoML: automatically build a model (a machine learning algorithm with certain hyperparameters) on a dataset, so that human practitioners do not need to select models and tune hyperparameters by themselves.
The traditional manner of fitting on a training set and testing on a test set is:
m = <an algorithm with certain hyperparameters>
m.fit(x_train, y_train)
y_pred = m.predict(x_test)
error = error_function(y_test, y_pred)
OBOE, like many other AutoML frameworks, automatically selects the `m` here, so that you just need to do:
m = <an AutoML framework; no need to specify algorithm type and hyperparameter values by yourself>
m.fit(x_train, y_train)
y_pred = m.predict(x_test)
error = error_function(y_test, y_pred)
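The fit/predict/error pattern above can be made concrete with a toy model. The snippet below is a minimal 1-nearest-neighbor classifier in plain Python; it only illustrates the API shape shared by both workflows, and is not oboe itself:

```python
class NearestNeighbor:
    """Minimal 1-nearest-neighbor classifier illustrating the fit/predict API."""
    def fit(self, x_train, y_train):
        self.x, self.y = x_train, y_train
        return self

    def predict(self, x_test):
        def closest(q):
            # Index of the training point with the smallest squared distance to q.
            i = min(range(len(self.x)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(self.x[j], q)))
            return self.y[i]
        return [closest(q) for q in x_test]

def error_function(y_true, y_pred):
    # Plain misclassification rate.
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

x_train = [[0.0], [1.0], [10.0], [11.0]]
y_train = [0, 0, 1, 1]
x_test, y_test = [[0.5], [10.5]], [0, 1]

m = NearestNeighbor()                 # <an algorithm with certain hyperparameters>
m.fit(x_train, y_train)
y_pred = m.predict(x_test)
print(error_function(y_test, y_pred))  # → 0.0
```

An AutoML framework replaces the first line: instead of picking `NearestNeighbor` (and its hyperparameters) yourself, the framework chooses the model for you.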
It's my turn to say sorry for the late reply.
I see; my question is about the metric used to select the best model `m`. In more traditional AutoML approaches (e.g. h2o-automl), the best model is selected based on a score derived from the model's performance on the training data (either using CV or a holdout internally).
Example: I use a training set and an AutoML tool to get the best solution. The AutoML tool reports a training performance estimation and then outputs the final model. I can then use it with new unseen data (test data) to validate its performance. What I want is the training performance estimation, and then to be able to validate the model's final performance (as in your example).
From the paper I see OBOE follows a different approach, so can I get the training performance estimation?
Sorry again for the late reply after a crazy month...
The procedure you described is the standard meta-learning process: select the model that performs best on the meta-training set and test on the meta-test set. By saying "the best model is selected based on a score", you mean the model performance is evaluated based on a metric you choose.
OBOE follows exactly the same procedure, as I described above. The "training performance estimation" in your terms, which is the meta-training error, can be evaluated by `util.error(y_true, y_predicted, p_type, metric='BER')`:
https://github.com/udellgroup/oboe/blob/9b17bfde86efcb9937dd06fcbdf89c894966ac04/automl/util.py#L151
Oboe also uses this metric to select models.
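As a rough illustration, here is a minimal pure-Python sketch of a balanced error rate, i.e. the average of per-class error rates. This is only a stand-in for the idea behind `metric='BER'`; oboe's actual `util.error` implementation may differ in its details:

```python
from collections import defaultdict

def balanced_error_rate(y_true, y_pred):
    # BER: average the error rate of each class, so that rare
    # classes count as much as common ones.
    wrong = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t != p:
            wrong[t] += 1
    return sum(wrong[c] / total[c] for c in total) / len(total)

# Class 0: 1 of 3 wrong (error 1/3); class 1: 0 of 2 wrong (error 0).
print(balanced_error_rate([0, 0, 0, 1, 1], [0, 0, 1, 1, 1]))  # ≈ 0.1667
```

Note that BER differs from the plain misclassification rate (1/5 = 0.2 here) whenever the classes are imbalanced, which is why it is a common choice for model selection.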
Closing this for now. Feel free to reopen.
Greetings,
I am interested in using OBOE for a publication, in which I am also reporting the training performance estimation. From the examples I found here, I see no way to retrieve this estimation. Am I missing something, or has it not been implemented yet?
Thanks in advance.