hyperopt / hyperopt-sklearn

Hyper-parameter optimization for sklearn
hyperopt.github.io/hyperopt-sklearn

Does hyperopt work for multiclass and multilabel classification? #69

Open LishengSun opened 7 years ago

LishengSun commented 7 years ago

Hi,

I have 2 questions: 1) I would like to know if hyperopt-sklearn works for multiclass / multilabel classification. For example, something like:

estimator = HyperoptEstimator(classifier=OneVsRest(svc('my_est')), algo=tpe.suggest, preprocessing=[], use_partial_fit=True, trial_timeout=timeout)

2) I found that hyperopt is quite slow when the training data is large. I think the parameter 'use_partial_fit' might speed up the fitting process, am I right? Is this the best practice to tell hyperopt not to train on the entire training set when it is too large?

Thank you in advance!

bjkomer commented 7 years ago
  1. I don't think that sort of thing is currently supported, but there is no reason why it shouldn't be possible to add it in. I'm thinking it might end up looking something like this:
estim = HyperoptEstimator(classifier=svc('my_est'), algo=tpe.suggest, ...)
multi_clf = OneVsRestClassifier(estim)

or possibly build a function in hyperopt-sklearn to handle this kind of classification:

estim = HyperoptEstimator(classifier=one_vs_rest('my_multi_clf', clf=svc('my_est')), algo=tpe.suggest, ...)

The main difference between these two implementations is that the first one allows different parameters to be used for each of the individual classifiers, including different choices of the classifier itself. The second implementation will be more restricted, but should be a lot faster.

  2. Depending on the size of your data, the type of classifiers being looked at, and the number of evals, it can be really slow to fit your data. use_partial_fit and trial_timeout can definitely help with this. The use_partial_fit flag adds some checks to see if the current evaluation is unlikely to perform better than the current best; if so, it stops early and moves on to the next point, which can save a lot of time. Not all classifiers in sklearn support partial_fit, so in those cases some options you can try are training in parallel, reducing the training data, lowering the timeout, or shrinking the search space (see the sketch below).
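
For the large-data case, here is a minimal sketch of what that could look like (assuming hpsklearn exposes an sgd component wrapping SGDClassifier, which supports partial_fit; X_train / y_train are placeholders for your data):

from hpsklearn import HyperoptEstimator, sgd
from hyperopt import tpe

# SGDClassifier implements partial_fit, so use_partial_fit lets hyperopt-sklearn
# stop an unpromising evaluation early instead of fitting it to completion
estim = HyperoptEstimator(classifier=sgd('my_sgd'),
                          preprocessing=[],
                          algo=tpe.suggest,
                          use_partial_fit=True,
                          trial_timeout=60,  # cap each evaluation at 60 seconds
                          max_evals=15)
# estim.fit(X_train, y_train)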
LishengSun commented 7 years ago

Thank you for your reply!

1) Are you going to include this function soon? Actually, I am building an AutoML benchmark and would like to include hyperopt. We have different tasks (binary classification, regression, and multiclass/multilabel classification). Maybe I can help with this in my spare time if you need.

2) I don't quite understand what trial_timeout does. Does it output a model when it times out, even if it is not a converged solution?

Thank you in advance!

bjkomer commented 7 years ago

trial_timeout is the maximum amount of time each evaluation is given to complete. For example, if trial_timeout is set to 300 seconds, and max_evals is set to 10, then the total search process will run for a maximum of 50 minutes. If an individual trial times out, it will report a failure with no model output unless use_partial_fit is also set to True. This flag allows non-converged solutions to be returned when the time is up (Note: not all classifiers in sklearn support this, but hyperopt-sklearn will do a check and use it if it can). If you set trial_timeout to None it will default to Infinity.
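
To make the timing concrete, here is a rough sketch using those numbers (any_classifier and the parameter names as used elsewhere in this thread):

from hpsklearn import HyperoptEstimator, any_classifier
from hyperopt import tpe

trial_timeout = 300  # seconds allowed per evaluation
max_evals = 10       # number of evaluations
# worst case search time ~= trial_timeout * max_evals = 3000 s = 50 minutes
estim = HyperoptEstimator(classifier=any_classifier('clf'),
                          algo=tpe.suggest,
                          trial_timeout=trial_timeout,  # None would remove the per-trial limit
                          max_evals=max_evals,
                          use_partial_fit=True)  # lets a timed-out trial return a non-converged model when supported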

I've put together some multiclass functions in a new branch. I haven't done much testing, but they seem to work, at least on this example I'll post below. Feel free to try them out. Help and suggestions are always welcome :)

from hpsklearn import HyperoptEstimator, svc, one_vs_rest, one_vs_one, output_code
from hyperopt import tpe
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()
X, y = iris.data, iris.target

test_size = int(0.2 * len(y))
np.random.seed(13)
indices = np.random.permutation(len(X))
X_train = X[indices[:-test_size]]
y_train = y[indices[:-test_size]]
X_test = X[indices[-test_size:]]
y_test = y[indices[-test_size:]]

# These will default to search the classifiers in the 'any_classifier' space
#clf = one_vs_rest('clf')
#clf = one_vs_one('clf')
#clf = output_code('clf')

# This is how you choose a specific classifier to use
clf = one_vs_rest('clf', estimator=svc('my_est'))

estim = HyperoptEstimator(classifier=clf, preprocessing=[], algo=tpe.suggest, trial_timeout=120, max_evals=10)

estim.fit(X_train, y_train)

print(estim.trials.results)
print('Score:', estim.score(X_test, y_test))
print(estim.best_model())
LishengSun commented 7 years ago

Thank you very much!

I will give the new branch a try ASAP.

ghost commented 7 years ago

Please forgive me, but how should I apply hyperopt-sklearn for multi-class target prediction?

my y_train.shape is equal to (1000,5)

from hpsklearn import HyperoptEstimator, any_classifier
from hyperopt import tpe

estim = HyperoptEstimator(classifier=any_classifier('clf'),
                          algo=tpe.suggest, trial_timeout=300)

estim.fit(x_train, y_train)

It's giving this error:

ValueError: bad input shape (956, 5)

bjkomer commented 7 years ago

@potholiday For multi-label classification you need to use the One-vs-Rest classifier.

from hpsklearn import HyperoptEstimator, one_vs_rest
from hyperopt import tpe

estim = HyperoptEstimator(classifier=one_vs_rest('clf'),
                          algo=tpe.suggest, trial_timeout=300)
estim.fit(x_train, y_train)
ghost commented 7 years ago

Thanks for the quick reply. I got this output for the iris data set (output variable one-hot encoded):

print(estim.trials.results)
[{'status': 'ok', 'loss': 0.04166666666666663, 'duration': 0.11149907112121582, 'loss_variance': 0.00056240219092331724}, {'status': 'ok', 'loss': 0.33333333333333337, 'duration': 0.11178112030029297, 'loss_variance': 0.0031298904538341159}, {'status': 'ok', 'loss': 0.08333333333333337, 'duration': 0.0271151065826416, 'loss_variance': 0.0010758998435054779}, {'status': 'ok', 'loss': 0.11111111111111116, 'duration': 0.2485649585723877, 'loss_variance': 0.0013910624239262743}, {'status': 'ok', 'loss': 0.02777777777777779, 'duration': 1.0009288787841797, 'loss_variance': 0.00038036863154234063}, {'status': 'ok', 'loss': 0.02777777777777779, 'duration': 26.054779052734375, 'loss_variance': 0.00038036863154234063}, {'status': 'ok', 'loss': 0.33333333333333337, 'duration': 0.020457983016967773, 'loss_variance': 0.0031298904538341159}, {'status': 'ok', 'loss': 0.08333333333333337, 'duration': 12.955389022827148, 'loss_variance': 0.0010758998435054779}, {'status': 'ok', 'loss': 0.02777777777777779, 'duration': 0.09855389595031738, 'loss_variance': 0.00038036863154234063}, {'status': 'ok', 'loss': 0.09722222222222221, 'duration': 0.32352304458618164, 'loss_variance': 0.0012361980525126062}]

print(estim.score(X_test, y_test))
0.93333333333333

print(estim.best_model())
{'learner': OneVsRestClassifier(estimator=AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,
          learning_rate=0.00561843035481, n_estimators=162, random_state=0),
          n_jobs=1), 'preprocs': (), 'ex_preprocs': ()}

Checking the print(estim.best_model()) result, what is the meaning of that output? How should I save the best model?

I am not from a computational background, so please forgive me if I am asking a stupid question. How does this module really work? To find the best model, does it do an exhaustive search over models and their parameters, or does it have some logical way, like backprop in neural networks, to reach the best model and its parameters?

ghost commented 7 years ago

I forgot to mention one important thing: for multi-class prediction, is accuracy the best way to estimate the score? Shouldn't we use AUC, ROC, or some other metric for measuring the output?

bjkomer commented 7 years ago

@potholiday The output of estim.best_model() contains the trained model with the best parameter setting, along with any preprocessing that goes along with it. This is found by exploring the parameter space based on the search algorithm used (the algo parameter). It's impossible to do an exhaustive search in a continuous space, but the algorithm can spend more of its time in promising areas. To use the model, you can do something like this:

model = estim.best_model()['learner']

From there you can do anything you want with the model, such as using it for prediction, saving it to a file, etc. If you want a metric besides accuracy, that is certainly possible (and for multilabel problems that often makes more sense).

# some stuff you can do
import pickle
from sklearn.metrics import roc_auc_score

pred = model.predict(X_test)
print(roc_auc_score(y_test, pred))
print(my_custom_metric(y_test, pred))  # any scoring function of your own

pickle.dump(model, open("my_model.pkl", "wb"))
# etc
Wagner-Alvarenga commented 7 years ago

Hi @bjkomer, still talking about the configuration found by the method: is there a way to access not only the best model (using "estim.best_model()"), but also all the models that were tried? Is there a way to access the configuration and the test score of each candidate model that was trained? Thank you for your kindness!

goonmeet commented 5 years ago

Hi,

We are trying to use hyperopt-sklearn for multilabel classification. However, we are not able to get good performance using hyperopt-sklearn. A simple logistic regression algorithm through scikit-learn performs much better. Is there any insight as to why this might be happening? This is how we are creating our estimators:

hpsklearn.components.one_vs_rest('my_multi_svc', estimator=hpsklearn.components.svc('my_svc')),
hpsklearn.components.one_vs_rest('my_multi_liblinear_svc', estimator=hpsklearn.components.liblinear_svc('my_liblinear_svc')),
hpsklearn.components.one_vs_rest('my_multi_svc_linear', estimator=hpsklearn.components.svc_linear('my_svc_linear'))

Thanks in advance!