hyperopt / hyperopt-sklearn

Hyper-parameter optimization for sklearn
hyperopt.github.io/hyperopt-sklearn

Error while using AUC as a loss function: please help us solve this #141

Open PhanindraPanthagani opened 5 years ago

PhanindraPanthagani commented 5 years ago

I am using the following code to define a continuous loss function (AUC) for the classifier, but it throws the error below. Can you please add AUC functionality to the code?

Code:

```python
from tpot import TPOTClassifier

from hpsklearn import HyperoptEstimator, any_classifier, any_preprocessing
from sklearn.datasets import load_iris
from hyperopt import tpe
import numpy as np
from sklearn.metrics import roc_auc_score

# Make roc_auc_score loss function
def roclossfn(y_true, y_probabilities):
    lossroc = 1 - roc_auc_score(y_true, y_probabilities)
    return lossroc

# Instantiate a HyperoptEstimator with the search space and number of evaluations.
# The loss function is ROC AUC (the default is accuracy score);
# continuous_loss_fn should be set to True for it to calculate probabilities.
estim = HyperoptEstimator(classifier=any_classifier('clf'),
                          preprocessing=any_preprocessing('my_pre'),
                          algo=tpe.suggest,
                          max_evals=30,
                          trial_timeout=180,
                          seed=30,
                          continuous_loss_fn=True,
                          loss_fn=roclossfn)

print("estim is", estim)

# Search the hyperparameter space based on the data
estim.fit(X_trainnp, y_trainnp)

print("Validation score is", estim.score(X_validnp, y_validnp))
print(estim.best_model())
```

Error:

```
AttributeError                            Traceback (most recent call last)
<module> in ()
     33 # Search the hyperparameter space based on the data
     34
---> 35 estim.fit(X_trainnp, y_trainnp)

/databricks/python/lib/python3.6/site-packages/hpsklearn/estimator.py in fit(self, X, y, EX_list, valid_size, n_folds, cv_shuffle, warm_start, random_state, weights)
    781             increment = min(self.fit_increment,
    782                             adjusted_max_evals - len(self.trials.trials))
--> 783             fit_iter.send(increment)
    784             if filename is not None:
    785                 with open(filename, 'wb') as dump_file:

/databricks/python/lib/python3.6/site-packages/hpsklearn/estimator.py in fit_iter(self, X, y, EX_list, valid_size, n_folds, cv_shuffle, warm_start, random_state, weights, increment)
    691                 # so we notice them.
    692                 catch_eval_exceptions=False,
--> 693                 return_argmin=False,  # -- in case no success so far
    694             )
    695         else:

/databricks/python/lib/python3.6/site-packages/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin, points_to_evaluate, max_queue_len, show_progressbar)
    402             return_argmin=return_argmin,
    403             max_queue_len=max_queue_len,
--> 404             show_progressbar=show_progressbar,
    405         )

/databricks/python/lib/python3.6/site-packages/hyperopt/base.py in fmin(self, fn, space, algo, max_evals, rstate, verbose, pass_expr_memo_ctrl, catch_eval_exceptions, return_argmin, max_queue_len, show_progressbar)
    644             return_argmin=return_argmin,
    645             max_queue_len=max_queue_len,
--> 646             show_progressbar=show_progressbar)

/databricks/python/lib/python3.6/site-packages/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin, points_to_evaluate, max_queue_len, show_progressbar)
    421         show_progressbar=show_progressbar)
    422     rval.catch_eval_exceptions = catch_eval_exceptions
--> 423     rval.exhaust()
    424     if return_argmin:
    425         if len(trials.trials) == 0:

/databricks/python/lib/python3.6/site-packages/hyperopt/fmin.py in exhaust(self)
    275     def exhaust(self):
    276         n_done = len(self.trials)
--> 277         self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
    278         self.trials.refresh()
    279         return self

/databricks/python/lib/python3.6/site-packages/hyperopt/fmin.py in run(self, N, block_until_done)
    240         else:
    241             # -- loop over trials and do the jobs directly
--> 242             self.serial_evaluate()

/databricks/python/lib/python3.6/site-packages/hyperopt/fmin.py in serial_evaluate(self, N)
    139             ctrl = base.Ctrl(self.trials, current_trial=trial)
    140             try:
--> 141                 result = self.domain.evaluate(spec, ctrl)
    142             except Exception as e:
    143                 logger.info('job exception: %s' % str(e))

/databricks/python/lib/python3.6/site-packages/hyperopt/base.py in evaluate(self, config, ctrl, attach_attachments)
    849             memo=memo,
    850             print_node_on_error=self.rec_eval_print_node_on_error)
--> 851         rval = self.fn(pyll_rval)

/databricks/python/lib/python3.6/site-packages/hpsklearn/estimator.py in fn_with_timeout(*args, **kwargs)
    654         assert fn_rval[0] in ('raise', 'return')
    655         if fn_rval[0] == 'raise':
--> 656             raise fn_rval[1]

AttributeError: probability estimates are not available for loss='epsilon_insensitive'
```
bjkomer commented 4 years ago

I believe this error is stemming from some of the sklearn classifiers being searched over not having a `predict_proba` method; using `continuous_loss_fn=True` assumes that they do. Instead of using `any_classifier` you'll have to restrict your search space to classifiers that are able to return probabilities.

It would definitely be a useful addition to have another predefined high-level search space that covers only these classifiers.

christophelebrun commented 4 years ago

Try setting `continuous_loss_fn` to `False`, since `roc_auc_score` does not need probabilities.
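For what it's worth, `roc_auc_score` does accept hard 0/1 predictions, but the score then loses the ranking information that probabilities carry. A standalone sketch (illustrative values, independent of hpsklearn) showing the difference:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
probs  = [0.1, 0.6, 0.4, 0.9]        # continuous scores, e.g. from predict_proba
hard   = [round(p) for p in probs]   # 0/1 labels, e.g. from predict

# With scores, AUC ranks every prediction against every other;
# with hard labels it can only separate the two predicted classes.
print(roc_auc_score(y_true, probs))  # 0.75
print(roc_auc_score(y_true, hard))   # 0.5
```

So setting `continuous_loss_fn=False` avoids the `predict_proba` error, at the cost of optimizing a coarser version of AUC.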

pengcao commented 4 years ago

Remove `continuous_loss_fn`.