hyperopt / hyperopt-sklearn

Hyper-parameter optimization for sklearn
hyperopt.github.io/hyperopt-sklearn

Hyperopt for linear_svc provided sub-optimal hyperparameters compared to the default parameters of scikit-learn. #199

Closed: Varun-GP closed this issue 1 year ago

Varun-GP commented 1 year ago

Hi,

I was wondering what might cause the model returned by hyperopt to be sub-optimal on the test set (accuracy), since scikit-learn's default parameters gave better accuracy for the Linear SVC model.

HyperoptEstimator(verbose=False, classifier=linear_svc("svc"), preprocessing=any_preprocessing("pre"), algo=tpe.suggest, max_evals=100, trial_timeout=30)
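
For context, a minimal end-to-end sketch of the comparison being described. The data set (Iris), the split, and the variable names are illustrative assumptions, not taken from the original report:

```python
from hyperopt import tpe
from hpsklearn import HyperoptEstimator, linear_svc, any_preprocessing
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# hyperopt-sklearn: search over linear SVC hyperparameters and preprocessing
estim = HyperoptEstimator(
    classifier=linear_svc("svc"),
    preprocessing=any_preprocessing("pre"),
    algo=tpe.suggest,
    max_evals=100,
    trial_timeout=30,
    verbose=False,
)
estim.fit(X_train, y_train)
print("hyperopt-sklearn test accuracy:", estim.score(X_test, y_test))

# plain scikit-learn LinearSVC with default parameters, no preprocessing
baseline = LinearSVC().fit(X_train, y_train)
print("default LinearSVC test accuracy:", baseline.score(X_test, y_test))
```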
mandjevant commented 1 year ago

Interesting comment. I reviewed the linear SVC implementation and compared it with the current sklearn LinearSVC default parameters. Most of the parameters in hyperopt-sklearn are kept the same as the defaults. The deviation can come from the following two parameters:

tol parameter

import numpy as np
from hyperopt import hp


def _svm_tol(name: str):
    """
    Declaration search space 'tol' parameter
    """
    return hp.loguniform(name, np.log(1e-5), np.log(1e-2))

C parameter

def _linear_C(name: str):
    """
    Declaration search space 'C' parameter
    """
    return hp.uniform(name, 0.5, 1.5)

From the sklearn documentation, tol defaults to 1e-4 and C defaults to 1.0. As the code snippets above show, hyperopt searches the neighbourhood of these defaults: tol over [1e-5, 1e-2] (log-uniform) and C over [0.5, 1.5] (uniform). Since the sklearn default values lie inside the hyperopt search space, hyperopt is able to converge to them if they score best. So there is no disparity here that explains your results.
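
To make the ranges concrete, the two expressions above can be sampled directly with hyperopt. A quick sketch (the number of draws is arbitrary) confirms that the sklearn defaults tol=1e-4 and C=1.0 fall inside the sampled intervals:

```python
import numpy as np
from hyperopt import hp
from hyperopt.pyll.stochastic import sample

tol_space = hp.loguniform("tol", np.log(1e-5), np.log(1e-2))
C_space = hp.uniform("C", 0.5, 1.5)

# draw a batch of values from each search space
tol_draws = [sample(tol_space) for _ in range(1000)]
C_draws = [sample(C_space) for _ in range(1000)]

print("tol range:", min(tol_draws), "to", max(tol_draws))  # brackets the default 1e-4
print("C range:  ", min(C_draws), "to", max(C_draws))      # brackets the default 1.0
```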

Instead, I'd suggest taking a closer look at the preprocessing. Perhaps there is a disparity between your test with the default-parameter sklearn linear SVC and the hyperopt-sklearn linear_svc: with the any_preprocessing function, hyperopt chooses one preprocessing algorithm per trial (see here). This might explain the deviation in your results.
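
If the goal is a like-for-like comparison with the default-parameter LinearSVC, one option is to switch the preprocessing search off. A sketch, reusing the train/test split from the earlier example; passing an empty list for preprocessing is assumed here to mean "no preprocessing step":

```python
from hyperopt import tpe
from hpsklearn import HyperoptEstimator, linear_svc

# Same search as before, but with preprocessing disabled so the comparison
# against a default-parameter LinearSVC is apples to apples.
estim_no_pre = HyperoptEstimator(
    classifier=linear_svc("svc"),
    preprocessing=[],          # no preprocessing step in the pipeline
    algo=tpe.suggest,
    max_evals=100,
    trial_timeout=30,
    verbose=False,
)
estim_no_pre.fit(X_train, y_train)
print("test accuracy without preprocessing search:", estim_no_pre.score(X_test, y_test))
```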

Varun-GP commented 1 year ago

Thank you. I confirm that the model performance changes significantly with the any_preprocessing function. Without it, I observe only a slight change in performance compared with the default sklearn parameters. In the end, I ran with a large max_evals, which produced the best model.
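
For anyone comparing runs like this, inspecting the selected pipeline shows which preprocessing (if any) and which hyperparameters won. A short sketch, assuming an already fitted HyperoptEstimator named estim as in the earlier example:

```python
# After fitting, best_model() returns the chosen learner and preprocessing steps.
best = estim.best_model()
print("learner:      ", best["learner"])    # e.g. LinearSVC(C=..., tol=...)
print("preprocessing:", best["preprocs"])   # e.g. (StandardScaler(...),) or ()
```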