ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting-edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

TuneSearchCV not correctly handling error_score parameter #248

Closed ssiegel95 closed 1 year ago

ssiegel95 commented 2 years ago

The script below runs a random search with vanilla RandomizedSearchCV and then the equivalent search with tune_sklearn. In the first case, the invalid parameter combination is handled gracefully by sklearn because error_score is set to an int/float (np.nan here), as documented. However, when the equivalent search is run with TuneSearchCV, the entire job fails before completion.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from tune_sklearn import TuneSearchCV

X, y = load_digits(return_X_y=True)

PIPELINE = Pipeline(
    steps=[
        ("pca", PCA(iterated_power=7)),
        ("linearsvc", LinearSVC(dual=False, max_iter=10000)),
    ]
)

GRID = {
    "linearsvc__penalty": ["l1", "l2"],
    "linearsvc__loss": ["hinge", "squared_hinge"],
    "linearsvc__fit_intercept": [True, False],
    "linearsvc__dual": [True, False],
    "linearsvc__tol": [1e-05, 0.0001, 0.001, 0.01, 0.1],
    "linearsvc__C": [0.0001, 0.001, 0.01, 0.1, 0.5, 1.0, 5.0, 10.0, 15.0, 20.0, 25.0],
}

# first with sklearn
random = RandomizedSearchCV(
    PIPELINE, param_distributions=GRID, error_score=np.nan, n_iter=5, random_state=100
)

random.fit(X, y)
# completes; the invalid combination is reported as a warning containing:
# ValueError: Unsupported set of arguments: The combination of penalty='l2'
# and loss='hinge' are not supported when
# dual=False, Parameters: penalty='l2', loss='hinge', dual=False
print(random.best_estimator_)
# Pipeline(steps=[('pca', PCA(iterated_power=7)),
#                ('linearsvc',
#                 LinearSVC(C=25.0, dual=False, max_iter=10000, tol=1e-05))])

# ******* SAME THING WITH TUNE_SKLEARN *******
# now with tune-sklearn
random = TuneSearchCV(
    estimator=PIPELINE,
    param_distributions=GRID,
    search_optimization="random",
    n_trials=5,
    error_score=np.nan,
    verbose=True,
    random_state=100,
)

random.fit(X, y)
print(random.best_estimator_)
# never reached: the run terminates with an error during fit
eljur commented 1 year ago

TuneGridSearchCV has the same problem: https://discuss.ray.io/t/valueerror-with-tunegridsearchcv/4138/2
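For reference, a minimal sketch that appears to reproduce the same failure with TuneGridSearchCV (it reuses PIPELINE, X, and y from the original report; SMALL_GRID is a hypothetical reduced grid that still contains the invalid penalty/loss/dual combination):

from tune_sklearn import TuneGridSearchCV

# Hypothetical reduced grid including the unsupported combination
SMALL_GRID = {
    "linearsvc__penalty": ["l2"],
    "linearsvc__loss": ["hinge"],
    "linearsvc__dual": [False, True],
}

grid = TuneGridSearchCV(
    estimator=PIPELINE,
    param_grid=SMALL_GRID,
    error_score=np.nan,
    verbose=True,
)

# Expected: the invalid combination is scored as np.nan;
# observed: the whole run terminates, as with TuneSearchCV above.
grid.fit(X, y)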

sergey-datadog commented 1 year ago

Hi there, what's the status on this please?

j-f-r commented 1 year ago

Bump

Yard1 commented 1 year ago

Hey folks, I'll be taking a look at this today.

ssiegel95 commented 1 year ago

For whatever it's worth, I was able to hack around this issue with the following:

import numpy as np
from sklearn.pipeline import Pipeline

class ErrorRobustPipeline(Pipeline):
    """This is a hacky workaround for
    https://github.com/ray-project/tune-sklearn/issues/248.
    Its purpose is to catch exceptions that underlying
    frameworks such as sklearn throw on invalid parameter
    combinations. Once an exception is encountered, internal
    state is set such that future scoring attempts will always
    return np.nan as an error score."""

    def __init__(self, steps, *, memory=None, verbose=False):
        # forward memory/verbose so sklearn's clone() preserves these parameters
        super(ErrorRobustPipeline, self).__init__(steps, memory=memory, verbose=verbose)
        self._is_errored = False

    def fit(self, X, y=None, **fit_params):
        try:
            return super(ErrorRobustPipeline, self).fit(X, y, **fit_params)
        except Exception:
            self._is_errored = True
            return self

    def score(self, X, y=None, sample_weight=None):
        try:
            return (
                np.nan
                if self._is_errored
                else super(ErrorRobustPipeline, self).score(X, y, sample_weight)
            )
        except Exception:
            return np.nan
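For completeness, a sketch of how the wrapper would slot into the original example (assuming the GRID, X, and y definitions from the first comment); this is only a workaround, not a fix for the error_score handling itself:

from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from tune_sklearn import TuneSearchCV

# Same steps as the original PIPELINE, wrapped in the error-robust subclass
robust_pipeline = ErrorRobustPipeline(
    steps=[
        ("pca", PCA(iterated_power=7)),
        ("linearsvc", LinearSVC(dual=False, max_iter=10000)),
    ]
)

random = TuneSearchCV(
    estimator=robust_pipeline,
    param_distributions=GRID,
    search_optimization="random",
    n_trials=5,
    verbose=True,
    random_state=100,
)

# Failing parameter combinations now score np.nan instead of raising,
# so the search runs to completion.
random.fit(X, y)
print(random.best_estimator_)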