ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0
465 stars, 52 forks

This TuneGridSearchCV instance is not fitted yet. #107

Closed yijing332 closed 3 years ago

yijing332 commented 4 years ago

I have a question: does tune-sklearn only support estimators with 'partial_fit' or 'warm_start' attributes?

When I use RandomForestClassifier as the estimator, the program always reports an error: This TuneGridSearchCV instance is not fitted yet.

Code:

# from sklearn.model_selection import GridSearchCV
from tune_sklearn import TuneGridSearchCV
from sklearn.ensemble import RandomForestClassifier
# Other imports
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

# Set training and validation sets
X, y = make_classification(n_samples=11000, n_features=1000, n_informative=50, n_redundant=0, n_classes=10, class_sep=2.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Example parameters to tune for RandomForestClassifier
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': range(5, 30, 5),
    'min_samples_split': [1, 2, 5, 10, 15],
    'min_samples_leaf': [1, 2, 5, 10],
    'max_features': ['log2', 'sqrt'],
}
forest_clf = RandomForestClassifier(random_state=42, warm_start=True)

grid_search = TuneGridSearchCV(forest_clf, param_grid, cv=5, scoring='accuracy', use_gpu=True)

start = time.time()
grid_search.fit(X_train, y_train)
end = time.time()
print('Tune time: ', end - start)
score = grid_search.score(X_test, y_test)
print("Tune Score:", score)

inventormc commented 4 years ago

I think this is an issue with the parameter grid: min_samples_split must be greater than 1 (an integer value has to be at least 2). Can you fix that and see if it works? That said, this error message is not informative at all, and we'll add a better one for this case.
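
As a minimal sketch of that constraint: scikit-learn documents min_samples_split as either an integer of at least 2 or a float in (0.0, 1.0]. The helper below, invalid_min_samples_split, is hypothetical (it is not part of tune-sklearn or scikit-learn) and simply filters a list of candidates before they go into param_grid.

```python
def invalid_min_samples_split(values):
    """Return the candidates scikit-learn would reject for min_samples_split.

    Assumes the documented rule: an int must be >= 2, a float must be
    in (0.0, 1.0]. Hypothetical helper, not a tune-sklearn or sklearn API.
    """
    bad = []
    for v in values:
        if isinstance(v, bool):
            bad.append(v)  # bool is a subclass of int, but not meaningful here
        elif isinstance(v, int):
            if v < 2:
                bad.append(v)
        elif isinstance(v, float):
            if not (0.0 < v <= 1.0):
                bad.append(v)
        else:
            bad.append(v)  # any other type is rejected
    return bad


# The min_samples_split values from the grid in the original post.
candidates = [1, 2, 5, 10, 15]
rejected = invalid_min_samples_split(candidates)
valid = [v for v in candidates if v not in rejected]

print("rejected:", rejected)
print("valid:", valid)
```

With the grid from the original post, only the value 1 is rejected, so 'min_samples_split': [2, 5, 10, 15] would be the corrected entry.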

To answer your question, tune-sklearn should work with any sklearn estimator (it can do everything that GridSearchCV or RandomizedSearchCV can do); it only requires the estimator to have partial_fit or warm_start when you want to do early stopping.
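
A rough way to see whether an estimator meets that early-stopping condition is to look for either attribute. This is only a sketch: supports_early_stopping and the three stand-in classes are hypothetical names for illustration, and this is not necessarily the check tune-sklearn performs internally.

```python
def supports_early_stopping(estimator):
    """Rough check: does the estimator expose partial_fit or warm_start?

    Hypothetical helper; tune-sklearn's own logic may differ.
    """
    has_partial_fit = callable(getattr(estimator, "partial_fit", None))
    has_warm_start = hasattr(estimator, "warm_start")
    return has_partial_fit or has_warm_start


# Stand-in estimators so the sketch runs without scikit-learn installed.
class IncrementalModel:
    def partial_fit(self, X, y):
        return self


class WarmStartModel:
    def __init__(self, warm_start=False):
        self.warm_start = warm_start


class PlainModel:
    def fit(self, X, y):
        return self


results = {
    cls.__name__: supports_early_stopping(cls())
    for cls in (IncrementalModel, WarmStartModel, PlainModel)
}
print(results)
```

With scikit-learn installed, SGDClassifier() (which has partial_fit) and RandomForestClassifier() (which has a warm_start parameter) should both pass this check. An estimator with neither can still be tuned, just without early stopping.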

@richardliaw

yijing332 commented 4 years ago

Thank you very much for your answer and for helping me find the mistake! I adjusted the parameters and tried again, and it still didn't seem to work. However, when I remove the use_gpu parameter and reduce n_jobs, it runs normally.

inventormc commented 4 years ago

Great! Glad it works now. I'm assuming you have a GPU on the machine you ran this on, right?

yijing332 commented 4 years ago

Yes, there are two GPUs on my laptop. Is it necessary to configure it when I use that param?

inventormc commented 4 years ago

By default, tune-sklearn doesn't use the GPU during training, so yes. It should work with both n_jobs and use_gpu set, so we'll take a look at this.

richardliaw commented 3 years ago

Closing this, as we seem to have addressed the issue!