ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0
465 stars, 52 forks

[Bug] Tuning CatBoost with GPU fails on Google Colab #67

Closed: rohan-gt closed this issue 4 years ago

rohan-gt commented 4 years ago

Running the following code on Google Colab with a GPU hardware accelerator:

from catboost import CatBoostClassifier
from tune_sklearn import TuneSearchCV
import ray
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load breast cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = CatBoostClassifier(task_type="GPU", logging_level="Silent")
param_dists = {
    "iterations": (500, 600)
}

ray.init(webui_host="0.0.0.0")
gs = TuneSearchCV(model, param_dists, n_iter=2, scoring="accuracy",
                  search_optimization="bayesian")
gs.fit(X_train, y_train)
print(gs.cv_results_)

pred = gs.predict(X_test)
correct = 0
for i in range(len(y_test)):
    if pred[i] == y_test[i]:
        correct += 1
print("Accuracy:", correct / len(pred))

throws the following error:

2020-08-13 06:42:21,410 INFO resource_spec.py:212 -- Starting Ray with 7.13 GiB memory available for workers and up to 3.58 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-08-13 06:42:21,672 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-08-13 06:42:21,873 INFO services.py:1165 -- View the Ray dashboard at 172.28.0.2:8265
/usr/local/lib/python3.6/dist-packages/tune_sklearn/tune_basesearch.py:249: UserWarning: Early stopping is not enabled. To enable early stopping, pass in a supported scheduler from Tune and ensure the estimator has `partial_fit`.
  warnings.warn("Early stopping is not enabled. "
2020-08-13 06:42:30,803 INFO logger.py:271 -- Removed the following hyperparameter values when logging to tensorboard: {'iterations': 510}
2020-08-13 06:42:30,867 INFO logger.py:271 -- Removed the following hyperparameter values when logging to tensorboard: {'iterations': 586}
(pid=2326) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
(pid=2326) _catboost.CatBoostError: catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 100: no CUDA-capable device is detected
(pid=2326) 
(pid=2326)   FitFailedWarning)
(pid=2326) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
(pid=2326) _catboost.CatBoostError: catboost/cuda/cuda_lib/cuda_manager.cpp:201: Condition violated: `State == nullptr'
(pid=2326) 
(pid=2326)   FitFailedWarning)
(pid=2326) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
(pid=2326) _catboost.CatBoostError: catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 100: no CUDA-capable device is detected
(pid=2326) 
(pid=2326)   FitFailedWarning)
(pid=2326) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
(pid=2326) _catboost.CatBoostError: catboost/cuda/cuda_lib/cuda_manager.cpp:201: Condition violated: `State == nullptr'
(pid=2326) 
(pid=2326)   FitFailedWarning)
(pid=2327) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
(pid=2327) _catboost.CatBoostError: catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 100: no CUDA-capable device is detected
(pid=2327) 
(pid=2327)   FitFailedWarning)
(pid=2327) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
(pid=2327) _catboost.CatBoostError: catboost/cuda/cuda_lib/cuda_manager.cpp:201: Condition violated: `State == nullptr'
(pid=2327) 
(pid=2327)   FitFailedWarning)
(pid=2327) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
(pid=2327) _catboost.CatBoostError: catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 100: no CUDA-capable device is detected
(pid=2327) 
(pid=2327)   FitFailedWarning)
(pid=2327) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
(pid=2327) _catboost.CatBoostError: catboost/cuda/cuda_lib/cuda_manager.cpp:201: Condition violated: `State == nullptr'
(pid=2327) 
(pid=2327)   FitFailedWarning)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-beac382c1f80> in <module>()
     21 gs = TuneSearchCV(model, param_dists, n_iter=2, scoring="accuracy",
     22                   search_optimization="bayesian")
---> 23 gs.fit(X_train, y_train)
     24 print(gs.cv_results_)
     25 

5 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1491             key = item_from_zerodim(key)
   1492             if not is_integer(key):
-> 1493                 raise TypeError("Cannot index by location index with a non-integer key")
   1494 
   1495             # validate the location

TypeError: Cannot index by location index with a non-integer key

whereas fitting the same model directly, without TuneSearchCV, works fine:

model = CatBoostClassifier(task_type="GPU", logging_level="Silent")
model.fit(X_train, y_train)

rohan-gt commented 4 years ago

Actually, never mind: setting use_gpu=True in TuneSearchCV fixed it. This is related to this issue. Details about this parameter are missing from the documentation. Also, does the GPU need to be explicitly enabled within the model, e.g. CatBoostClassifier(task_type="GPU") or LGBMClassifier(device_type="gpu"), for this to work?
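
For anyone landing here with the same error, the fix described above can be sketched roughly as follows. This is a minimal sketch, assuming the same estimator and search space as the original reproduction and a tune-sklearn version whose TuneSearchCV accepts use_gpu. The helper names make_search_kwargs and run_gpu_search are hypothetical, introduced only for illustration. The model keeps task_type="GPU" as in the original report, since the thread does not settle whether that is required.

```python
def make_search_kwargs(use_gpu=True):
    """Keyword arguments for TuneSearchCV, mirroring the original reproduction.

    Hypothetical helper (not part of tune-sklearn). The only change from the
    failing snippet is the added use_gpu flag.
    """
    return {
        "param_distributions": {"iterations": (500, 600)},
        "n_iter": 2,
        "scoring": "accuracy",
        "search_optimization": "bayesian",
        "use_gpu": use_gpu,  # the setting reported above to fix the error
    }


def run_gpu_search(X_train, y_train):
    """Fit the search with use_gpu=True; needs catboost, tune-sklearn and a GPU."""
    # Imported lazily so the helper above can be inspected without these packages.
    from catboost import CatBoostClassifier
    from tune_sklearn import TuneSearchCV

    # Assumption: GPU training is still requested on the model itself,
    # exactly as in the original report.
    model = CatBoostClassifier(task_type="GPU", logging_level="Silent")
    search = TuneSearchCV(model, **make_search_kwargs(use_gpu=True))
    search.fit(X_train, y_train)
    return search
```

With X_train and y_train from the reproduction above, calling run_gpu_search(X_train, y_train) replaces the TuneSearchCV construction and gs.fit call in the original snippet.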