ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn's GridSearchCV / RandomizedSearchCV, but with cutting-edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

TuneSearchCV with EarlyStopping ValueError: sample_weight.shape == (190,), expected (271,)! #123

Closed. dxyzx0 closed this issue 3 years ago.

dxyzx0 commented 3 years ago
from sklearn.linear_model import SGDRegressor
from sklearn.ensemble import RandomForestRegressor

### parameters
# parameter_grid = {"C": [0.2, 1.0, 5.0], "epsilon": [0.02, 0.1, 0.5]}
regr = SGDRegressor(verbose=True)
parameter_grid = {"loss": ["squared_loss", "huber"],"penalty": ['l1', 'l2'], "learning_rate": ['optimal', 'invscaling', 'adaptive']}
### tune-sklearn
from ray.tune.sklearn import TuneGridSearchCV, TuneSearchCV

tune_search = TuneSearchCV(
    regr,
    parameter_grid,
    search_optimization="bayesian",
    n_trials=3,
    early_stopping=True,
    # if early_stopping=False is set, the error does not occur
    max_iters=10,
)
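# X_train_scaled / y_train_scaled come from a scaled Boston train split (full preprocessing script in a later comment)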
tune_search.fit(X_train_scaled, y_train_scaled)

The error says:

/usr/local/envs/test/lib/python3.8/site-packages/tune_sklearn/tune_basesearch.py:269: UserWarning: max_iters is set > 1 but incremental/partial training is not enabled. To enable partial training, ensure the estimator has `partial_fit` or `warm_start` and set `early_stopping=True`. Automatically setting max_iters=1.
  warnings.warn(
WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 66592768 bytes available. This may slow down performance! You may be able to free up space by deleting files in /dev/shm or terminating any running plasma_store_server processes. If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.
Trial _Trainable_9f558c4c: Error processing event.
Traceback (most recent call last):
  File "/usr/local/envs/test/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/usr/local/envs/test/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 488, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/usr/local/envs/test/lib/python3.8/site-packages/ray/worker.py", line 1428, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::_Trainable.train() (pid=7541, ip=172.17.0.3)
  File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 438, in ray._raylet.execute_task.function_executor
  File "/usr/local/envs/test/lib/python3.8/site-packages/ray/tune/trainable.py", line 336, in train
    result = self.step()
  File "/usr/local/envs/test/lib/python3.8/site-packages/tune_sklearn/_trainable.py", line 119, in step
    return self._train()
  File "/usr/local/envs/test/lib/python3.8/site-packages/tune_sklearn/_trainable.py", line 178, in _train
    self._early_stopping_partial_fit(i, estimator, X_train,
  File "/usr/local/envs/test/lib/python3.8/site-packages/tune_sklearn/_trainable.py", line 125, in _early_stopping_partial_fit
    estimator.partial_fit(X_train, y_train, np.unique(self.y))
  File "/usr/local/envs/test/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py", line 1181, in partial_fit
    return self._partial_fit(X, y, self.alpha, C=1.0,
  File "/usr/local/envs/test/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py", line 1136, in _partial_fit
    sample_weight = _check_sample_weight(sample_weight, X)
  File "/usr/local/envs/test/lib/python3.8/site-packages/sklearn/utils/validation.py", line 1302, in _check_sample_weight
    raise ValueError("sample_weight.shape == {}, expected {}!"
ValueError: sample_weight.shape == (190,), expected (271,)!

The data is the Boston housing dataset.
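
Reading the traceback, the failure appears to come from `_trainable.py` line 125, where `np.unique(self.y)` is passed as the third positional argument of `partial_fit`. For `SGDClassifier` that slot is `classes`, but for `SGDRegressor` the third parameter of `partial_fit` is `sample_weight`, so the array of unique target values (190 here, since Boston house prices contain duplicates) is validated against the training fold size (271) and fails. A minimal sketch of that mismatch, using stand-in data (the sizes are assumptions, not taken from the run above):

# Sketch (an inference from the traceback, not tune-sklearn source):
# SGDRegressor.partial_fit(X, y, sample_weight) takes sample_weight third,
# so np.unique(y) lands there and fails the per-sample shape check.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.rand(271, 13)           # sized like one Boston CV training fold
y = rng.rand(271).round(2)      # continuous targets with duplicate values

reg = SGDRegressor()
reg.partial_fit(X, y, np.unique(y))
# ValueError: sample_weight.shape == (<n_unique>,), expected (271,)!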

richardliaw commented 3 years ago

Thanks for reporting this issue!

cc @inventormc do you have time to take a look at this?

richardliaw commented 3 years ago

@DingXiangYuanZhiXing this works for me:

from sklearn.linear_model import SGDRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_blobs, make_classification

X, y = make_classification(
    n_samples=11000, n_features=1000, n_informative=50,
    n_redundant=0, n_classes=10, class_sep=2.5)

### parameters
# parameter_grid = {"C": [0.2, 1.0, 5.0], "epsilon": [0.02, 0.1, 0.5]}
regr = SGDRegressor(verbose=True)
parameter_grid = {"loss": ["squared_loss", "huber"],"penalty": ['l1', 'l2'], "learning_rate": ['optimal', 'invscaling', 'adaptive']}
### tune-sklearn
from ray.tune.sklearn import TuneGridSearchCV, TuneSearchCV

tune_search = TuneSearchCV(
    regr,
    parameter_grid,
    search_optimization="bayesian",
    n_trials=3,
    early_stopping=True,
    # if early_stopping=False is set, the error does not occur
    max_iters=10,
)
tune_search.fit(X, y)

Can you help provide a reproducible script (with the dataset/processing)? I'd love to help figure out the root cause of this issue!

dxyzx0 commented 3 years ago

@richardliaw Thank you for your patience!

#!/usr/bin/env python
# coding: utf-8

import pandas as pd

from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=None)

from sklearn.preprocessing import StandardScaler
### normalize X
X_scaler = StandardScaler()
X_train_scaled = X_scaler.fit_transform(X_train)
X_test_scaled = X_scaler.transform(X_test)

### normalize y
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1)).ravel()

### model selection
# lightgbm xgboost svm rf lr adaboost
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.svm import SVR
from sklearn.linear_model import SGDRegressor
from sklearn.ensemble import RandomForestRegressor

# model name
model_list = ["SGDRegressor", "LGBMRegressor", "XGBRegressor", "AdaBoostRegressor", "RandomForestRegressor"]

### parameters
Regr_list = {"SGDRegressor": SGDRegressor, 
             "LGBMRegressor": LGBMRegressor,
             "XGBRegressor": XGBRegressor,
             "AdaBoostRegressor": AdaBoostRegressor,
             "RandomForestRegressor": RandomForestRegressor,
}

parameter_grid_list = {
    "SGDRegressor": 
        {"loss": ["squared_loss", "huber"],
         "penalty": ['l1', 'l2'], 
         "learning_rate": ['optimal', 'invscaling', 'adaptive'],
        }, 
    "LGBMRegressor": 
        {"boosting_type":  ['gbdt', 'dart', 'goss'],
         "num_leaves": [512, 1024, 2048, 4096, 8192], # Important
         "max_depth": [10, 20, 30, 50, -1], # Important
         "min_data_in_leaf": [1, 10, 30, 50, 100], # Important
         "n_estimators": [50, 75, 100, 125, 150],
         "lambda_l1": [0, 0.1, 1],
         "lambda_l2": [0, 0.1, 1],
        },
    "XGBRegressor": 
        {"n_estimators": [50, 75, 100, 125, 150],
         "max_depth": [10, 20, 30, 50, 100, 150],
         "min_child_weight": [1, 2, 10],
         "gamma": [0, 0.1, 1],
         "subsample": [0.3, 0.5, 0.7, 0.9],
         "colsample_bytree": [0.3, 0.5, 0.7, 0.9],
         "colsample_bylevel": [0.3, 0.5, 0.7, 0.9],
         "colsample_bynode": [0.3, 0.5, 0.7, 0.9],
         "booster": ["gbtree", "gblinear", "dart"],
         "max_leaves": [512, 1024, 2048, 4096, 8192], # Important
        },
    "AdaBoostRegressor": 
        {"base_estimator": [None],
         "n_estimators": [50, 75, 100, 125, 150],
         "loss": ['linear', 'square', 'exponential'],
        },
    "RandomForestRegressor": 
        {"n_estimators": [50, 75, 100, 125, 150],
         "criterion": ["mse", "mae"],
         "max_depth": [10, 20, 30, 50, None],
         "max_features": ["auto", "sqrt", "log2"],
         "max_leaf_nodes": [31, 71, 131, 201],
        },
}

### tune-sklearn settings
from ray.tune.sklearn import TuneGridSearchCV, TuneSearchCV

tune_search = dict()
for model in model_list:
    print("model name:", model)

    tune_search_current = TuneSearchCV(
        Regr_list[model](),
        parameter_grid_list[model],
        search_optimization="bayesian",
        n_trials=3,
        early_stopping=True,
        max_iters=10,
        verbose=1,
        return_train_score=True,
    )
    tune_search_current.fit(X_train_scaled, y_train_scaled)

    print("================ tune_search.best_params_ ===================")
    print(tune_search_current.best_params_)

    print("================ tune_search.best_score_ ===================")
    print(tune_search_current.best_score_)

    ### refit for multiple metrics
    tune_search[model] = tune_search_current

The error message:

model name: SGDRegressor
WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 62169088 bytes available. This may slow down performance! You may be able to free up space by deleting files in /dev/shm or terminating any running plasma_store_server processes. If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.
== Status ==
Memory usage on this node: 42.8/94.3 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 4.000: None | Iter 1.000: None
Resources requested: 1/24 CPUs, 0/0 GPUs, 0.0/37.26 GiB heap, 0.0/12.84 GiB objects
Result logdir: /root/ray_results/_Trainable
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------+----------+-------+-----------------+--------------+-----------+
| Trial name          | status   | loc   | learning_rate   | loss         | penalty   |
|---------------------+----------+-------+-----------------+--------------+-----------|
| _Trainable_f4ccd2ee | RUNNING  |       | adaptive        | huber        | l2        |
| _Trainable_f4ce3530 | PENDING  |       | adaptive        | squared_loss | l1        |
| _Trainable_f4cede90 | PENDING  |       | optimal         | huber        | l1        |
+---------------------+----------+-------+-----------------+--------------+-----------+

Trial _Trainable_f4ccd2ee: Error processing event.
Traceback (most recent call last):
  File "/usr/local/envs/test/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/usr/local/envs/test/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 488, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/usr/local/envs/test/lib/python3.8/site-packages/ray/worker.py", line 1428, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::_Trainable.train() (pid=5477, ip=172.17.0.3)
  File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 438, in ray._raylet.execute_task.function_executor
  File "/usr/local/envs/test/lib/python3.8/site-packages/ray/tune/trainable.py", line 336, in train
    result = self.step()
  File "/usr/local/envs/test/lib/python3.8/site-packages/tune_sklearn/_trainable.py", line 119, in step
    return self._train()
  File "/usr/local/envs/test/lib/python3.8/site-packages/tune_sklearn/_trainable.py", line 178, in _train
    self._early_stopping_partial_fit(i, estimator, X_train,
  File "/usr/local/envs/test/lib/python3.8/site-packages/tune_sklearn/_trainable.py", line 125, in _early_stopping_partial_fit
    estimator.partial_fit(X_train, y_train, np.unique(self.y))
  File "/usr/local/envs/test/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py", line 1181, in partial_fit
    return self._partial_fit(X, y, self.alpha, C=1.0,
  File "/usr/local/envs/test/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py", line 1136, in _partial_fit
    sample_weight = _check_sample_weight(sample_weight, X)
  File "/usr/local/envs/test/lib/python3.8/site-packages/sklearn/utils/validation.py", line 1302, in _check_sample_weight
    raise ValueError("sample_weight.shape == {}, expected {}!"
ValueError: sample_weight.shape == (176,), expected (271,)!
== Status ==
Memory usage on this node: 42.9/94.3 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 4.000: None | Iter 1.000: None
Resources requested: 0/24 CPUs, 0/0 GPUs, 0.0/37.26 GiB heap, 0.0/12.84 GiB objects
Result logdir: /root/ray_results/_Trainable
Number of trials: 3 (1 ERROR, 2 TERMINATED)
+---------------------+------------+-------+-----------------+--------------+-----------+
| Trial name          | status     | loc   | learning_rate   | loss         | penalty   |
|---------------------+------------+-------+-----------------+--------------+-----------|
| _Trainable_f4ccd2ee | ERROR      |       | adaptive        | huber        | l2        |
| _Trainable_f4ce3530 | TERMINATED |       | adaptive        | squared_loss | l1        |
| _Trainable_f4cede90 | TERMINATED |       | optimal         | huber        | l1        |
+---------------------+------------+-------+-----------------+--------------+-----------+
Number of errored trials: 1
+---------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Trial name          |   # failures | error file                                                                                                                                                                                    |
|---------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| _Trainable_f4ccd2ee |            1 | /root/ray_results/_Trainable/_Trainable_f4ccd2ee_1_X_id=ObjectRef(ffffffffffffffffffffffff0100000001000000),cv=KFold(n_splits=5, random_state=None, shuffle=Fal_2020-11-04_03-00-55/error.txt |
+---------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

================ tune_search.best_params_ ===================
Traceback (most recent call last):
  File "Formulation.py", line 102, in <module>
    print(tune_search_current.best_params_)
  File "/usr/local/envs/test/lib/python3.8/site-packages/tune_sklearn/tune_basesearch.py", line 96, in best_params_
    return self.best_params
AttributeError: 'TuneSearchCV' object has no attribute 'best_params'
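
The `AttributeError` looks like a knock-on effect: `best_params_` is only populated after a successful fit, and here the `SGDRegressor` trial errored. A hedged guard for the loop (a defensive pattern I'm assuming, not something from the tune-sklearn docs), so one failed model does not stop the remaining searches:

# Hedged pattern (an assumption, not from this thread): tolerate a failed
# search and continue with the remaining models in model_list.
try:
    print(tune_search_current.best_params_)
    print(tune_search_current.best_score_)
except AttributeError:
    print(f"{model}: search failed, no best result available")
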
richardliaw commented 3 years ago

Hmm, @DingXiangYuanZhiXing can you try upgrading to the latest Ray and installing tune-sklearn from source?
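
(Assuming the usual GitHub workflow, and not commands verified in this thread, that would be something like `pip install -U ray` followed by `pip install git+https://github.com/ray-project/tune-sklearn.git`.)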

Here is the output I see:

model name: SGDRegressor
File descriptor limit 256 is too low for production servers and may result in connection errors. At least 8192 is recommended. --- Fix with 'ulimit -n 8192'
== Status ==
Memory usage on this node: 21.6/64.0 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 4.000: None | Iter 1.000: None
Resources requested: 1/16 CPUs, 0/0 GPUs, 0.0/25.88 GiB heap, 0.0/8.89 GiB objects
Result logdir: /Users/rliaw/ray_results/_Trainable_2020-11-03_20-54-44
Number of trials: 1/3 (1 RUNNING)
+---------------------+----------+-------+-----------------+--------+-----------+
| Trial name          | status   | loc   | learning_rate   | loss   | penalty   |
|---------------------+----------+-------+-----------------+--------+-----------|
| _Trainable_db5202e8 | RUNNING  |       | invscaling      | huber  | l2        |
+---------------------+----------+-------+-----------------+--------+-----------+

== Status ==
Memory usage on this node: 21.7/64.0 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 4.000: 0.6692511608476028 | Iter 1.000: 0.6176064000972494
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/25.88 GiB heap, 0.0/8.89 GiB objects
Result logdir: /Users/rliaw/ray_results/_Trainable_2020-11-03_20-54-44
Number of trials: 3/3 (3 TERMINATED)
+---------------------+------------+-------+-----------------+--------------+-----------+--------+------------------+---------------------+---------------------+---------------------+
| Trial name          | status     | loc   | learning_rate   | loss         | penalty   |   iter |   total time (s) |   split0_test_score |   split1_test_score |   split2_test_score |
|---------------------+------------+-------+-----------------+--------------+-----------+--------+------------------+---------------------+---------------------+---------------------|
| _Trainable_db5202e8 | TERMINATED |       | invscaling      | huber        | l2        |     10 |        0.0598419 |            0.371315 |            0.6253   |            0.638369 |
| _Trainable_db56ea06 | TERMINATED |       | invscaling      | squared_loss | l1        |     10 |        0.0621631 |            0.557297 |            0.70506  |            0.761581 |
| _Trainable_db5966a0 | TERMINATED |       | invscaling      | squared_loss | l1        |     10 |        0.0597727 |            0.558324 |            0.703291 |            0.76552  |
+---------------------+------------+-------+-----------------+--------------+-----------+--------+------------------+---------------------+---------------------+---------------------+

================ tune_search.best_params_ ===================
{'loss': 'squared_loss', 'penalty': 'l1', 'learning_rate': 'invscaling'}
================ tune_search.best_score_ ===================
0.6796377323696179

dxyzx0 commented 3 years ago

@richardliaw Installing from source solves my problem. Is there a plan for a new release? The current version, tune-sklearn==0.1.0, is quite old.