ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0
467 stars, 51 forks

[BUG] ray:IDLE processes persist even after client.disconnect() #234

Open nopanderer opened 2 years ago

nopanderer commented 2 years ago

I tried to run the example code below on a Ray cluster.

import ray
# from sklearn.model_selection import GridSearchCV
from tune_sklearn import TuneGridSearchCV

# Other imports
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

client = ray.init("ray://MYRAYCLUSTER")

# Set training and validation sets
X, y = make_classification(n_samples=11000, n_features=1000, n_informative=50, n_redundant=0, n_classes=10, class_sep=2.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Example parameters to tune from SGDClassifier
parameters = {
    'alpha': [1e-4, 1e-1, 1],
    'epsilon':[0.01, 0.1]
}

tune_search = TuneGridSearchCV(
    SGDClassifier(),
    parameters,
    early_stopping="MedianStoppingRule",
    max_iters=10
)

import time # Just to compare fit times
start = time.time()
tune_search.fit(X_train, y_train)
end = time.time()
print("Tune Fit Time:", end - start)
pred = tune_search.predict(X_test)
accuracy = np.count_nonzero(np.array(pred) == np.array(y_test)) / len(pred)
print("Tune Accuracy:", accuracy)

client.disconnect()

Even after I disconnect the client, ray:IDLE processes remain on the Ray head node. I tried other examples from Ray Core and Ray Tune, and this issue did not occur with them.

Yard1 commented 2 years ago

Can you update Ray to the latest version and try again?

nopanderer commented 2 years ago

Thanks, Yard1. I upgraded Ray to 1.10.0 and tried again, but it still happens. When I run ray memory, the entries below persist.

IP_ADDRESS | PID | Worker | (deserialize task arg) ray.tune.tune.run | xxxxxxxx.x B | LOCAL_REFERENCE | OBJECT_REF
IP_ADDRESS | PID | Worker | (deserialize task arg) ray.tune.tune.run | xxxxxxxx.x B | LOCAL_REFERENCE | OBJECT_REF
IP_ADDRESS | PID | Worker | (deserialize task arg) ray.tune.tune.run | xxxxxxxx.x B | LOCAL_REFERENCE | OBJECT_REF
...

Yard1 commented 2 years ago

Ok, I'll take a look. Thanks!

Yard1 commented 2 years ago

Hey @nopanderer, this should be fixed in https://github.com/ray-project/tune-sklearn/releases/tag/v0.4.2. Please let me know whether the problem persists after updating.

nopanderer commented 2 years ago

@Yard1 I'll check it out. Thanks a lot!

skabbit commented 2 years ago

Got the same issue on v0.4.3. After running TuneGridSearchCV, the head node has multiple processes with IDLE status. Is there any way to kill these processes from Python?
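
For reference, one possible stopgap (not a fix for the underlying issue, and not something tune-sklearn or Ray provides) is to look for the leftover workers by process title on the head node and signal them. The sketch below uses only the standard library. It assumes it is run directly on the head node rather than through the Ray client, that a POSIX ps command is available, and that the workers show up with a title like ray:IDLE or ray::IDLE in the ps output. The helper names find_idle_ray_pids and kill_idle_ray_workers are made up for this example.

```python
import os
import re
import signal
import subprocess

# Assumption: idle workers appear in `ps` with a title such as "ray:IDLE"
# or "ray::IDLE". The pattern tolerates one or more colons.
IDLE_TITLE = re.compile(r"ray:+IDLE")


def find_idle_ray_pids(ps_lines, pattern=IDLE_TITLE):
    """Return PIDs from `ps -eo pid,args` style lines whose command matches."""
    pids = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        pid, command = int(parts[0]), parts[1]
        if pattern.search(command):
            pids.append(pid)
    return pids


def kill_idle_ray_workers(sig=signal.SIGTERM):
    """Send `sig` to every matching local process and return the PIDs found."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    )
    pids = [
        pid for pid in find_idle_ray_pids(result.stdout.splitlines())
        if pid != os.getpid()  # never signal this script itself
    ]
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process exited between listing and signalling
    return pids
```

Calling kill_idle_ray_workers() on the head node after client.disconnect() returns the PIDs it signalled. Since it matches on the process title only, it would also hit idle workers that belong to other jobs on the same node, so it is best tried on a cluster you are not sharing.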