pseudotensor opened this issue 3 years ago (status: Open)
@pseudotensor I cannot reproduce the problem on my end. I ran this script
import numpy as np
from cuml.ensemble import RandomForestClassifier
from cuml.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=20)
X = X.astype(np.float32)
model = RandomForestClassifier()
model.fit(X, y)
print(model.predict_proba(X))
and no warning is displayed.
It must be something to do with pickling the model and re-using it. If I get a repro I can share it, but I assume you can imagine how this could happen.
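If it is pickle related, something like the round trip below might be a starting point (this is a guess at the scenario, not a confirmed repro; the dataset shape and estimator settings are arbitrary):

import pickle
import numpy as np
from cuml.ensemble import RandomForestClassifier
from cuml.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20)
X = X.astype(np.float32)

model = RandomForestClassifier()
model.fit(X, y)

# Round-trip the fitted model through pickle and keep using the restored copy.
restored = pickle.loads(pickle.dumps(model))
print(restored.predict_proba(X))
# No explicit cleanup: if the problem is shutdown ordering, the warning would
# only appear when Python tears the module down at exit.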
The warning is thrown from https://github.com/rapidsai/cuml/blob/5b36ced22a70ee86c3153311efdc0cd7f0272101/python/cuml/ensemble/randomforestclassifier.pyx#L343-L344
This line of code calls a static method of the TreeliteModel class, and somehow the class has become None by the time of exit.
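For what it's worth, that is consistent with a generic CPython shutdown quirk rather than anything treelite-specific: module globals can already be cleared by the time a leftover object's __del__ runs. A minimal sketch with stub names (not cuml code; whether it actually fires depends on the interpreter version and teardown order, but this is the shape of the failure):

class TreeliteModelStub:
    # Stand-in for the real TreeliteModel class referenced in __del__.
    @staticmethod
    def free_treelite_model(handle):
        print("freed handle", handle)

class ForestStub:
    def __init__(self):
        self.treelite_handle = 123

    def __del__(self):
        # During interpreter shutdown the module dict may already have been
        # cleared, so the global name can be None here, which is what produces
        # "Exception ignored in: ... 'NoneType' object has no attribute ...".
        # Guarding with "if TreeliteModelStub is not None" is the usual
        # defensive fix for this pattern.
        TreeliteModelStub.free_treelite_model(self.treelite_handle)

forest = ForestStub()  # kept alive as a module global so __del__ runs at exit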
It would be good to have a repro, since I'd like to validate my proposed fix against it.
Maybe it's a race condition; I will try to repro later.
This consistently occurs in tests on Triton/FIL as well. @hcho3 I believe you should be able to repro this by running the Triton/FIL CI script locally.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
Any updates on how to fix this error? I am having the same problem.
Here is code that generates the error described above. Any news on a fix?
import cudf
import cuml
import pandas as pd
from cuml.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def create_independent_datasets(X, y, n_splits=3):
    # Build several independent train/test splits of the same data.
    datasets = []
    for _ in range(n_splits):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
        datasets.append((X_train, X_test, y_train, y_test))
    return datasets

def train_rf_on_gpu(dataset):
    X_train, X_test, y_train, y_test = dataset
    X_train_cudf = cudf.DataFrame.from_pandas(pd.DataFrame(X_train))
    y_train_cudf = cudf.Series(y_train)
    rf = RandomForestClassifier(n_estimators=100, random_state=42, n_streams=1)
    rf.fit(X_train_cudf, y_train_cudf)
    return rf

data = load_iris()
X, y = data.data, data.target

independent_datasets = create_independent_datasets(X, y)
trained_models = [train_rf_on_gpu(dataset) for dataset in independent_datasets]

from sklearn.metrics import accuracy_score

accuracies = []
for i, model in enumerate(trained_models):
    X_test = cudf.DataFrame.from_pandas(pd.DataFrame(independent_datasets[i][1]))
    y_test = independent_datasets[i][3]
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred.to_pandas())
    accuracies.append(acc)
    print(f"Accuracy of model {i}: {acc}")
I am having this problem too
The issue still exists, but it can be worked around by adding model = None after doing predictions, following the suggestion from this article.
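Concretely, the workaround just drops the reference before interpreter shutdown so __del__ runs while cuml's module globals are still intact (a sketch; the dataset and estimator settings are arbitrary):

import numpy as np
from cuml.ensemble import RandomForestClassifier
from cuml.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20)
X = X.astype(np.float32)

model = RandomForestClassifier()
model.fit(X, y)
preds = model.predict(X)

# Workaround: release the model explicitly after predictions so its __del__
# runs now rather than during interpreter shutdown.
model = None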
@hcho3 When using the new cuml from nightly, after fitting/predicting, I hit this at the end when Python exits.
It seems __del__ assumes something exists when it doesn't always; e.g. I never dumped a treelite model.
Just doing this with any X, y:
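(The exact snippet from the report is not reproduced here; the following is a minimal sketch of the kind of fit/predict usage being described, with arbitrary data.)

import numpy as np
from cuml.ensemble import RandomForestClassifier

# Any small classification dataset will do.
X = np.random.rand(200, 5).astype(np.float32)
y = np.random.randint(0, 2, size=200).astype(np.int32)

model = RandomForestClassifier()
model.fit(X, y)
model.predict(X)
# No explicit cleanup; the model is only garbage-collected at interpreter exit.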
It ends with an ignored exception, which is harmless but still ugly and worth fixing.