rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

AttributeError: 'NoneType' object has no attribute 'free_treelite_model' #4091

Open pseudotensor opened 3 years ago

pseudotensor commented 3 years ago

@hcho3 When using the new cuML from nightly, after fitting/predicting I hit this error at the end, when Python exits.

It seems __del__ assumes that something exists when it doesn't always. E.g., I never dumped a Treelite model.

Just doing this with any X, y:

from cuml.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X, y)
model.predict_proba(X)

It ends with an ignored exception, but it is still ugly and would be good to fix.

Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "cuml/ensemble/randomforestclassifier.pyx", line 328, in cuml.ensemble.randomforestclassifier.RandomForestClassifier.__del__
  File "cuml/ensemble/randomforestclassifier.pyx", line 344, in cuml.ensemble.randomforestclassifier.RandomForestClassifier._reset_forest_data
AttributeError: 'NoneType' object has no attribute 'free_treelite_model'

hcho3 commented 3 years ago

@pseudotensor I cannot reproduce the problem on my end. I ran this script:

import numpy as np
from cuml.ensemble import RandomForestClassifier
from cuml.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20)
X = X.astype(np.float32)

model = RandomForestClassifier()
model.fit(X, y)
print(model.predict_proba(X))

and no warning is displayed.

pseudotensor commented 3 years ago

It must have something to do with pickling the model and reusing it. If I get a repro I can share it, but I assume you can imagine how this could happen.

hcho3 commented 3 years ago

The warning is thrown from https://github.com/rapidsai/cuml/blob/5b36ced22a70ee86c3153311efdc0cd7f0272101/python/cuml/ensemble/randomforestclassifier.pyx#L343-L344

This line of code calls a static method of the TreeliteModel class, and somehow the class has become None by the time the interpreter exits.

It would be good to have a repro, since I'd like to validate my proposed fix against it.

pseudotensor commented 3 years ago

Maybe it's a race; I will try to reproduce it later.

wphicks commented 3 years ago

This consistently occurs in tests on Triton/FIL as well. @hcho3 I believe you should be able to repro this by running the Triton/FIL CI script locally.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

nikhartz commented 1 year ago

Are there any updates on how to fix this error? I am having the same problem.

mfedenia commented 1 year ago

Here is code that generates the error described above. Any news on a fix?

# %%
# Install dependencies (run in a shell):
# pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com
# pip install cuml-cu11 --extra-index-url=https://pypi.nvidia.com
# pip install scikit-learn

# %%
import cudf
import cuml
import pandas as pd
from cuml.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# %%
# Dummy function to simulate independent datasets
def create_independent_datasets(X, y, n_splits=3):
    datasets = []
    for _ in range(n_splits):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
        datasets.append((X_train, X_test, y_train, y_test))
    return datasets

# Function to train a random forest model using cuML on the GPU
def train_rf_on_gpu(dataset):
    X_train, X_test, y_train, y_test = dataset
    X_train_cudf = cudf.DataFrame.from_pandas(pd.DataFrame(X_train))
    y_train_cudf = cudf.Series(y_train)
    rf = RandomForestClassifier(n_estimators=100, random_state=42, n_streams=1)
    rf.fit(X_train_cudf, y_train_cudf)
    return rf

# %%
# Load the iris dataset
data = load_iris()
X, y = data.data, data.target

# %%
# Create independent datasets
independent_datasets = create_independent_datasets(X, y)

# %%
# Train random forest models sequentially on a single GPU
trained_models = [train_rf_on_gpu(dataset) for dataset in independent_datasets]

# %%
from sklearn.metrics import accuracy_score

# Evaluate the accuracy of each trained model on its test set
accuracies = []
for i, model in enumerate(trained_models):
    X_test = cudf.DataFrame.from_pandas(pd.DataFrame(independent_datasets[i][1]))
    y_test = independent_datasets[i][3]
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred.to_pandas())
    accuracies.append(acc)
    print(f"Accuracy of model {i}: {acc}")

ylevental commented 11 months ago

I am having this problem too.

stonkpunk commented 1 month ago

The issue still exists, but it can be avoided by setting model = None after making predictions, following the suggestion from this article.
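A minimal sketch of why that workaround may help, assuming the error only appears when an estimator survives until interpreter shutdown: releasing every reference while the interpreter is still fully alive lets __del__ run while module globals are still bound. NativeHandleOwner and train_models are hypothetical stand-ins, not cuML APIs, so the snippet runs without a GPU.

```python
import gc


class NativeHandleOwner:
    """Hypothetical stand-in for an estimator whose __del__ frees native memory."""

    released = 0

    def __del__(self):
        # In cuML, this is roughly where the native Treelite handle would be freed.
        NativeHandleOwner.released += 1


def train_models(n):
    # Hypothetical helper standing in for train_rf_on_gpu over several datasets.
    return [NativeHandleOwner() for _ in range(n)]


trained_models = train_models(3)
for i, model in enumerate(trained_models):
    pass  # model.predict(...) would go here

# The loop variable `model` still refers to the last estimator, so both
# names must be released before every destructor can run.
trained_models = None
model = None
gc.collect()  # also reclaims objects kept alive only by reference cycles
```

Applied to the repro above, this means that both trained_models and the loop variable model refer to fitted RandomForestClassifier objects, so both would need to be set to None (or deleted) before the script ends.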