Massive memory usage running the LazyClassifier

shankarpandala / lazypredict

Lazy Predict helps build a lot of basic models without much code and helps understand which models work better without any parameter tuning

MIT License

3.02k stars, 344 forks

Massive memory usage running the LazyClassifier #327

Open qemtek opened 3 years ago

qemtek commented 3 years ago

Describe the bug

Using a dataset with 500k rows and 27 features, I ran into a huge memory issue on iteration 12/30. A screenshot is included to show how much memory was being used.

[Screenshot of memory usage, 2021-02-08 13:02:49]

Desktop (please complete the following information):

OS: OSX Catalina 10.15.5

Additional context

Other packages installed:

awswrangler==2.4.0
pandas==1.2.1
numpy==1.20.0
scikit-learn==0.23.1
sqlalchemy==1.3.23
psycopg2-binary==2.8.6
lazypredict==0.2.7
tqdm==4.56.0
xgboost==1.3.3
lightgbm==3.1.1
pytest==6.2.2
imblearn
shap==0.38.1
matplotlib==3.3.4
ipython

apostolides commented 3 years ago

Hello,

I have the same issue using a training dataset with 125K rows. I'm training the models on Google Colaboratory with 12 GB of RAM available. The runtime crashes at 38%, reporting a huge amount of allocated memory. Did you find any workarounds for this issue?

Thanks in advance.

felixvor commented 1 year ago

A workaround is to filter the high-memory models out of the default regressor or classifier list and pass the resulting custom list to LazyRegressor or LazyClassifier. For example:

import lazypredict
from lazypredict.Supervised import LazyRegressor

highmem_regressors = [
    "GammaRegressor", "GaussianProcessRegressor", "KernelRidge", "QuantileRegressor"
]
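# Each entry in REGRESSORS is indexed here as (name, estimator); keep those not excluded by name
regressors = [reg for reg in lazypredict.Supervised.REGRESSORS if reg[0] not in highmem_regressors]
reg = LazyRegressor(regressors=regressors, verbose=1, ignore_warnings=True, custom_metric=None)
# X_train, X_test, y_train, y_test are your own train/test splits
models, predictions = reg.fit(X_train, X_test, y_train, y_test)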
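To decide which models belong on the exclusion list, it may help to measure peak memory for a single fit on a small sample before running the full sweep. The following is a minimal sketch using the standard library's tracemalloc; the helper name peak_memory_mb and the allocate stand-in are hypothetical and not part of lazypredict. Note that tracemalloc only sees allocations made through Python's allocators, so memory allocated directly by native libraries may not be counted.

```python
import tracemalloc


def peak_memory_mb(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / (1024 * 1024)


def allocate(n_bytes):
    # Stand-in for a model's fit call; replace with e.g. model.fit(X_sample, y_sample)
    return bytearray(n_bytes)


_, peak = peak_memory_mb(allocate, 20 * 1024 * 1024)
print(f"peak traced memory: {peak:.1f} MiB")
```

In practice you would pass something like a single estimator's fit method and a subsample of the training data, then exclude the estimators whose peak grows fastest as the sample size increases.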

dvijkalsi commented 1 year ago

This worked for me. I was using Google Colab with 8 GB of RAM.

import lazypredict
from lazypredict.Supervised import LazyClassifier

highmem_classifiers = ["LabelSpreading","LabelPropagation","BernoulliNB","KNeighborsClassifier", "ElasticNetClassifier", "GradientBoostingClassifier", "HistGradientBoostingClassifier"]

# Remove the high memory classifiers from the list
classifiers = [c for c in lazypredict.Supervised.CLASSIFIERS if c[0] not in highmem_classifiers]

clf = LazyClassifier(classifiers=classifiers, verbose=1, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
model_dictionary = clf.provide_models(X_train, X_test, y_train, y_test)
models
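One caveat with the list comprehension above: a name in the exclusion list that matches nothing in the default list is silently ignored, so a typo leaves the high-memory model in the run. A small helper can report such names. This is only a sketch; exclude_by_name is a hypothetical helper, and the registry below is a stand-in that mimics the (name, estimator) entries indexed as c[0] above, not the real lazypredict list.

```python
def exclude_by_name(registry, excluded_names):
    """Return (kept entries, excluded names that matched nothing in registry)."""
    excluded = set(excluded_names)
    # Keep every entry whose name (first element) is not excluded
    kept = [entry for entry in registry if entry[0] not in excluded]
    known = {entry[0] for entry in registry}
    # Names that were requested for exclusion but are absent from the registry
    unmatched = sorted(excluded - known)
    return kept, unmatched


# Stand-in registry; with lazypredict you would pass lazypredict.Supervised.CLASSIFIERS
registry = [("ModelA", object), ("ModelB", object), ("ModelC", object)]
kept, unmatched = exclude_by_name(registry, ["ModelB", "ModelX"])
kept_names = [entry[0] for entry in kept]
print(kept_names)
print(unmatched)
```

If unmatched is non-empty, check the spelling of those names against the names in the default list before starting a long run.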