pycaret / pycaret

An open-source, low-code machine learning library in Python
https://www.pycaret.org
MIT License
8.8k stars · 1.76k forks

[BUG]: Run compare_models but always return empty list #3690

Open Haminh630 opened 1 year ago

Haminh630 commented 1 year ago

pycaret version checks

Issue Description

Hello everyone,

I keep running into an issue where compare_models() always returns an empty list.

How can I fix this bug?

(Three screenshots attached: 2023-08-06 08:30:27, 08:30:44, 08:31:09.)

Reproducible Example

import numpy as np
import pandas as pd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

valid_df = train[train['month'] == '2022-11'].reset_index(drop = True)
train_df = train[train['month'] != '2022-11'].reset_index(drop = True)

features = [col for col in train_df.columns if col not in ["userID_hash", "month", "y"]]

y = 'y'

# import the regression module
from pycaret.regression import *

# initialize setup
s = setup(data = train_df, test_data = valid_df, target = y, fold_strategy = 'timeseries', 
          numeric_features = features, fold = 3, transform_target = True, session_id = 123, index=False)

best = compare_models(sort = 'MAE', include = ['lr', 'en'])

finalized_model = finalize_model(best)
preds = predict_model(finalized_model, data=test)
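Since compare_models() can silently hand back an empty list when every candidate fails, a small guard can fail fast before finalize_model is reached. This is a sketch; pick_best is a hypothetical helper, not part of PyCaret's API:

```python
def pick_best(result):
    """Return compare_models' output, or raise if all training failed.

    compare_models returns a fitted estimator on success (or a list of
    estimators when n_select > 1), but an *empty* list when every
    candidate model failed to train.
    """
    if isinstance(result, list) and not result:
        raise RuntimeError(
            "compare_models returned an empty list; re-run with "
            "errors='raise' to see the underlying training exception."
        )
    return result
```

With a guard like this, finalize_model(pick_best(best)) would raise a clear error instead of the confusing AttributeError shown in the traceback below.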

Expected Behavior

I would like to know how to get the best model out of the compare_models() function.

Actual Results

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[30], line 1
----> 1 prediction_holdout = predict_model(best);

File ~/anaconda3/envs/time_series_dl/lib/python3.8/site-packages/pycaret/regression/functional.py:1903, in predict_model(estimator, data, round, verbose)
   1900 if experiment is None:
   1901     experiment = _EXPERIMENT_CLASS()
-> 1903 return experiment.predict_model(
   1904     estimator=estimator,
   1905     data=data,
   1906     round=round,
   1907     verbose=verbose,
   1908 )

File ~/anaconda3/envs/time_series_dl/lib/python3.8/site-packages/pycaret/regression/oop.py:2194, in RegressionExperiment.predict_model(self, estimator, data, round, verbose)
   2140 def predict_model(
   2141     self,
   2142     estimator,
   (...)
   2145     verbose: bool = True,
   2146 ) -> pd.DataFrame:
   2147     """
   2148     This function predicts ``Label`` using a trained model. When ``data`` is
   2149     None, it predicts label on the holdout set.
...
   4956 pred = pipeline.inverse_transform(pred)
   4957 # Need to convert labels back to numbers
   4958 # TODO optimize

AttributeError: 'list' object has no attribute 'predict'

Installed Versions

3.0.4
Umar-cs commented 1 year ago

I was getting the same error, plus StreamlitAPIException: ("Could not convert 'PassengerId' with type str: tried to convert to int64", 'Conversion failed for column Value with type object').

I worked around the Streamlit exception by editing "C:\Users\your_user\anaconda3\envs\your_env_name\Lib\site-packages\streamlit\config.py": locate dataFrameSerialization = "arrow", change "arrow" to "legacy", and save. But now the compare_df output comes back empty. I need some help to resolve this issue.

if choice == "Modelling":
    chosen_target = st.selectbox('Choose the Target Column', df.columns)
    if st.button('Run Modelling'):
        setup(df, target=chosen_target, verbose = False)
        setup_df = pull()
        st.dataframe(setup_df)
        best_model = compare_models()
        compare_df = pull()
        st.info("model")
        st.dataframe(compare_df)
        best_model
        save_model(best_model, 'best_model')

(Screenshot attached: 2023-08-06 16:09:22.)

moezali1 commented 1 year ago

@Haminh630 compare_models returns an empty list when model training fails. If you set errors = 'raise' in compare_models, you will be able to see what is causing the failure.

Also, if you can share the train.csv and test.csv here, I can troubleshoot what is causing this. Thanks!
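For readers wondering why a failure produces an empty list rather than an exception: conceptually, compare_models tries each candidate and only collects the ones that fit successfully. A rough sketch of that behavior (illustrative only, not PyCaret's actual implementation; GoodModel and BadModel are stand-ins):

```python
class GoodModel:
    """Stand-in estimator that trains successfully."""
    def fit(self, X, y):
        return self

class BadModel:
    """Stand-in estimator whose training always fails."""
    def fit(self, X, y):
        raise ValueError("training failed")

def compare_models_sketch(models, X, y, errors="ignore"):
    """Why errors='raise' surfaces failures that otherwise yield []."""
    trained = []
    for model in models:
        try:
            trained.append(model.fit(X, y))
        except Exception:
            if errors == "raise":
                raise          # surface the real cause immediately
            # errors='ignore': the failed model is silently skipped
    return trained             # empty list if *every* model failed
```

So an empty result means every single candidate raised during fitting, and errors='raise' simply re-raises the first such exception instead of swallowing it.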

mukeshvadapali72 commented 1 year ago

I am also getting the empty list, and when I used errors = 'raise' I got this error: TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

moezali1 commented 1 year ago

@mukeshvadapali72 How big is your dataset? Can you also try this by passing n_jobs = 1 and let us know if that works?
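For context on why n_jobs = 1 can help: PyCaret hands parallel training to joblib, and each extra worker process duplicates the dataset in memory. A rough pure-Python sketch of the trade-off (train and run_training are hypothetical stand-ins, not PyCaret functions):

```python
import concurrent.futures

def train(model_id):
    # stand-in for fitting one candidate model
    return model_id * 2

def run_training(model_ids, n_jobs=1):
    if n_jobs == 1:
        # Sequential: one model trains at a time in this process, so
        # memory use stays low and no worker can be killed by the OS.
        return [train(m) for m in model_ids]
    # Parallel: each worker process holds its own copy of the data.
    # With a large dataset this can exhaust RAM, and the OS kills the
    # worker, which joblib reports as TerminatedWorkerError.
    with concurrent.futures.ProcessPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(train, model_ids))
```

In PyCaret itself the equivalent is passing n_jobs = 1 to setup(): slower, but only one copy of the data in memory.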

Umar-cs commented 1 year ago

@moezali1 I used errors = 'raise' in compare_models and got this: ValueError: n_splits=10 cannot be greater than the number of members in each class. I switched to Stratified K-Fold cross-validation with n_splits=5 and tried different datasets too, but still got the same error. Kindly guide me to resolve this error.

ngupta23 commented 1 year ago

Please google this error message. It comes from scikit-learn: you have too few members in some class for the number of folds requested. You can try reducing the number of folds.

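The constraint behind that ValueError: stratified k-fold requires every class to have at least n_splits members. A small helper (hypothetical, not part of PyCaret or scikit-learn) to pick a valid fold count before calling setup(fold=...):

```python
from collections import Counter

def max_valid_folds(y):
    """Largest n_splits StratifiedKFold will accept: the size of the
    smallest class in the target column."""
    return min(Counter(y).values())

# Example: 50 samples of class 'a', only 4 of class 'b'
y = ['a'] * 50 + ['b'] * 4
folds = min(10, max_valid_folds(y))  # 10 folds is impossible here; falls back to 4
```

Passing the resulting value as fold= in setup() (or compare_models()) avoids the n_splits error without guessing.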

huangliang0828 commented 1 year ago

I also meet a similar problem, not only with Binary Classification Tutorial Level Beginner (中文) - CLF101.ipynb but also with other .ipynb files (all of them trusted). When I run compare_models(fold=3), the results of that step automatically disappear after around 3 s (as shown in the attached screenshots). [LightGBM] [Info] ... lines print as if the step did not run, and there is no result/output from compare_models().

Environment: Win10 64-bit, Miniconda 2023, conda install python=3.8.16, pip install pycaret==2.3.10. pip check reports "No broken requirements found."

Nearly all the .ipynb files give similar results:

[LightGBM] [Warning] There are no meaningful features which satisfy the provided configuration. Decreasing Dataset parameters min_data_in_bin or min_data_in_leaf and re-constructing Dataset might resolve this warning.
[LightGBM] [Info] Number of positive: 1, number of negative: 1
......

In Binary Classification Tutorial Level Beginner (中文) - CLF101.ipynb, setup() was modified as below:

exp_clf101 = setup(data = data, target = 'default', session_id=123, use_gpu=True, n_jobs = 1)  # tried both use_gpu=True and False

I reinstalled the pycaret 2.3.10 environment; the outputs/results did not change. @moezali1 @ngupta23

huangliang0828 commented 1 year ago

It may be a version problem with the numpy, pandas, scipy, or setuptools packages. The problem was solved by reinstalling with numpy==1.21, scipy<1.9, pandas==1.5.3, setuptools==64. Refer to https://github.com/pycaret/pycaret/issues/3503
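The pins above, expressed as a single pip command (assuming a pip-based environment; these are the versions reported to work in #3503, not an officially documented combination):

```shell
pip install "numpy==1.21.*" "scipy<1.9" "pandas==1.5.3" "setuptools==64.*"
```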

But the [LightGBM] [Info] ... lines were still shown and/or disappearing, partly like below:

[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
(the line above repeated several times)

I could not find more detailed diagnostic information.

moezali1 commented 11 months ago

@huangliang0828 Can you please try this with PyCaret >3.0 and let us know if you are still facing the issue.

ashleydavyUHBW commented 9 months ago

@mukeshvadapali72 How big is your dataset? Can you also try this by passing n_jobs = 1 and let us know if that works?

I got the same error as the user above. I'm trying to get the time-series tutorial to run, with the test/tutorial data ('airline') having only 144 rows. With my original data (not tutorial data, again only 597 rows with one column) it would take about 5 minutes to run, but was pointless as nothing was returned. I added n_jobs = 1 in setup and added include = ['auto_arima','exp_smooth'], which then did run, but I didn't get the output grid/table with various columns showing accuracy etc.

If I use the OOP API, the result is returned. (screenshot attached)

If I use the functional API, n_jobs=1 has at least made it work and best is not empty; it contains an ExponentialSmoothing model. However, the grid comparing the models doesn't display. (screenshot attached)

When following the classification tutorial with the diabetes dataset, however, the output grid generated as expected. (screenshot attached)

While the time-series run is in progress I do get visible output for the first chunk of dots (screenshot attached), but when it finishes, it all clears and never shows the grid, unlike the classification run.

I'm new to all of this and installed PyCaret into a new venv this morning, so I'm definitely on the latest version! (Just checked: it's 3.0.4.)

ashleydavyUHBW commented 9 months ago

tl;dr: I'm using Visual Studio Code, and clicking "Clear All Outputs" before running currently seems to largely fix the issue.

After looking at this last night, I had left code running and it had frozen at 20% for a long time; I decided to terminate after maybe 30 minutes. Again this morning, I ran some code including just a few estimators and it worked! I tried adding and removing some and thought that adding polytrend made it work, but this turned out to be a red herring (though I think I've found that 'auto_arima' causes it to hang, possibly indefinitely; maybe it is just slower to run, like all the '_cds_dt' models).

Ultimately I discovered that including more, fewer, or different models (using include = []) initially appeared to make a difference, but it didn't. Clicking "Clear All Outputs" worked!

Interestingly, now when running (with exclude = ['auto_arima']), the first 19 models appear in the table with the progress bar at 97% before it disappears, and the run continues working on the remaining models ['rf_cds_dt', 'et_cds_dt', 'gbr_cds_dt', 'ada_cds_dt', 'lightgbm_cds_dt'], but without the progress bar. However, the top box still appears showing the status and estimator, so I could see when it was nearing the end, having pasted the list of models().index in another cell.

tpduarte commented 8 months ago

@huangliang0828 Can you please try this with PyCaret >3.0 and let us know if you are still facing the issue.

I had a very similar error and installing pycaret 3.1.0 version solved it.

ashleydavyUHBW commented 8 months ago

@huangliang0828 Can you please try this with PyCaret >3.0 and let us know if you are still facing the issue.

I had a very similar error and installing pycaret 3.1.0 version solved it.

As I'm relatively new to Python generally: do you have a suggestion of how best to do this? I'd also like to make sure the version of PyCaret I update is in the correct place.

I have Python 3.7.9 and tried installing PyCaret, but it failed to import, which led me to learn through Stack Overflow that it's best to create a virtual environment and have PyCaret be the first thing installed, so it gets all the correct dependencies.

I did that, but as I had not created a venv before and didn't know that giving it a name with spaces would create multiple venvs, I've ended up with the venv PyCaret is installed into being called "by". (screenshot attached)

I've now seen that https://pypi.org/project/pycaret/ says Python >=3.8 is tested, so I thought I should start again in a venv not based on 3.7, although I note it worked on 3.7 and that the same site lists 3.7 at the top. (screenshot attached)

To summarize/restate my question simply: what's the quickest/best way to update/uninstall PyCaret? Installing it into my new 3.10 venv has currently taken 13 minutes and counting 😩
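One common sequence for this (a sketch, to be run inside the activated venv; pycaret==3.1.0 is just the version mentioned earlier in this thread, and `python -m pip` ensures pip targets the venv's interpreter):

```shell
# Upgrade pip itself first, then cleanly replace PyCaret
python -m pip install --upgrade pip
python -m pip uninstall -y pycaret
python -m pip install pycaret==3.1.0   # or: python -m pip install -U pycaret

# Confirm which interpreter and PyCaret version the venv is using
python --version
python -c "import pycaret; print(pycaret.__version__)"
```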