pycaret / pycaret

An open-source, low-code machine learning library in Python
https://www.pycaret.org
MIT License

[BUG]: Predict with External Regressors #3602

Open tobiassiegfried opened 1 year ago

tobiassiegfried commented 1 year ago

pycaret version checks

Issue Description

I am developing a simple forecasting experiment to predict monthly discharge 6 months ahead. I have training and testing data with which I can successfully train, tune and finalise a PyCaret time series experiment. Prediction requires external regressors (xreg). I use an xreg dataframe with 6 time stamps to predict 6 months into the future.

Issue 1: I can predict as many time steps into the future as I want, even though my xreg dataset only covers the next 6 time stamps. For example, I can pass an xreg of length 1 and still forecast 60 time steps ahead.

Issue 2: Predictions are always the same, regardless of which xreg is passed.

Data and code are available under this link via GitHub or here via Colab.

Reproducible Example

# Please see Colab Code on Github

Expected Behavior

Issue 1: If I pass an xreg of length t, I expect the prediction to fail when my chosen forecasting horizon fh > t.
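The check expected in Issue 1 can be sketched as follows. This is only an illustration: `check_xreg_covers_horizon` is a hypothetical helper, not a PyCaret or sktime function, and it assumes that `fh` is a plain integer number of steps and that `len(xreg)` gives the number of future time stamps (as it does for a pandas DataFrame).

```python
from collections.abc import Sized


def check_xreg_covers_horizon(xreg: Sized, fh: int) -> None:
    """Raise ValueError if xreg has fewer rows than the forecast horizon.

    Hypothetical helper for illustration; not part of PyCaret or sktime.
    """
    if fh < 1:
        raise ValueError(f"fh must be a positive integer, got {fh}")
    n_rows = len(xreg)  # number of future time stamps covered by xreg
    if n_rows < fh:
        raise ValueError(
            f"xreg covers {n_rows} time stamp(s) but fh={fh}; "
            "cannot forecast beyond the available external regressors"
        )


# An xreg with 6 rows is enough for a 6-step forecast.
check_xreg_covers_horizon([0.0] * 6, fh=6)

# An xreg with 1 row and fh=60 should be rejected.
try:
    check_xreg_covers_horizon([0.0], fh=60)
except ValueError as err:
    print(err)
```

Calling such a guard before prediction would turn the silent behaviour described above into an explicit error.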

Issue 2: Predictions should actually depend on the xreg values passed.
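One way to test the expectation in Issue 2 is to predict twice with different xreg values and compare the results. The sketch below assumes a hypothetical `predict` callable that maps an xreg sequence to a sequence of numeric forecasts; the two toy models stand in for a real forecaster and are not PyCaret or sktime code.

```python
from typing import Callable, Sequence


def predictions_depend_on_xreg(
    predict: Callable[[Sequence[float]], Sequence[float]],
    xreg_a: Sequence[float],
    xreg_b: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Return True if the two xreg inputs lead to different forecasts."""
    preds_a = list(predict(xreg_a))
    preds_b = list(predict(xreg_b))
    if len(preds_a) != len(preds_b):
        raise ValueError("both forecasts must have the same length")
    # Any element-wise difference above tol counts as a dependence on xreg.
    return any(abs(a - b) > tol for a, b in zip(preds_a, preds_b))


# Toy stand-ins (assumptions for illustration only):
def ignores_xreg(xreg: Sequence[float]) -> list[float]:
    # Mimics the reported bug: the forecast is constant whatever xreg is.
    return [10.0] * len(xreg)


def uses_xreg(xreg: Sequence[float]) -> list[float]:
    # Mimics the expected behaviour: the forecast shifts with xreg.
    return [10.0 + 2.0 * x for x in xreg]


low = [0.0] * 6
high = [5.0] * 6
print(predictions_depend_on_xreg(ignores_xreg, low, high))
print(predictions_depend_on_xreg(uses_xreg, low, high))
```

With a real model, `predict` would wrap the forecaster's prediction call, and a result of `False` for clearly different xreg inputs would reproduce the behaviour reported here.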

Actual Results

# Please see Colab Notebook.

Installed Versions

System:
    python: 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0]
    executable: /usr/bin/python3
    machine: Linux-5.15.107+-x86_64-with-glibc2.31

PyCaret required dependencies:
    pip: 23.1.2
    setuptools: 67.7.2
    pycaret: 3.0.2
    IPython: 7.34.0
    ipywidgets: 7.7.1
    tqdm: 4.65.0
    numpy: 1.22.4
    pandas: 1.5.3
    jinja2: 3.1.2
    scipy: 1.10.1
    joblib: 1.2.0
    sklearn: 1.2.2
    pyod: 1.0.9
    imblearn: 0.10.1
    category_encoders: 2.6.1
    lightgbm: 3.3.5
    numba: 0.56.4
    requests: 2.27.1
    matplotlib: 3.7.1
    scikitplot: 0.3.7
    yellowbrick: 1.5
    plotly: 5.13.1
    kaleido: 0.2.1
    statsmodels: 0.13.5
    sktime: 0.17.0
    tbats: 1.1.3
    pmdarima: 2.0.3
    psutil: 5.9.5

PyCaret optional dependencies:
    shap: 0.41.0
    interpret: 0.4.2
    umap: 0.5.3
    pandas_profiling: 3.6.6
    explainerdashboard: 0.4.2.2
    autoviz: 0.1.720
    fairlearn: 0.7.0
    xgboost: 1.7.5
    catboost: 1.2
    kmodes: 0.12.2
    mlxtend: 0.22.0
    statsforecast: 1.5.0
    tune_sklearn: 0.4.5
    ray: 2.5.0
    hyperopt: 0.2.7
    optuna: 3.2.0
    skopt: 0.9.0
    mlflow: 1.30.1
    gradio: 3.34.0
    fastapi: 0.97.0
    uvicorn: 0.22.0
    m2cgen: 0.10.0
    evidently: 0.2.8
    fugue: 0.8.5
    streamlit: Not installed
    prophet: 1.1.3

ngupta23 commented 1 year ago

Hi, the dataset seems to be private and is not accessible. Can you make it public so I can try to reproduce the problem?

tobiassiegfried commented 1 year ago

Thanks Nikhil, I changed the visibility of the repo to public. The file should be accessible now (hopefully).

ngupta23 commented 1 year ago

I was able to reproduce this with base sktime as well and have opened an issue with them. Based on their response, I will take appropriate action in pycaret.

https://github.com/sktime/sktime/issues/5127