ourownstory / neural_prophet

NeuralProphet: A simple forecasting package
https://neuralprophet.com
MIT License
3.88k stars, 479 forks

Some problems in make_future_dataframe #1613

Open potoyeee opened 4 months ago

potoyeee commented 4 months ago

Hello, I have run into an error and would appreciate help with it. In my experiment I set n_lags=96, add some lagged regressors, and add some future regressors. For prediction I set n_forecasts=10 (to predict the next 10 steps):

future = m.make_future_dataframe(
    df_train,
    regressors_df=future_wh,
    periods=10,
    n_historic_predictions=False,
)
forecast = m.predict(future)

This fails with a dimension error:

RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 3

If I add 1 or 2 future regressors, it can predict the next 10 steps. With any other number of future regressors, I get the dimension error.

potoyeee commented 4 months ago

It seems to be related to regressors_df: when I delete some columns, I can predict normally.
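
Since deleting columns from regressors_df changes the outcome, one cheap diagnostic is to compare the columns passed in regressors_df against the future regressors registered on the model. The thread does not establish that a column mismatch is the cause, so treat this as a minimal sketch for narrowing things down. The helper name diff_regressor_columns and the default ignore list are hypothetical, not part of NeuralProphet.

```python
def diff_regressor_columns(registered, provided, ignore=("ds", "y")):
    """Compare registered future-regressor names with the columns provided.

    Hypothetical helper for debugging; not a NeuralProphet API.
    Returns (missing, extra), each a list that preserves input order.
    """
    provided_set = set(provided)
    registered_set = set(registered)
    # Registered on the model but absent from regressors_df.
    missing = [c for c in registered if c not in provided_set]
    # Present in regressors_df but never registered (bookkeeping columns skipped).
    extra = [c for c in provided if c not in registered_set and c not in ignore]
    return missing, extra


# Illustrative names only; substitute your own lists.
registered = ["month", "quarter_of_year", "day_of_week", "is_weekend"]
provided = ["ds", "month", "quarter_of_year", "exog1"]

missing, extra = diff_regressor_columns(registered, provided)
print("missing:", missing)
print("extra:", extra)
```

With a pandas DataFrame, the second argument would be list(future_wh.columns). If both lists come back empty and the error persists, the column set can probably be ruled out as the trigger.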

ourownstory commented 3 months ago

This does sound like a bug, or possibly a corner case that should fail with a more explicit error message.

@potoyeee Please help us by sharing a minimal piece of code that we need to reproduce the error. Thank you!

keklol5050 commented 3 months ago

@ourownstory Hi! I get the same error with future regressors, and a different but similar error with past exogenous variables. Here is a basic example:

from neuralprophet import NeuralProphet, set_log_level
import pandas as pd
import numpy as np

set_log_level("ERROR")

import logging

logging.getLogger().setLevel(logging.ERROR)
#%%
start_date = '2022-02-22'
freq = '4h'

date_range = pd.date_range(start=start_date, periods=18063, freq=freq)

random_y = np.random.rand(len(date_range))
random_exog1 = np.random.rand(len(date_range))
random_exog2 = np.random.rand(len(date_range))
random_exog3 = np.random.rand(len(date_range))
random_exog4 = np.random.rand(len(date_range))
random_exog5 = np.random.rand(len(date_range))
random_exog6 = np.random.rand(len(date_range))
random_exog7 = np.random.rand(len(date_range))
random_exog8 = np.random.rand(len(date_range))
random_exog9 = np.random.rand(len(date_range))
random_exog10 = np.random.rand(len(date_range))
random_exog11 = np.random.rand(len(date_range))

data = pd.DataFrame({'ds': date_range,
                     'y': random_y,
                     'exog1': random_exog1,
                     'exog2': random_exog2,
                     'exog3': random_exog3,
                     'exog4': random_exog4,
                     'exog5': random_exog5,
                     'exog6': random_exog6,
                     'exog7': random_exog7,
                     'exog8': random_exog8,
                     'exog9': random_exog9,
                     'exog10': random_exog10,
                     'exog11': random_exog11, })

data['month'] = data.ds.dt.month
data['quarter_of_year'] = data.ds.dt.quarter
data['day_of_week'] = data.ds.dt.dayofweek
data["is_weekend"] = (data.ds.dt.dayofweek >= 5).astype(int)

data
#%%
histr_columns = ['exog1', 'exog2', 'exog3', 'exog4', 'exog5', 'exog6', 'exog7', 'exog8', 'exog9', 'exog10', 'exog11']
futr_columns = ['month', 'quarter_of_year', 'day_of_week', 'is_weekend']
#%%
df = data[:-500]
df
#%%
forecast_horizon = 8

m = NeuralProphet(
    n_lags=7 * forecast_horizon,
    n_forecasts=forecast_horizon,
    ar_layers=[512, 512, 512, 512, ],
    learning_rate=0.003,
    epochs=1,
)

for column in histr_columns:
    m.add_lagged_regressor(column, n_lags=7 * forecast_horizon)

for column in futr_columns:
    m.add_future_regressor(column)

metrics_train = m.fit(df=df)
#%%
idx = -600
#%%
input_df = m.make_future_dataframe(data[:idx], regressors_df=data[futr_columns][idx:idx + forecast_horizon],
                                   periods=forecast_horizon)
true_df = data[idx:idx + forecast_horizon]
predicted_df = m.predict(input_df, raw=True, decompose=False)
#%%

keklol5050 commented 3 months ago

@ourownstory 1) The problem occurs only with frequencies of 1h, 4h, or longer intervals when there are more than 5-10k rows. With intervals shorter than 1h everything is fine. For example, with 4691 rows at freq 4h I have to use data=data[700:], but with more than 100k rows at 15m everything works.

2) In some situations I also get the error 'mat1 and mat2 shapes cannot be multiplied' when using more than one past covariate (add_lagged_regressor), for example with 3243 rows, freq 4h, and n_lags=7 * forecast_horizon. Whether it occurs depends on the specific number of rows and the forecast horizon.

3) Strange things often happen: in the same situation as 2, everything is fine until I predict on data that contains at least one point from the training set. Then the model breaks and always gives the error 'mat1 and mat2 shapes cannot be multiplied'.

To reproduce 2 and 3, create 30-40 exogenous variables and remove the future regressors.
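
For anyone trying to reproduce cases 2 and 3, writing out 30-40 exogenous columns by hand is tedious. A minimal sketch of generating them in a loop follows. It uses only the standard library so it runs anywhere. The helper name make_exog_columns, the seed, and the choice of 35 columns are assumptions for illustration; the row count 3243 is taken from case 2.

```python
import random


def make_exog_columns(n_rows, n_exogs, seed=0):
    """Build n_exogs columns of uniform random values in [0, 1).

    Hypothetical helper for reproduction scripts. Returns a dict mapping
    'exog1' .. 'exog<n_exogs>' to lists of length n_rows.
    """
    rng = random.Random(seed)  # fixed seed so the data is repeatable
    return {
        f"exog{i}": [rng.random() for _ in range(n_rows)]
        for i in range(1, n_exogs + 1)
    }


columns = make_exog_columns(n_rows=3243, n_exogs=35)
histr_columns = list(columns)  # names to pass to add_lagged_regressor

print(len(histr_columns), histr_columns[0], histr_columns[-1])
```

The dict can be merged with the ds and y columns, for instance pd.DataFrame({'ds': date_range, 'y': random_y, **columns}), and histr_columns then drives the add_lagged_regressor loop from the example above, with the add_future_regressor loop left out.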