kdgutier / esrnn_torch

MIT License

Very short time series in unbalanced panel, assert n_windows>0 #24

Closed: diogoalvesderesende closed this issue 3 years ago

diogoalvesderesende commented 3 years ago

Hey, I am getting the following error. Do you think you can help? Thanks in advance!

```
model.fit(X_df, y_df)
Infered frequency: D
=============== Training ESRNN ===============

Traceback (most recent call last):
  File "", line 1, in
    model.fit(X_df, y_df)
  File "C:\Users\diogo\AppData\Roaming\Python\Python37\site-packages\ESRNN\ESRNN.py", line 368, in fit
    warm_start=warm_start, shuffle=shuffle, verbose=verbose)
  File "C:\Users\diogo\AppData\Roaming\Python\Python37\site-packages\ESRNN\ESRNN.py", line 186, in train
    windows_y, windows_y_hat, levels = self.esrnn(batch)
  File "C:\Users\diogo\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\diogo\AppData\Roaming\Python\Python37\site-packages\ESRNN\utils\ESRNN.py", line 273, in forward
    windows_y_hat, windows_y, levels, seasonalities = self.es(ts_object)
  File "C:\Users\diogo\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\diogo\AppData\Roaming\Python\Python37\site-packages\ESRNN\utils\ESRNN.py", line 54, in forward
    assert n_windows>0
AssertionError
```

kdgutier commented 3 years ago

Hi Diogo, it seems that the number of windows available to the model is zero. When that happened to me before, it was because `output_size` was too large for one of the series in the data. Try to make sure that every series is long enough. Hope this helps.
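To see why a short series can end up with zero windows, here is a minimal sketch of the usual rolling-window count. It assumes the model slides a block of `input_size + output_size` consecutive points forward one step at a time; the exact bookkeeping inside ESRNN's `forward` may differ, so `count_windows` is a hypothetical helper for illustration only.

```python
def count_windows(series_length: int, input_size: int, output_size: int) -> int:
    """Number of complete (input, output) windows in one series.

    Assumes a window of input_size + output_size consecutive points that
    slides one step at a time. Illustrative only; not ESRNN's own code.
    """
    return max(series_length - input_size - output_size + 1, 0)


# With e.g. input_size=4 and output_size=6, a series needs at least
# 10 observations to produce a single window.
print(count_windows(2500, 4, 6))  # 2491
print(count_windows(10, 4, 6))    # 1
print(count_windows(1, 4, 6))     # 0
```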

diogoalvesderesende commented 3 years ago

Thanks for the prompt reply. The data has ~2.5k observations (around 7 years of daily data). I am sharing my data at this link, and the code is below. Thank you for the support!

```python
import pandas as pd

# get data
data = pd.read_csv("DHS_Daily_Report.csv")

# transform the date variable
data["Date"] = pd.to_datetime(data["Date"], format="%m/%d/%Y")
data["Date"] = data["Date"].dt.strftime("%Y-%m-%d")

# rename the forecasting variable and the date column
data = data.rename(columns={'Total Individuals in Shelter': 'y'})
data = data.rename(columns={'Date': 'ds'})

# select variables (ISO "YYYY-MM-DD" strings compare correctly as text)
dataset = data.loc[data["ds"] <= "2020-11-11", ["ds", "y", "Easter",
                                                "Thanksgiving", "Christmas"]]
future = data.loc[data["ds"] >= "2020-11-12", ["ds", "y", "Easter",
                                               "Thanksgiving", "Christmas"]]

# add unique IDs
# Note: dataset.index holds a different value for every row, so this gives
# each row its own unique_id, i.e. every series in the panel has length 1.
dataset.insert(0, 'unique_id', dataset.index)

# isolate Y and X
y_df = dataset.loc[:, ["unique_id", "ds", "y"]]
X_df = dataset.loc[:, ["unique_id", "ds", "Easter"]]
X_df = X_df.rename(columns={'Easter': 'x'})

# forecasting model
from ESRNN import ESRNN
model = ESRNN(max_epochs=25,
              freq_of_test=5,
              batch_size=4, learning_rate=1e-4,
              per_series_lr_multip=0.8, lr_scheduler_step_size=10,
              lr_decay=0.1, gradient_clipping_threshold=50,
              rnn_weight_decay=0.0, level_variability_penalty=100,
              testing_percentile=50, training_percentile=50,
              ensemble=False, max_periods=25, seasonality=[],
              input_size=4, output_size=6,
              cell_type='LSTM', state_hsize=40,
              dilations=[[1], [6]], add_nl_layer=False,
              random_seed=1, device='cpu')
model.fit(X_df, y_df)
```

kdgutier commented 3 years ago

The `input_size` and `output_size` might simply be too large for some series in your panel. Check whether your panel is balanced, and either avoid very short series or pad them. I'd also advise looking at the hyperopt package (or similar) to tune the ESRNN hyperparameters; doing it manually is time-consuming.
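One way to act on that advice is to count the observations per `unique_id` before calling `fit`. The sketch below uses plain pandas on a made-up two-series panel; `series_lengths` and `drop_short_series` are hypothetical helpers, and using `input_size + output_size` as the minimum length is an assumption, not a threshold documented by ESRNN. The last lines show that using the row index as `unique_id` yields series of length 1.

```python
import pandas as pd


def series_lengths(y_df: pd.DataFrame) -> pd.Series:
    """Number of observations for each unique_id in a long-format panel."""
    return y_df.groupby("unique_id")["ds"].count()


def drop_short_series(X_df: pd.DataFrame, y_df: pd.DataFrame, min_length: int):
    """Keep only the series that have at least min_length observations in y_df."""
    lengths = series_lengths(y_df)
    keep = lengths[lengths >= min_length].index
    return (X_df[X_df["unique_id"].isin(keep)],
            y_df[y_df["unique_id"].isin(keep)])


# Hypothetical toy panel: series "A" has 12 daily points, series "B" only 3.
ds_a = pd.date_range("2020-01-01", periods=12, freq="D")
ds_b = pd.date_range("2020-01-01", periods=3, freq="D")
y_df = pd.DataFrame({
    "unique_id": ["A"] * 12 + ["B"] * 3,
    "ds": list(ds_a) + list(ds_b),
    "y": range(15),
})
X_df = y_df[["unique_id", "ds"]].assign(x=0)

print(series_lengths(y_df))

# Assumed minimum length: input_size + output_size (4 + 6 in the config above).
X_ok, y_ok = drop_short_series(X_df, y_df, min_length=4 + 6)
print(y_ok["unique_id"].unique().tolist())  # ['A']

# Using the row index as unique_id turns every row into its own series.
y_rowwise = y_df.assign(unique_id=y_df.index)
print(series_lengths(y_rowwise).max())  # 1
```

Dropping short series is the simplest option; padding them instead keeps every series in the panel at the cost of inventing values for the padded periods.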