jdb78 / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

'idx' issue in best_tft.plot_prediction: idx not found #1155

Open RicardoDeGouveia opened 1 year ago

RicardoDeGouveia commented 1 year ago

The problem is that I know my data contains 886 time series.

Let me explain: I am using TFT to predict demand for articles in general, not per "agency _XX". That is, I only use the time series of each product's SKUs, and in total I have 886 SKUs, which translates into 886 time series. What am I doing wrong?

NOTE: Each of my 886 time series (SKUs) has at least 12 months of data points.

Setting up the TimeSeriesDataSet:

```python
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer, NaNLabelEncoder

max_prediction_length = 6  # forecast 6 months
max_encoder_length = 12  # use 12 months of history
training_cutoff = df_pruebas["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    df_pruebas[lambda x: x.time_idx <= training_cutoff],
    group_ids=["series"],
    target="quantity",
    time_idx="time_idx",
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    static_categoricals=["series"],
    time_varying_known_categoricals=["month"],
    time_varying_unknown_reals=["quantity"],
    time_varying_known_reals=["time_idx"],
    # use softplus and normalize by group
    target_normalizer=GroupNormalizer(groups=["series"], transformation="softplus"),
    allow_missing_timesteps=True,
    categorical_encoders={
        "vendedor": NaNLabelEncoder(add_nan=True),
        "default_code": NaNLabelEncoder(add_nan=True),
    },
    # categorical_encoders={"series": NaNLabelEncoder(add_nan=True)},
    add_relative_time_idx=True,  # add as feature
    add_target_scales=True,  # add as feature
    add_encoder_length=True,  # add as feature
)

# create validation set (predict=True), which means predicting the last
# max_prediction_length points in time for each series
validation = TimeSeriesDataSet.from_dataset(
    training,
    df_pruebas,
    predict=True,
    stop_randomization=True,
    allow_missing_timesteps=True,
    # categorical_encoders={"default_code": NaNLabelEncoder(add_nan=True)},
)

# create dataloaders for the model
batch_size = 128  # set this between 32 and 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)
```
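One way to check whether the gap between 886 series and the 445 samples in the traceback comes from series that are too short: as far as I know, TimeSeriesDataSet falls back to min_encoder_length = max_encoder_length and min_prediction_length = max_prediction_length when they are not set, so with the settings above a series would need roughly 12 + 6 = 18 time steps to produce a prediction sample, which is more than the 12 months guaranteed in the note. The pandas sketch below only approximates that filter; predict_mode_windows is a hypothetical helper (not part of pytorch-forecasting), and measuring length as the time_idx span is an assumption.

```python
import pandas as pd


def predict_mode_windows(df, group_col, time_col, max_encoder_length, max_prediction_length,
                         min_encoder_length=None, min_prediction_length=None):
    """Approximate which series have enough time steps for one predict=True sample.

    Hypothetical helper. Assumes the minimum lengths default to the maximum
    lengths, as in TimeSeriesDataSet when min_* arguments are left as None.
    """
    if min_encoder_length is None:
        min_encoder_length = max_encoder_length
    if min_prediction_length is None:
        min_prediction_length = max_prediction_length
    min_steps = min_encoder_length + min_prediction_length

    # Span of each series in time_idx steps (gaps count toward the span,
    # roughly as with allow_missing_timesteps=True).
    bounds = df.groupby(group_col)[time_col].agg(["min", "max"])
    span = bounds["max"] - bounds["min"] + 1

    kept = span[span >= min_steps].index.tolist()
    dropped = span[span < min_steps].index.tolist()
    return kept, dropped


# Toy data: series "a" has 12 monthly points, series "b" has 18.
toy = pd.DataFrame({
    "series": ["a"] * 12 + ["b"] * 18,
    "time_idx": list(range(12)) + list(range(18)),
})
print(predict_mode_windows(toy, "series", "time_idx", 12, 6))
print(predict_mode_windows(toy, "series", "time_idx", 12, 6, min_encoder_length=6))
```

Applied to the real data, something like predict_mode_windows(df_pruebas, "series", "time_idx", max_encoder_length, max_prediction_length) could be compared against 445; if the counts roughly match, passing a smaller min_encoder_length to TimeSeriesDataSet is one option to try.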

The error I get:


```
IndexError                                Traceback (most recent call last)
Input In [194], in <cell line: 3>()
      1 new_raw_predictions, new_x = best_tft.predict(new_prediction_data, mode="raw", return_x=True)
      3 for idx in range(886,887): # plot 10 examples
----> 4     best_tft.plot_prediction(new_x, new_raw_predictions, idx=idx, show_future_observed=False)

File ~\.conda\envs\NeuralNetwork\lib\site-packages\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py:711, in TemporalFusionTransformer.plot_prediction(self, x, out, idx, plot_attention, add_loss_to_title, show_future_observed, ax, **kwargs)
    694 """
    695 Plot actuals vs prediction and attention
    696 (...)
    707     plt.Figure: matplotlib figure
    708 """
    710 # plot prediction as normal
--> 711 fig = super().plot_prediction(
    712     x,
    713     out,
    714     idx=idx,
    715     add_loss_to_title=add_loss_to_title,
    716     show_future_observed=show_future_observed,
    717     ax=ax,
    718     **kwargs,
    719 )
    721 # add attention on secondary axis
    722 if plot_attention:

File ~\.conda\envs\NeuralNetwork\lib\site-packages\pytorch_forecasting\models\base_model.py:780, in BaseModel.plot_prediction(self, x, out, idx, add_loss_to_title, show_future_observed, ax, quantiles_kwargs, prediction_kwargs)
    775 figs = []
    776 for y_raw, y_hat, y_quantile, encoder_target, decoder_target in zip(
    777     y_raws, y_hats, y_quantiles, encoder_targets, decoder_targets
    778 ):
--> 780     y_all = torch.cat([encoder_target[idx], decoder_target[idx]])
    781     max_encoder_length = x["encoder_lengths"].max()
    782     y = torch.cat(
    783         (
    784             y_all[: x["encoder_lengths"][idx]],
    785             y_all[max_encoder_length : (max_encoder_length + x["decoder_lengths"][idx])],
    786         ),
    787     )

IndexError: index 886 is out of bounds for dimension 0 with size 445
```

Code to reproduce the problem

```python
new_raw_predictions, new_x = best_tft.predict(new_prediction_data, mode="raw", return_x=True)

for idx in range(886, 887):  # plots a single example, idx=886
    best_tft.plot_prediction(new_x, new_raw_predictions, idx=idx, show_future_observed=False)
```
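Going by the traceback (encoder_target[idx]), idx selects a sample along the first dimension of the prediction batch, so it is a zero-based position rather than a series ID. With 445 samples the valid range is 0 to 444, and even if all 886 series were present the last valid idx would be 885, so range(886, 887) would still be out of bounds. A minimal sketch that filters requested positions against the batch size; plottable_indices is a hypothetical helper, and it assumes x["decoder_lengths"] holds one entry per sample, as the traceback's use of x["decoder_lengths"][idx] suggests.

```python
def plottable_indices(x, requested):
    """Keep only sample positions that exist in the batch returned with return_x=True.

    Hypothetical helper: samples are indexed from 0 along the first dimension,
    so valid positions are 0 .. n_samples - 1. len() works for lists and for
    torch tensors (it returns the size of dimension 0).
    """
    n_samples = len(x["decoder_lengths"])
    return [i for i in requested if 0 <= i < n_samples]


# Stand-in for new_x with 445 samples (the size reported in the error).
fake_x = {"decoder_lengths": [6] * 445}
print(plottable_indices(fake_x, range(886, 887)))  # empty: 886 is out of range
print(plottable_indices(fake_x, range(3)))

# Hypothetical usage with the objects from the issue:
# for idx in plottable_indices(new_x, range(10)):
#     best_tft.plot_prediction(new_x, new_raw_predictions, idx=idx, show_future_observed=False)
```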


sairamtvv commented 1 year ago

I think there is an easier way to check:

```python
new_pred, new_x, new_index = best_tft.predict(val_dataloader, mode="prediction", return_x=True, return_index=True)
```

Then inspect new_index: each row corresponds to one time series, so you can see which series were included in new_index and which were omitted.
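A small pandas sketch of that check, assuming new_index is a DataFrame with one row per predicted sample, in prediction order, containing the group_ids column ("series" here) alongside time_idx. missing_series and sample_position are hypothetical helpers, not pytorch-forecasting functions.

```python
import pandas as pd


def missing_series(all_series, new_index, group_col="series"):
    """Series present in the raw data but absent from the prediction index."""
    return sorted(set(all_series) - set(new_index[group_col]))


def sample_position(new_index, series_id, group_col="series"):
    """Zero-based row position of a series in new_index, or None if it was dropped."""
    rows = new_index.reset_index(drop=True)
    matches = rows[rows[group_col] == series_id].index
    return int(matches[0]) if len(matches) else None


# Toy stand-in for new_index: only series "B" and "D" made it into the predictions.
toy_index = pd.DataFrame({"time_idx": [12, 12], "series": ["B", "D"]})
print(missing_series(["A", "B", "C", "D"], toy_index))
print(sample_position(toy_index, "D"))
```

With the real objects this might look like missing_series(df_pruebas["series"].unique(), new_index). The position returned by sample_position would only line up with plot_prediction's idx if the raw predictions come from the same data in the same order, which is an assumption here.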