Open RicardoDeGouveia opened 1 year ago
I think there is an easier way to check:

```python
new_pred, new_x, new_index = best_tft.predict(
    val_dataloader, mode="prediction", return_x=True, return_index=True
)
```

Then inspect `new_index`: each row of the index corresponds to one time series, so you can see which series were included and which were omitted.
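For example, if `new_index` is a DataFrame with one row per predicted series (with the group column, here `series`), the omitted series can be found with a set difference. A minimal sketch, with made-up SKU names standing in for the real index:

```python
import pandas as pd

# stand-in for the index returned by predict(..., return_index=True):
# one row per predicted time series (hypothetical SKU names)
new_index = pd.DataFrame({
    "series": ["SKU_001", "SKU_002", "SKU_004"],
    "time_idx": [24, 24, 24],
})

expected = {f"SKU_{i:03d}" for i in range(1, 6)}  # the series we expected to see
included = set(new_index["series"])
omitted = sorted(expected - included)
print(f"{len(included)} predicted, {len(omitted)} omitted: {omitted}")
```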
The thing is, I know there are 886 time series in my data.
Let me explain: I am using TFT to predict demand for articles in general, not by "agency_XX". That is, I am using only the time series of the SKUs of each product, and in total I have 886 SKUs, which translates into 886 time series. What am I doing wrong?
NOTE: All of my 886 time series (SKUs) have at least 12 months of data points.
Set-up of the TimeSeriesDataSet:
```python
max_prediction_length = 6  # forecast 6 months
max_encoder_length = 12  # use 12 months of history
training_cutoff = df_pruebas["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    df_pruebas[lambda x: x.time_idx <= training_cutoff],
    group_ids=["series"],
    target="quantity",
    time_idx="time_idx",
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    static_categoricals=["series"],
    time_varying_known_categoricals=["month"],
    time_varying_known_reals=["time_idx"],
    time_varying_unknown_reals=["quantity"],
    target_normalizer=GroupNormalizer(
        groups=["series"], transformation="softplus"
    ),  # use softplus and normalize by group
    allow_missing_timesteps=True,
    categorical_encoders={
        "vendedor": NaNLabelEncoder(add_nan=True),
        "default_code": NaNLabelEncoder(add_nan=True),
    },
)
```
```python
# create validation set (predict=True), which means to predict the last
# max_prediction_length points in time for each series
validation = TimeSeriesDataSet.from_dataset(
    training,
    df_pruebas,
    predict=True,
    stop_randomization=True,
    allow_missing_timesteps=True,
)  # categorical_encoders={"default_code": NaNLabelEncoder(add_nan=True)}
```
```python
# create dataloaders for model
batch_size = 128  # set this between 32 and 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)
```
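A full sample in this set-up needs `max_encoder_length + max_prediction_length = 18` points per series, so a quick pandas count shows which SKUs cannot supply a full window. This is a sketch with a toy `df_pruebas`; whether short series are actually dropped also depends on the dataset's `min_encoder_length`/`min_prediction_length` settings:

```python
import pandas as pd

# toy stand-in for df_pruebas: three SKUs with different history lengths
df_pruebas = pd.DataFrame({
    "series": ["A"] * 20 + ["B"] * 12 + ["C"] * 18,
    "time_idx": list(range(20)) + list(range(12)) + list(range(18)),
})

max_encoder_length = 12
max_prediction_length = 6
needed = max_encoder_length + max_prediction_length  # 18 points for one full window

lengths = df_pruebas.groupby("series")["time_idx"].size()
too_short = lengths[lengths < needed]
print(too_short)  # series that cannot supply a full encoder+decoder window
```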
MY ERROR IS:
```
IndexError                                Traceback (most recent call last)
Input In [194], in <cell line: 3>()
      1 new_raw_predictions, new_x = best_tft.predict(new_prediction_data, mode="raw", return_x=True)
      3 for idx in range(886, 887):  # plot 10 examples
----> 4     best_tft.plot_prediction(new_x, new_raw_predictions, idx=idx, show_future_observed=False)

File ~\.conda\envs\NeuralNetwork\lib\site-packages\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py:711, in TemporalFusionTransformer.plot_prediction(self, x, out, idx, plot_attention, add_loss_to_title, show_future_observed, ax, **kwargs)
    694 """
    695 Plot actuals vs prediction and attention
    696 (...)
    707     plt.Figure: matplotlib figure
    708 """
    710 # plot prediction as normal
--> 711 fig = super().plot_prediction(
    712     x,
    713     out,
    714     idx=idx,
    715     add_loss_to_title=add_loss_to_title,
    716     show_future_observed=show_future_observed,
    717     ax=ax,
    718     **kwargs,
    719 )
    721 # add attention on secondary axis
    722 if plot_attention:

File ~\.conda\envs\NeuralNetwork\lib\site-packages\pytorch_forecasting\models\base_model.py:780, in BaseModel.plot_prediction(self, x, out, idx, add_loss_to_title, show_future_observed, ax, quantiles_kwargs, prediction_kwargs)
    775 figs = []
    776 for y_raw, y_hat, y_quantile, encoder_target, decoder_target in zip(
    777     y_raws, y_hats, y_quantiles, encoder_targets, decoder_targets
    778 ):
--> 780     y_all = torch.cat([encoder_target[idx], decoder_target[idx]])
    781     max_encoder_length = x["encoder_lengths"].max()
    782     y = torch.cat(
    783         (
    784             y_all[: x["encoder_lengths"][idx]],
    785             y_all[max_encoder_length : (max_encoder_length + x["decoder_lengths"][idx])],
    786         ),
    787     )

IndexError: index 886 is out of bounds for dimension 0 with size 445
```
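The traceback says the prediction output holds only 445 series, so any hard-coded `idx >= 445` fails. A defensive pattern is to derive the loop bound from the output itself; below is a sketch with a dummy numpy array standing in for `new_raw_predictions["prediction"]` (which is really a torch tensor of the same shape), and the actual plotting call left commented out:

```python
import numpy as np

# dummy stand-in for new_raw_predictions["prediction"]:
# shape (n_series, prediction_length, n_quantiles)
raw_prediction = np.zeros((445, 6, 7))

n_series = raw_prediction.shape[0]  # 445 here, not the 886 SKUs in the raw data
for idx in range(min(10, n_series)):  # plot at most 10 examples, never out of bounds
    # best_tft.plot_prediction(new_x, new_raw_predictions, idx=idx,
    #                          show_future_observed=False)
    pass
print(n_series)
```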
Code to reproduce the problem