jdb78 / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

Predict on new data (Model in production) #737

Open javierclb opened 2 years ago

javierclb commented 2 years ago

Expected behavior

I have a model I'm comfortable with. Its goal is to forecast the next 72 hours of wind energy generation based on the previous 2 weeks. Now I want to use the model with new data to run the forecast for the next 72 hours (which are unknown).

Actual behavior

I'm using the TimeSeriesDataSet constructor in the following way:

self.data = TimeSeriesDataSet(
    self.df,
    time_idx="index",
    target="EGrid [MWh]",
    min_encoder_length=168 * 2,
    max_encoder_length=168 * 2,
    min_prediction_length=72,
    max_prediction_length=72,
    group_ids=["series"],
    time_varying_unknown_reals=["EGrid [MWh]"],
    predict_mode=False,
)

The dataset only contains 336 values, which represent two weeks of hourly data. When I try to run the predict method in order to get 72 values, I end up with the following message:


AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_9002/2416573154.py in <module>
      1 input_file = 'data/Lebu_1_8760.csv'
      2 df = pd.read_csv(input_file)
----> 3 test = PythonPredictor(df)

/tmp/ipykernel_9002/1497170273.py in __init__(self, df)
     14 context_length = max_encoder_length
     15 prediction_length = max_prediction_length
---> 16 self.data = TimeSeriesDataSet(
     17     self.df,
     18     time_idx="index",

~/miniconda3/envs/atco_plan/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, constant_fill_strategy, allow_missing_timesteps, lags, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
    437
    438         # create index
--> 439         self.index = self._construct_index(data, predict_mode=predict_mode)
    440
    441         # convert to torch tensor for high performance data loading later

~/miniconda3/envs/atco_plan/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
   1244                 UserWarning,
   1245             )
-> 1246         assert (
   1247             len(df_index) > 0
   1248         ), "filters should not remove entries all entries - check encoder/decoder lengths and lags"

AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags

Do you know what can be done in this situation?

Thank you very much in advance for your help.

Mahima-ai commented 2 years ago

In my experience with this library, the length of the data (forecasting instances) given to the predict method must be equal to or greater than (min_encoder_length + min_prediction_length). In your case, that is 336 + 72 = 408. This is really a concern in a production environment.

So as a workaround you can try to lower min_encoder_length and min_prediction_length, or extend the prediction data to 408 rows, as sketched below.
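A minimal sketch of that workaround, assuming the column names from the constructor above; the helper name and the zero-filled placeholder target are illustrative only, not part of the library. It appends dummy rows for the 72 unknown decoder steps so the dataframe reaches the required 408 rows.

import numpy as np
import pandas as pd

def extend_for_prediction(df: pd.DataFrame, horizon: int = 72) -> pd.DataFrame:
    """Append `horizon` placeholder rows so the dataset length check passes.

    The target values of the appended rows are never seen by the encoder;
    they only fill the decoder slots that TimeSeriesDataSet expects to exist.
    """
    last_idx = df["index"].max()
    future = pd.DataFrame(
        {
            "index": np.arange(last_idx + 1, last_idx + 1 + horizon),
            "series": df["series"].iloc[-1],
            "EGrid [MWh]": 0.0,  # dummy target for the unknown horizon
        }
    )
    return pd.concat([df, future], ignore_index=True)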

javierclb commented 2 years ago

Thank you for the good advice. I will make the adjustment you suggested.

On the other hand, it would be really nice if the developers could add some examples of taking models into production.

pchalasani commented 2 years ago

In my experience with this library, the length of the data (forecasting instances) given to the predict method must be equal to or greater than (min_encoder_length + min_prediction_length).

This is very unintuitive behavior and not documented well. This is the code that's causing the error -- https://github.com/jdb78/pytorch-forecasting/blob/master/pytorch_forecasting/data/timeseries.py#L1184

So if min_encoder_length = 5, and min_prediction_length = 5, and I have just the most recent 5 data points, I have to create an augmented dataset where I append 5 additional points just so I don't get this error.
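A toy sketch of that augmentation; the column names and dummy values are made up for illustration, and `model` stands for an already trained forecaster. Passing predict_mode=True (confirmed as a constructor argument in the traceback above) restricts the dataset to the last encoder/decoder window per series.

import numpy as np
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

# 5 observed points plus 5 dummy rows appended for the decoder horizon
observed = pd.DataFrame({"index": range(5), "series": "a", "value": np.random.rand(5)})
dummy = pd.DataFrame({"index": range(5, 10), "series": "a", "value": 0.0})  # placeholders only
augmented = pd.concat([observed, dummy], ignore_index=True)

dataset = TimeSeriesDataSet(
    augmented,
    time_idx="index",
    target="value",
    group_ids=["series"],
    time_varying_unknown_reals=["value"],
    min_encoder_length=5,
    max_encoder_length=5,
    min_prediction_length=5,
    max_prediction_length=5,
    predict_mode=True,  # only the last window per series
)
# model.predict(dataset) would then return 5 forecast steps per series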

pnsafari commented 2 years ago

@Mahima-ai does this really make sense? In some models such as NBeats, I think the min and max lengths must be equal. Then basically the only solution would be the one proposed by @pchalasani? Am I missing something? I don't understand the reason for this behavior.