sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

AssertionError: filters should not remove entries #142

Closed: jbelisario closed this issue 3 years ago

jbelisario commented 4 years ago

Expected behavior

I ran the code to create a TimeSeriesDataSet and expected the code to create the object in order to move on to the validation split.

Actual behavior

[screenshot: traceback of the AssertionError "filters should not remove entries"]

Code to reproduce the problem

max_prediction_length = 6
max_encoder_length = 3914
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx = col_def['time_idx'],
    target = col_def['target'], #Value
    group_ids = col_def['group_ids'],  # the error stems from _construct_index in timeseries.py
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals = col_def['static_categoricals'],
    static_reals = col_def['static_reals'],
    time_varying_known_categoricals = col_def['time_varying_known_categoricals'],
    variable_groups = {},  # group of categorical variables can be treated as one variable
    time_varying_known_reals = col_def['time_varying_known_reals'],
    time_varying_unknown_categoricals = [],
    time_varying_unknown_reals = col_def['time_varying_unknown_reals'],
    target_normalizer=GroupNormalizer(
        groups = col_def['group_ids'], coerce_positive=1.0
    ),  # use softplus with beta=1.0 and normalize by group
    add_relative_time_idx = True,
    add_target_scales = True,
    add_encoder_length = True,
)

Is there any other code you might need to see to be able to better understand where the issue might stem from?

Traceback: [screenshot of the AssertionError raised in _construct_index in timeseries.py]

jdb78 commented 4 years ago

Could you try with the 0.6.0 version of PyTorch Forecasting? It should give more verbose error messages and let you better understand what the issue is. The error has also been downgraded to a warning. Essentially, it complains about entire time series being removed from the dataset as a result of your encoder and prediction lengths as well as your minimum prediction index.
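
To see which series the filter would remove, a quick pandas check of per-group lengths against the minimum window size can help. This is a rough sketch that assumes the data and col_def objects from the code above and a gap-free time_idx; the actual check in _construct_index also accounts for min_prediction_idx:

    # a series must cover at least min_encoder_length + min_prediction_length
    # steps to yield a single sample
    min_length = max_encoder_length // 2 + 1  # min_encoder_length + min_prediction_length

    # count time steps per group and flag series the filter would drop entirely
    lengths = data.groupby(col_def["group_ids"]).size()
    too_short = lengths[lengths < min_length]
    print(f"{len(too_short)} of {len(lengths)} series are shorter than {min_length} steps")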

jbelisario commented 4 years ago

> Could you try with the 0.6.0 version of PyTorch Forecasting? It should give more verbose error messages and let you better understand what the issue is. The error has also been downgraded to a warning. Essentially, it complains about entire time series being removed from the dataset as a result of your encoder and prediction lengths as well as your minimum prediction index.

Jan, thank you for the response. It seems as if the TimeSeriesDataSet object was created.

Here is the code that creates the training and validation sets:

max_prediction_length = 36
max_encoder_length = 60
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx = col_def['time_idx'],
    target = col_def['target'], #Value
    group_ids = col_def['group_ids'], 
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals = col_def['static_categoricals'], 
    static_reals = col_def['static_reals'], 
    time_varying_known_categoricals = col_def['time_varying_known_categoricals'], 
    variable_groups = {},  # group of categorical variables can be treated as one variable
    time_varying_known_reals = col_def['time_varying_known_reals'], 
    time_varying_unknown_categoricals = [], 
    time_varying_unknown_reals = col_def['time_varying_unknown_reals'],
    target_normalizer=GroupNormalizer(
        groups = col_def['group_ids'], coerce_positive=1.0
    ),  # use softplus with beta=1.0 and normalize by group
    add_relative_time_idx = True,
    add_target_scales = True,
    add_encoder_length = True,
)

# create validation set (predict=True) which means to predict the last max_prediction_length points in time for each series
validation = TimeSeriesDataSet.from_dataset(training, data, predict = True, stop_randomization = True)

Here is the error traceback: [screenshots of the traceback]

Any response is appreciated.

jdb78 commented 3 years ago

I believe you are using the year as a categorical. You can either specify it as a continuous variable or use a NaNLabelEncoder with add_nan=True to allow unknown categories. It probably makes more sense to use it as a continuous variable.
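
For reference, a minimal sketch of both options, assuming the offending column is named "year" (adjust to your column definitions):

    # Option 1: treat year as a continuous feature
    data["year"] = data["year"].astype(float)
    # then move "year" from time_varying_known_categoricals to time_varying_known_reals

    # Option 2: keep year categorical but allow categories unseen during training
    from pytorch_forecasting.data import NaNLabelEncoder

    encoders = {"year": NaNLabelEncoder(add_nan=True)}
    # pass this to the TimeSeriesDataSet constructor via categorical_encoders=encoders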