sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

AssertionError: filters should not remove entries #142

Closed: jbelisario closed this issue 3 years ago

jbelisario commented 4 years ago

Expected behavior

I ran the code to create a TimeSeriesDataSet and expected the code to create the object in order to move on to the validation split.

Actual behavior

[screenshot: traceback of the AssertionError "filters should not remove entries"]

Code to reproduce the problem

max_prediction_length = 6
max_encoder_length = 3914
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx = col_def['time_idx'],
    target = col_def['target'], #Value
    group_ids = col_def['group_ids'],  # the error stems from _construct_index in timeseries.py
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals = col_def['static_categoricals'],
    static_reals = col_def['static_reals'],
    time_varying_known_categoricals = col_def['time_varying_known_categoricals'],
    variable_groups = {},  # group of categorical variables can be treated as one variable
    time_varying_known_reals = col_def['time_varying_known_reals'],
    time_varying_unknown_categoricals = [],
    time_varying_unknown_reals = col_def['time_varying_unknown_reals'],
    target_normalizer=GroupNormalizer(
        groups = col_def['group_ids'], coerce_positive=1.0
    ),  # use softplus with beta=1.0 and normalize by group
    add_relative_time_idx = True,
    add_target_scales = True,
    add_encoder_length = True,
)

Is there any other code you might need to see to be able to better understand where the issue might stem from?

Traceback: [screenshot of the AssertionError raised in _construct_index in timeseries.py]

jdb78 commented 4 years ago

Could you try with the 0.6.0 version of PyTorch Forecasting? It should give more verbose error messages and let you better understand what the issue is. The error has also been downgraded to a warning. Essentially, it complains about entire time series being removed from the dataset as a result of your encoder and prediction lengths as well as your minimum prediction index.
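
To see which series the filter would remove, a quick pandas check of per-group lengths against the minimum window size can help. This is a rough sketch that assumes the data and col_def objects from the code above and a gap-free time_idx; the actual check in _construct_index also accounts for min_prediction_idx:

    # a series must cover at least min_encoder_length + min_prediction_length
    # steps to yield a single sample
    min_length = max_encoder_length // 2 + 1  # min_encoder_length + min_prediction_length

    # count time steps per group and flag series the filter would drop entirely
    lengths = data.groupby(col_def["group_ids"]).size()
    too_short = lengths[lengths < min_length]
    print(f"{len(too_short)} of {len(lengths)} series are shorter than {min_length} steps")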

jbelisario commented 4 years ago

> Could you try with the 0.6.0 version of PyTorch Forecasting? It should give more verbose error messages and let you better understand what the issue is. The error has also been downgraded to a warning. Essentially, it complains about entire time series being removed from the dataset as a result of your encoder and prediction lengths as well as your minimum prediction index.

Jan, thank you for the response. It seems as if the TimeSeriesDataSet object was created.

Here is the code that creates the training and validation sets:

max_prediction_length = 36
max_encoder_length = 60
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx = col_def['time_idx'],
    target = col_def['target'], #Value
    group_ids = col_def['group_ids'], 
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals = col_def['static_categoricals'], 
    static_reals = col_def['static_reals'], 
    time_varying_known_categoricals = col_def['time_varying_known_categoricals'], 
    variable_groups = {},  # group of categorical variables can be treated as one variable
    time_varying_known_reals = col_def['time_varying_known_reals'], 
    time_varying_unknown_categoricals = [], 
    time_varying_unknown_reals = col_def['time_varying_unknown_reals'],
    target_normalizer=GroupNormalizer(
        groups = col_def['group_ids'], coerce_positive=1.0
    ),  # use softplus with beta=1.0 and normalize by group
    add_relative_time_idx = True,
    add_target_scales = True,
    add_encoder_length = True,
)

# create validation set (predict=True) which means to predict the last max_prediction_length points in time for each series
validation = TimeSeriesDataSet.from_dataset(training, data, predict = True, stop_randomization = True)

Here is the error traceback: [screenshots of the traceback]

Any response is appreciated.

jdb78 commented 3 years ago

I believe you are using the year as a categorical. You can either specify it as a continuous variable or use a NaNLabelEncoder with add_nan=True to allow unknown categories. It probably makes more sense to use it as a continuous variable.
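
For reference, a minimal sketch of both options, assuming the offending column is named "year" (adjust to your column definitions):

    # Option 1: treat year as a continuous feature
    data["year"] = data["year"].astype(float)
    # then move "year" from time_varying_known_categoricals to time_varying_known_reals

    # Option 2: keep year categorical but allow categories unseen during training
    from pytorch_forecasting.data import NaNLabelEncoder

    encoders = {"year": NaNLabelEncoder(add_nan=True)}
    # pass this to the TimeSeriesDataSet constructor via categorical_encoders=encoders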