sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

Unknown category '{e.args[0]}' encountered. Set `add_nan=True` to allow unknown categories #726

Open qisuqi opened 2 years ago

qisuqi commented 2 years ago

I am trying to follow the tutorial for interpretable forecasting with N-BEATS, but with my own dataset, which is a single univariate time series. The trouble I am facing is creating a validation set that uses the same normalization as the training dataset.

I have seen other threads suggesting that passing categorical_encoders={col_name: NaNLabelEncoder(add_nan=True)} should solve the problem, but it did not for me: the error "Unknown category '{e.args[0]}' encountered. Set add_nan=True to allow unknown categories" still occurs.

In the linked tutorial, col_name is the column that identifies the different time series. Since I only have one time series, I set it to the column holding my actual time series values, which could be where the problem arises?

Here is how I initialise the TimeSeriesDataSet for the training set.

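import numpy as np
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder

forecast_length = 14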
backcast_length = 3 * forecast_length
batch_size = 128

file['time_idx'] = np.arange(len(file))
file['static'] = np.repeat(1, len(file))

training_cutoff = file['time_idx'].max() - forecast_length
context_length = backcast_length
prediction_length = forecast_length

training = TimeSeriesDataSet(file[lambda x: x.time_idx <= training_cutoff],
                             time_idx='time_idx',
                             target='Data',
                             group_ids=['static'],
                             time_varying_unknown_reals=['Data'],
                             max_encoder_length=context_length,
                             max_prediction_length=forecast_length,
                             time_varying_unknown_categoricals=[],
                             categorical_encoders={"Data": NaNLabelEncoder(add_nan=True).fit(file.Data)})

And here is how I try to create the validation set.

validation = TimeSeriesDataSet.from_dataset(training,
                                            file,
                                            min_prediction_idx=training_cutoff+1,
                                            stop_randomization=True)

Creating the validation set raises the following error:

Traceback (most recent call last):
  File ".\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\data\encoders.py", line 132, in transform
    encoded = [self.classes_[v] for v in y]
  File ".\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\data\encoders.py", line 132, in <listcomp>
    encoded = [self.classes_[v] for v in y]
KeyError: 1000.77

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./nbeats_torch.py", line 46, in <module>
    stop_randomization=True)
  File ".\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\data\timeseries.py", line 1113, in from_dataset
    dataset.get_parameters(), data, stop_randomization=stop_randomization, predict=predict, **update_kwargs
  File ".\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\data\timeseries.py", line 1158, in from_parameters
    new = cls(data, **parameters)
  File ".\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\data\timeseries.py", line 434, in __init__
    data = self._preprocess_data(data)
  File ".\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\data\timeseries.py", line 747, in _preprocess_data
    data[self.target] = self.target_normalizer.transform(data[self.target])
  File ".\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\data\encoders.py", line 135, in transform
    f"Unknown category '{e.args[0]}' encountered. Set `add_nan=True` to allow unknown categories"
KeyError: "Unknown category '1000.77' encountered. Set `add_nan=True` to allow unknown categories"

1000.77 is the first entry of my validation set.
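
To illustrate why a value such as 1000.77 can trigger this error, here is a minimal sketch of a dictionary-based label encoder. TinyLabelEncoder is a hypothetical stand-in written for this example, not the library's NaNLabelEncoder; it only mirrors the lookup pattern visible in the traceback (`self.classes_[v]`). The sample numbers other than 1000.77 are made up.

```python
class TinyLabelEncoder:
    """Hypothetical stand-in for a label encoder (not the library's code)."""

    def __init__(self, add_nan: bool = False):
        self.add_nan = add_nan
        self.classes_ = {}

    def fit(self, values):
        # When add_nan is set, index 0 is reserved for values not seen in fit().
        offset = 1 if self.add_nan else 0
        self.classes_ = {v: i + offset for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        if self.add_nan:
            # Unseen values fall back to the reserved "unknown" index.
            return [self.classes_.get(v, 0) for v in values]
        # Unseen values raise KeyError, as in the traceback above.
        return [self.classes_[v] for v in values]


train = [998.10, 999.25, 1001.40]  # made-up training values
valid = [1000.77]                  # a float that never appeared in training

strict = TinyLabelEncoder().fit(train)
try:
    strict.transform(valid)
    error = None
except KeyError as exc:
    error = exc.args[0]

lenient = TinyLabelEncoder(add_nan=True).fit(train)
encoded = lenient.transform(valid)
print(error, encoded)
```

With a continuous target, almost every validation value is "unseen", so treating the series as categories either fails outright or collapses most values into the single unknown index.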

Any help is appreciated!!

UPDATE: I have added the argument target_normalizer=NaNLabelEncoder(add_nan=True) when initialising the training set, and the validation set can now be created. However, I am getting the following user warnings:

.\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\data\encoders.py:121: UserWarning: Found 3628 unknown classes which were set to NaN
  UserWarning,
.\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\data\encoders.py:121: UserWarning: Found 55 unknown classes which were set to NaN

I also get an assertion error when calling net = NBeats.from_dataset(training, learning_rate=3e-2, weight_decay=1e-2, widths=[32, 512], backcast_loss_ratio=0.1):

Traceback (most recent call last):
  File "./nbeats_torch.py", line 54, in <module>
    net = NBeats.from_dataset(training, learning_rate=3e-2, weight_decay=1e-2, widths=[32, 512], backcast_loss_ratio=0.1)
  File ".\anaconda3\envs\tensorflow_gpuenv\lib\site-packages\pytorch_forecasting\models\nbeats\__init__.py", line 199, in from_dataset
    ), "only regression tasks are supported - target must not be categorical"
AssertionError: only regression tasks are supported - target must not be categorical

UPDATE2: By looking closely at the error message and printing training.get_parameters(), I realised that NaNLabelEncoder() was being chosen automatically as the target normaliser. Explicitly choosing a normaliser suited to a continuous target, such as target_normalizer=TorchNormalizer(method='identity'), solves the problem.
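
Put together, the fix from UPDATE2 might look roughly like the sketch below. The helper names training_cutoff and build_datasets are hypothetical, the column names (time_idx, static, Data) are taken from the post, and the cast to float64 is an added precaution under the assumption that the target column may have been read with a non-float dtype. The library imports sit inside the function so the snippet loads even where pytorch-forecasting is not installed.

```python
def training_cutoff(n_rows: int, forecast_length: int) -> int:
    """Last time_idx in the training set, assuming time_idx runs 0..n_rows-1."""
    return (n_rows - 1) - forecast_length


def build_datasets(df, target="Data", forecast_length=14):
    """Build training and validation sets with an explicit target normalizer.

    Assumes df already has an integer `time_idx` column and a constant
    `static` group column, as in the post.
    """
    from pytorch_forecasting import TimeSeriesDataSet
    from pytorch_forecasting.data.encoders import TorchNormalizer

    df = df.copy()
    # Assumption: make sure the target is a float column, not object/int.
    df[target] = df[target].astype("float64")
    cutoff = training_cutoff(len(df), forecast_length)

    training = TimeSeriesDataSet(
        df[df.time_idx <= cutoff],
        time_idx="time_idx",
        target=target,
        group_ids=["static"],
        time_varying_unknown_reals=[target],
        max_encoder_length=3 * forecast_length,
        max_prediction_length=forecast_length,
        # Explicit normalizer for a continuous target (no label encoding).
        target_normalizer=TorchNormalizer(method="identity"),
    )
    validation = TimeSeriesDataSet.from_dataset(
        training, df, min_prediction_idx=cutoff + 1, stop_randomization=True
    )
    return training, validation
```

Note that the categorical_encoders entry for "Data" is dropped here: the target is listed only under time_varying_unknown_reals, so no categorical encoder is configured for it.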

strakehyr commented 2 years ago

I got the same issue, but in my case neither TorchNormalizer nor any other normalizer solves it.