sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

TimeSeriesDataset documentation for `scalers` setup for `None` case #204

Open ikulyatin opened 3 years ago

ikulyatin commented 3 years ago

Expected behavior

Executed code:

import numpy as np
import pandas as pd
import pprint

from pytorch_forecasting import TimeSeriesDataSet

test_data = pd.DataFrame(
    dict(
        value=np.random.rand(30) - 0.5,
        group=np.repeat(np.arange(3), 10),
        time_idx=np.tile(np.arange(10), 3),
    )
)

test_data

dataset = TimeSeriesDataSet(
    test_data,
    group_ids=["group"],
    target="value",
    time_idx="time_idx",
    min_encoder_length=5,
    max_encoder_length=5,
    min_prediction_length=2,
    max_prediction_length=2,
    scalers=None,
    target_normalizer=None,
    time_varying_unknown_reals=["value"],
)

Expected a dataset object.

Actual behavior

Got:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...] /.venv/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py", line 271, in __init__
    self.scalers = {} if len(scalers) == 0 else scalers
TypeError: object of type 'NoneType' has no len()

Code to reproduce the problem

See above. The fix belongs either in the documentation (state that scalers should be set to '' rather than None) or in the timeseries.py module.
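If the fix goes into timeseries.py, a guard along the following lines would accept both None and an empty dict. This is only a sketch: `resolve_scalers` is a hypothetical helper name, not a function in pytorch_forecasting, and the actual fix in the library may look different.

```python
from typing import Any, Dict, Optional


def resolve_scalers(scalers: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: treat None the same as an empty dict."""
    # `not scalers` is True for both None and {}, so len() is never
    # called on None and the TypeError from the traceback cannot occur.
    if not scalers:
        return {}
    return scalers


print(resolve_scalers(None))             # {}
print(resolve_scalers({}))               # {}
print(resolve_scalers({"value": None}))  # {'value': None}
```

Until such a guard exists in the installed version, passing an empty dict, which has length 0 and therefore takes the `{}` branch of the failing line, avoids the `len()` call on None.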

ikulyatin commented 3 years ago

Another alternative solution: clarify in the documentation that the dictionary has to contain the name of the variable, e.g. scalers = {"value": None}.

EDIT: a related issue:

If I set:

from sklearn.preprocessing import MinMaxScaler

dataset = TimeSeriesDataSet(
    ...,
    scalers={"value": MinMaxScaler()},
    ...,
)

Calling dataset.scalers returns {'value': TorchNormalizer(method='identity')}. Am I setting this wrong? I guess this part in timeseries.py is dealing with this (lines 536-537):

if self.target in self.reals:
    self.scalers[self.target] = self.target_normalizer

What I'm trying to do is normalize my input values, which should also normalize the targets. With MinMaxScaler() I want to apply min-max scaling within the test set. Maybe this is not the intended use of scalers, and I should preprocess my data before feeding it to TimeSeriesDataSet?
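The preprocessing route could look roughly like the sketch below, which min-max scales the column with plain pandas before any dataset is built. The helper `min_max_scale` and the column name `value_scaled` are made up for illustration, and scaling over the whole frame (rather than per group) is an assumption about what is wanted here.

```python
import numpy as np
import pandas as pd

# Same shape as the test data above: 3 groups with 10 time steps each.
test_data = pd.DataFrame(
    dict(
        value=np.random.rand(30) - 0.5,
        group=np.repeat(np.arange(3), 10),
        time_idx=np.tile(np.arange(10), 3),
    )
)


def min_max_scale(series: pd.Series) -> pd.Series:
    """Hypothetical helper: scale a series to [0, 1]."""
    span = series.max() - series.min()
    if span == 0:
        # A constant series has no range; map it to zeros.
        return series * 0.0
    return (series - series.min()) / span


# Assumption: one min/max over the whole frame. For per-group scaling,
# use test_data.groupby("group")["value"].transform(min_max_scale).
test_data["value_scaled"] = min_max_scale(test_data["value"])
```

The scaled column can then be used as the target and as the time-varying real in place of `value`. Predictions come back on the scaled range, so the original minimum and maximum have to be kept around to invert the scaling afterwards.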

jdb78 commented 3 years ago

I think the minmax scaler is not implemented at the moment - it should be easy to change that though! Also, currently, one cannot use a different scaler for inputting targets and outputting them. Might be worth adjusting this.

jdb78 commented 3 years ago

There is now an improved check in place (#220).