sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

ValueError in _construct_index when initialising TimeSeriesDataSet #140

Closed fraserprice closed 4 years ago

fraserprice commented 4 years ago

I have been trying to test TFT on an extremely simple toy dataset, but I always encounter a ValueError when initialising TimeSeriesDataSet.

I am trying to forecast a simple sine wave (again, just to get up and running); my DataFrame has two columns: time_idx (int, 0 to 100) and price (float, -1.0 to 1.0). The code for generating the data and initialising the TimeSeriesDataSet is as follows:

import math
import random

import pandas as pd
from pandas import DataFrame

from pytorch_forecasting.data import TimeSeriesDataSet, EncoderNormalizer

# Sample a sine wave, optionally with uniform noise
def sample_sin(samples_per_cycle, n_cycles, noise=None):
    sampling_gap = 2 * math.pi / samples_per_cycle
    xs = [sample * sampling_gap for sample in range(samples_per_cycle * n_cycles)]
    ys = [math.sin(x) + ((noise * random.random()) if noise is not None else 0) for x in xs]

    return xs, ys

# Save sampled sin wave as csv
def save_sin_dataset(filename, samples_per_cycle, n_cycles, noise=None):
    _, ys = sample_sin(samples_per_cycle, n_cycles, noise=noise)

    df = DataFrame({'price': ys})
    df.index.name = 'time_idx'
    df.to_csv(filename)

    return range(len(ys)), ys

# Load the csv back into a DataFrame with time_idx and price columns
df = pd.read_csv('sin.csv')

max_encode_length = 36
max_prediction_length = 6
training_cutoff = 90

training = TimeSeriesDataSet(
    df[:training_cutoff],
    time_idx="time_idx",
    group_ids=["price"],
    target="price",
    min_encoder_length=max_encode_length,
    max_encoder_length=max_encode_length,
    min_prediction_length=1,
    static_categoricals=[],
    static_reals=[],
    time_varying_known_categoricals=[],
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=[
        "price",
    ],
    target_normalizer=EncoderNormalizer(
        coerce_positive=1.0
    ),
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)

The error occurs at line 707 in timeseries.py:

df_index["count"] = (df_index["time_last"] - df_index["time_first"]).astype(int) + 1

Traceback:

Traceback (most recent call last):
  File "/Users/fraser/Documents/Personal Projects/Kontrary/forecasting.py", line 36, in <module>
    add_encoder_length=True,
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pytorch_forecasting/data/timeseries.py", line 284, in __init__
    self.index = self._construct_index(data, predict_mode=predict_mode)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pytorch_forecasting/data/timeseries.py", line 707, in _construct_index
    df_index["count"] = (df_index["time_last"] - df_index["time_first"]).astype(int) + 1
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 5546, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 595, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 406, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 595, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 966, in astype_nansafe
    raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer

When I debug, df_index["time_first"] and df_index["time_last"] contain NaN values, and I have no idea why; my DataFrame has no gaps or NaNs.

If someone could let me know what I'm doing wrong, that would be great.

fraserprice commented 4 years ago

I misunderstood the group_ids argument; fixed by adding a third column, sin, which contains the same value for each row. It might be a good option to assume there is only a single series when group_ids is None.
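
A minimal sketch of that fix, reusing the dataset definition from the snippet above (the constant value written into the new sin column is just for illustration):

# Add a constant group column so the whole DataFrame is treated as a single series
df["sin"] = "sin"

training = TimeSeriesDataSet(
    df[:training_cutoff],
    time_idx="time_idx",
    group_ids=["sin"],  # group by the constant column instead of the target
    target="price",
    min_encoder_length=max_encode_length,
    max_encoder_length=max_encode_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals=[],
    static_reals=[],
    time_varying_known_categoricals=[],
    time_varying_unknown_reals=["price"],
    target_normalizer=EncoderNormalizer(coerce_positive=1.0),
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)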