I have been trying to test TFT for an extremely simple toy dataset, but always encounter a ValueError when initialising TimeSeriesDataSet.
I am trying to forecast a simple sine wave (again, this is just to get up and running); my DataFrame has two columns, time_idx (int 0 to 100), and price (float -1.0 to 1.0). The code for generating this dataset and initialising my dataset is as follows:
import math
import random

import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import EncoderNormalizer


# Simply sample a sin wave
def sample_sin(samples_per_cycle, n_cycles, noise=None):
    sampling_gap = 2 * math.pi / samples_per_cycle
    xs = [sample * sampling_gap for sample in range(samples_per_cycle * n_cycles)]
    ys = [math.sin(x) + ((noise * random.random()) if noise is not None else 0) for x in xs]
    return xs, ys


# Save sampled sin wave as csv
def save_sin_dataset(filename, samples_per_cycle, n_cycles, noise=None):
    _, ys = sample_sin(samples_per_cycle, n_cycles, noise=noise)
    df = pd.DataFrame({'price': ys})
    # The index is written out as the time_idx column
    df.index.name = 'time_idx'
    df.to_csv(filename)
    return range(len(ys)), ys


# Load csv
df = pd.read_csv('sin.csv')

max_encode_length = 36
max_prediction_length = 6
training_cutoff = 90

training = TimeSeriesDataSet(
    df[:training_cutoff],
    time_idx="time_idx",
    group_ids=["price"],
    target="price",
    min_encoder_length=max_encode_length,
    max_encoder_length=max_encode_length,
    min_prediction_length=1,
    static_categoricals=[],
    static_reals=[],
    time_varying_known_categoricals=[],
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=[
        "price",
    ],
    target_normalizer=EncoderNormalizer(
        coerce_positive=1.0
    ),
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)
I misunderstood the group_ids argument. I fixed it by adding a third column, sin, which contains the same value for each row. It might be a good option to assume there is only a single series if group_ids is None.
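For anyone hitting the same error, here is a minimal sketch of that fix. The column name sin comes from the comment above; the constant value 0, the helper names make_sine_frame and build_training_dataset, and the sampling defaults are assumptions made for illustration, not the exact code that was used. The TimeSeriesDataSet import is kept inside the builder so the frame helper can be tried without pytorch_forecasting installed.

```python
import math

import pandas as pd


def make_sine_frame(samples_per_cycle=20, n_cycles=5):
    """Build a frame with time_idx, price, and a constant group column."""
    n = samples_per_cycle * n_cycles
    step = 2 * math.pi / samples_per_cycle
    df = pd.DataFrame(
        {
            "time_idx": range(n),
            "price": [math.sin(i * step) for i in range(n)],
        }
    )
    # Same value on every row: all rows belong to one single series.
    # (The value 0 is an arbitrary choice for this sketch.)
    df["sin"] = 0
    return df


def build_training_dataset(df, training_cutoff=90):
    """Hypothetical helper: group by the constant column, not by the target."""
    # Imported lazily so make_sine_frame works without pytorch_forecasting.
    from pytorch_forecasting import TimeSeriesDataSet

    return TimeSeriesDataSet(
        df[df["time_idx"] < training_cutoff],
        time_idx="time_idx",
        target="price",
        group_ids=["sin"],  # identifies the series; was ["price"] before
        min_encoder_length=36,
        max_encoder_length=36,
        min_prediction_length=1,
        max_prediction_length=6,
        time_varying_unknown_reals=["price"],
    )


df = make_sine_frame()
```

Calling build_training_dataset(df) should then treat the whole sine wave as one series, since every row shares the same group id.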
The error occurs at line 707 in timeseries.py.
When I debug, df_index_first and df_index_last contain NaN values, and I have no clue why; my DataFrame has no gaps or NaNs. If someone could let me know what I'm doing wrong that would be great.
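A quick way to check the "no gaps or NaNs" claim, and to see how many rows each series gets when price is used as the group id, is sketched below. The frame is rebuilt in memory here instead of being read from sin.csv, and 20 samples per cycle over 5 cycles is an assumed setting; the variable names are illustrative only.

```python
import math

import pandas as pd

# Rebuild a noise-free sine frame shaped like the one described in the post.
step = 2 * math.pi / 20
df = pd.DataFrame(
    {
        "time_idx": range(100),
        "price": [math.sin(i * step) for i in range(100)],
    }
)

# Any missing values anywhere in the frame?
has_nans = bool(df.isna().any().any())

# Any jump in time_idx other than +1 between consecutive rows?
has_gaps = bool((df["time_idx"].diff().dropna() != 1).any())

# With group_ids=["price"], every distinct price value is its own series.
rows_per_group = df.groupby("price").size()
longest_series = int(rows_per_group.max())

print("NaNs:", has_nans, "gaps:", has_gaps)
print("series:", len(rows_per_group), "longest:", longest_series)
```

If the longest series is shorter than the requested encoder length of 36, no group can supply a full encoder window, which would be consistent with the failure even though the frame itself is clean.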