Open grudloff opened 1 week ago
Minimal example of the workaround:
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet
# Define the dataset
max_encoder_length = 10
prediction_length = 3
# Create a dummy dataset
data = pd.DataFrame({
    "time_idx": list(range(max_encoder_length)),
    "target": list(range(100, 100 + max_encoder_length)),
    "group": ["A"] * max_encoder_length,
})
print(data)
# Append dummy data to the end
dummy_data = pd.DataFrame({
    "time_idx": list(range(max_encoder_length, max_encoder_length + prediction_length)),
    "target": [0] * prediction_length,
    "group": ["A"] * prediction_length,
})
data = pd.concat([data, dummy_data], ignore_index=True)
# Create TimeSeriesDataSet
dataset = TimeSeriesDataSet(
    data,
    time_idx="time_idx",
    target="target",
    group_ids=["group"],
    min_encoder_length=max_encoder_length // 2,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=prediction_length,
    predict_mode=True,
    target_normalizer=None,
)
# Create a dataloader
dataloader = dataset.to_dataloader(train=False, batch_size=1)
# Print the first batch
for x, y in dataloader:
    print("Encoder input")
    print(x["encoder_target"].numpy())
    print("Decoder input")
    print(x["decoder_target"].numpy())
    print("Encoder lengths")
    print(x["encoder_lengths"].numpy())
    print("Dummy target")
    print(y)
Output:
>>> Data
>>> time_idx target group
>>> 0 0 100 A
>>> 1 1 101 A
>>> 2 2 102 A
>>> 3 3 103 A
>>> 4 4 104 A
>>> 5 5 105 A
>>> 6 6 106 A
>>> 7 7 107 A
>>> 8 8 108 A
>>> 9 9 109 A
>>> Encoder input
>>> [[100. 101. 102. 103. 104. 105. 106. 107. 108. 109.]]
>>> Decoder input
>>> [[0. 0. 0.]]
>>> Encoder lengths
>>> [10]
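The same padding step can be generalized to data with several groups. The following is a minimal sketch using pandas only; the helper name `append_future_rows` and its parameters are hypothetical and not part of pytorch_forecasting. It assumes an integer time index and that the placeholder target values are never used as real observations.

```python
import pandas as pd


def append_future_rows(data, group_col, time_col, target_col, horizon, fill_value=0.0):
    """Append `horizon` placeholder rows after the last time index of each group.

    Hypothetical helper for illustration, not part of pytorch_forecasting.
    """
    frames = [data]
    # Last observed time index per group
    last_indices = data.groupby(group_col)[time_col].max()
    for group, last_idx in last_indices.items():
        start = int(last_idx) + 1
        future = pd.DataFrame({
            time_col: list(range(start, start + horizon)),
            target_col: [fill_value] * horizon,
            group_col: [group] * horizon,
        })
        frames.append(future)
    return pd.concat(frames, ignore_index=True)


history = pd.DataFrame({
    "time_idx": [0, 1, 2, 0, 1],
    "target": [100.0, 101.0, 102.0, 200.0, 201.0],
    "group": ["A", "A", "A", "B", "B"],
})
padded = append_future_rows(history, "group", "time_idx", "target", horizon=3)
print(padded)
```

Any additional columns in the input (for example covariates) are left as NaN in the appended rows by this sketch, so they would need to be filled in before the frame is passed to TimeSeriesDataSet.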
Hm, I think this is a deeper design issue. I agree that this should be possible, and easily so. I also think that TimeSeriesDataSet has too many arguments and is too specific.
I have opened a new issue to redesign the data handling layer, since there are multiple related problems that one may want to address here: https://github.com/sktime/pytorch-forecasting/issues/1716
Currently, TimeSeriesDataSet has the option to set the predict_mode flag to True. This allows using the whole sequence except for the last portion, which is held out for testing purposes and predicted by the model. However, I haven't found a way to predict using the whole sequence (think, for instance, of a Kaggle competition where you have to submit predictions for the following x months using the data you have). I think an easy workaround could be to append dummy data at the end so that the effective sequence is the whole sequence (i.e. the length of the appended dummy data matches the prediction length).
Is there a way to do this currently? If not, I believe that something similar to predict_mode could be a nice way to activate this behavior.