lucidrains / iTransformer

Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks, out of Tsinghua / Ant group

Prediction of first point depends on pred_length #30

Open satyrmipt opened 2 months ago

satyrmipt commented 2 months ago

I predict 1, 2, and 3 steps forward in time with the same lookback_len. Why is the prediction for the first step different in all three cases? It is the same point in time... Does the first prediction depend on the subsequent ones, the way the first token depends on later tokens in NLP due to approaches like beam search?

from iTransformer import iTransformer
import torch

m1 = iTransformer(
    num_variates=1,
    lookback_len=7,
    depth=1,
    dim=1,
    pred_length=(1, 2, 3)
)

time_series = torch.ones(1, 7, 1)  # (batch, lookback_len, num_variates)
preds = m1(time_series)  # Dict[int, Tensor[batch, pred_length, num_variates]]

for i in preds.keys():
    print(f"Length={i} result:\n", preds[i].detach().numpy(), preds[i].detach().numpy().shape, '\n')

Length=1 result: [[[-0.13998163]]] (1, 1, 1)

Length=2 result: [[[-0.03886282] [ 0.41442168]]] (1, 2, 1)

Length=3 result: [[[ 0.39561594] [-0.61296153] [ 0.08078277]]] (1, 3, 1)

lucidrains commented 2 months ago

@satyrmipt yes indeed, i think this is what they did in the paper (please correct me if i'm wrong)

it should just be one linear projection to the maximum predicted length
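For illustration: since each entry in the pred_length tuple appears to get its own linear head in the current design, the first step is produced by a different projection for each horizon, which would explain the differing values above. Below is a minimal sketch of the single-projection idea under that assumption; the names (SharedHorizonHead, to_pred, variate_embed) are hypothetical and not the library's API. It projects each variate embedding once to the maximum horizon and slices, so every requested pred_length shares the same leading points.

import torch
from torch import nn

class SharedHorizonHead(nn.Module):
    # Hypothetical head: one linear projection to the maximum predicted length;
    # shorter horizons are prefixes of the longest prediction.
    def __init__(self, dim, pred_lengths=(1, 2, 3)):
        super().__init__()
        self.pred_lengths = pred_lengths
        self.to_pred = nn.Linear(dim, max(pred_lengths))

    def forward(self, variate_embed):
        # variate_embed: (batch, num_variates, dim)
        full = self.to_pred(variate_embed)   # (batch, num_variates, max_len)
        full = full.permute(0, 2, 1)         # (batch, max_len, num_variates)
        # slicing one shared tensor keeps the first point identical across horizons
        return {k: full[:, :k] for k in self.pred_lengths}

head = SharedHorizonHead(dim=8)
embed = torch.randn(1, 1, 8)                 # (batch, num_variates, dim)
preds = head(embed)
assert torch.allclose(preds[1], preds[2][:, :1])
assert torch.allclose(preds[1], preds[3][:, :1])
print({k: v.shape for k, v in preds.items()})

With a head like this, preds[1], preds[2][:, :1], and preds[3][:, :1] are identical by construction, because they are slices of a single projected tensor.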