sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License
3.96k stars 627 forks

Confusion on understanding TFT #882

Open processadd opened 2 years ago

processadd commented 2 years ago

Hi, thanks a lot for this project. I have some questions about my understanding; hopefully they make sense.

  1. When predicting new data, from reading through the issues and the tutorial it seems we need to fuse the known and future parts, like new_prediction_data = pd.concat([encoder_data, decoder_data], ignore_index=True) in the tutorial. But how are the unknown future values "ignored" in the code? From the source code,

        embeddings_varying_decoder = {
            name: input_vectors[name][:, max_encoder_length:] for name in self.decoder_variables  # select decoder
        }
        # run local decoder
        decoder_output, _ = self.lstm_decoder(
            embeddings_varying_decoder,
            (hidden, cell),
            lengths=decoder_lengths,
            enforce_sorted=False,
        )

    the unknown future values are included in input_vectors[name][:, max_encoder_length:], right? So if I fill the unknown future values with, say, -1s, it seems they are used by the decoder. Are they supposed to be "ignored" somewhere?

  2. Is there a recommendation/best practice for filling in the unknown future values?

  3. If I have, say, A=200 unique values, B=400, C=500, D=600 as group columns, and I want to work at hourly granularity, then with one year of data will TFT treat this as 200*400*500*600 series with 24*365 hourly points each? Will it work if many compositions of A, B, C, D are missing, or if some compositions appear at only a few time points? E.g. (A1, B1, C1, D1) only appears for 50 hours in a year.

  4. If the compositions are very sparse relative to the full composition set (200*400*500*600*24*365 rows), can we still use it to predict? E.g. there are lots of samples overall, but (A1, B1, C1, D1) appears very few times. Is it still OK to predict for it?

  5. If I don't care about the individual compositions and only the "total" is of concern: e.g. we have hourly "usage" (as target) for the A, B, C, D group columns, but we don't care about each separate composition and instead want to predict the total hourly usage, and the data has no identity column (like a store id). Is TFT a good fit for this case?
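For context on point 1, this is a minimal sketch of the encoder/decoder frame construction from the tutorial. The column names and values here are invented; only time_idx and the pd.concat call mirror the tutorial:

```python
import pandas as pd

# Hypothetical toy frames: encoder rows are observed history, decoder rows
# cover the prediction horizon. The unknown future target is filled with a
# placeholder value (here -1.0), as discussed above.
encoder_data = pd.DataFrame({"time_idx": [0, 1, 2], "usage": [10.0, 11.0, 12.0]})
decoder_data = pd.DataFrame({"time_idx": [3, 4], "usage": [-1.0, -1.0]})

# same pattern as the tutorial: fuse known history and future rows
new_prediction_data = pd.concat([encoder_data, decoder_data], ignore_index=True)
print(new_prediction_data["time_idx"].tolist())
```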

Thanks in advance.

hiandersson commented 2 years ago

I have the same question. I am new to this as well, so I might be wrong, but this is what I found:

decoder_variables uses self.hparams.time_varying_categoricals_decoder and self.hparams.time_varying_reals_decoder:

    @property
    def decoder_variables(self) -> List[str]:
        """List of all decoder variables in model (excluding static variables)"""
        return self.hparams.time_varying_categoricals_decoder + self.hparams.time_varying_reals_decoder

And time_varying_reals_decoder is set from dataset.time_varying_known_reals.

So from what I can see, the decoder only uses known variables, not unknown ones.
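As a plain-Python sketch (the variable names mirror the hparams, but the lists themselves are invented for illustration), the decoder variable list is just the concatenation of the two known-variable lists, so unknown reals never appear in it:

```python
# Invented example dataset configuration
time_varying_known_categoricals = ["holiday"]
time_varying_known_reals = ["price", "hour_of_day"]
time_varying_unknown_reals = ["volume", "usage"]  # includes the target

# mirrors how the model hparams are filled from the dataset
time_varying_categoricals_decoder = time_varying_known_categoricals
time_varying_reals_decoder = time_varying_known_reals

# mirrors the decoder_variables property quoted above
decoder_variables = time_varying_categoricals_decoder + time_varying_reals_decoder
print(decoder_variables)  # ['holiday', 'price', 'hour_of_day']
```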

processadd commented 2 years ago

I think you are right.

time_varying_categoricals_decoder=dataset.time_varying_known_categoricals
time_varying_reals_decoder=dataset.time_varying_known_reals

Those are the inputs for the decoder. Further, for an LSTM seq2seq model the decoder uses the encoder's last hidden state as its initial hidden state, but the input can be almost anything (e.g. a start token in the NLP case, or arbitrary values). I have also seen seq2seq decoders for time series that iteratively feed each cell with data from the lookback window. Here in TFT the decoder inputs are a mix of the known future categoricals/reals and arbitrary fill values for the unknowns (instead of using the last lookback window).
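To make the slicing from the first post concrete, here is a toy, framework-free version of that selection (data and variable names are invented): because the dict comprehension iterates only over decoder_variables, whatever placeholder you put into the unknown future columns is never handed to the decoder.

```python
max_encoder_length = 3

# Toy stand-in for input_vectors: one batch entry per variable,
# covering encoder + decoder time steps.
input_vectors = {
    "price": [[1.0, 2.0, 3.0, 4.0, 5.0]],    # known future real
    "usage": [[9.0, 9.0, 9.0, -1.0, -1.0]],  # unknown future, filled with -1
}
decoder_variables = ["price"]  # only known variables, per the property above

# same selection as in the TFT forward pass, minus the tensors
embeddings_varying_decoder = {
    name: [seq[max_encoder_length:] for seq in input_vectors[name]]
    for name in decoder_variables
}
print(embeddings_varying_decoder)  # {'price': [[4.0, 5.0]]}
```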