sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

[ENH] using TFT without past target values #1585

Open mahaassr opened 4 months ago

mahaassr commented 4 months ago

Hi,

I have a question regarding the use of the Temporal Fusion Transformer (TFT) model. Is it possible to use the TFT model effectively without providing past target values among the known or unknown inputs? Specifically, I only pass the target column as the target in the TimeSeriesDataSet class and never include past target values in the known or unknown inputs. Could you please provide some guidance for this scenario? Thank you for your assistance!

Best regards,

Maha
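
For reference, here is a minimal, hypothetical sketch of the setup described above (column names and data are made up): the target column is passed only as target and is deliberately not listed among the known or unknown reals.

import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

# hypothetical single-series data with one future-known covariate
data = pd.DataFrame(
    {
        "time_idx": list(range(100)),
        "group": ["a"] * 100,
        "value": [float(i) for i in range(100)],              # the target
        "known_feature": [float(i % 7) for i in range(100)],  # known in the future
    }
)

dataset = TimeSeriesDataSet(
    data,
    time_idx="time_idx",
    target="value",
    group_ids=["group"],
    max_encoder_length=24,
    max_prediction_length=6,
    time_varying_known_reals=["known_feature"],
    # "value" is only passed as the target; it is not listed in
    # time_varying_known_reals or time_varying_unknown_reals
)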

moogoofoo commented 2 months ago

Did you find an answer to this question? I have the same problem/question.

fkiraly commented 2 months ago

I think it is fixed by this: https://github.com/jdb78/pytorch-forecasting/pull/1667

Generally, it is hard to understand a bug without a minimal reproducible example - it would be appreciated if you could post code, or check whether that PR fixes the failure in your case.

moogoofoo commented 2 months ago

For my issue, I didn't want the target values being sent to the encoder, because that causes leakage in my case, where the target incorporates some future information. I'm not at all sure this is the best approach, but it seems like it might work for me.

class MyTimeSeriesDataSet(TimeSeriesDataSet):

    def __getitem__(self, idx: int) -> Tuple[Dict[str, torch.Tensor], torch.Tensor]:
        """
        Get sample for model

        Args:
            idx (int): index of prediction (between ``0`` and ``len(dataset) - 1``)

        Returns:
            Tuple[Dict[str, torch.Tensor], torch.Tensor]: x and y for model
        """

        [......]

At the end of the function, I made the following change for my multi-target case:

    if self.multi_target:
        encoder_target = [t[:encoder_length] for t in target]
        # Added the following hack so that the encoder_target values are zeroed out
        # and thus the encoder is not able to use them
        for each_encoder_target in encoder_target:
            each_encoder_target[:] = 0.0
        target = [t[encoder_length:] for t in target]
    else:
        encoder_target = target[0][:encoder_length]
        target = target[0][encoder_length:]
        target_scale = target_scale[0]

moogoofoo commented 2 months ago

More appropriately, shouldn't there be some way of specifying which target variables should not be sent to the encoder? As for the documentation, it wasn't at all clear to me that this is what was happening, and it took me a while to understand it. The documentation should be abundantly clear about this.

fkiraly commented 2 months ago

Does this issue summarize the documentation request well? https://github.com/jdb78/pytorch-forecasting/issues/1591

What would help a lot is if (in #1591) you could point exactly to the classes or methods, with import locations, where you think the documentation is currently unclear, @moogoofoo. (Pull requests, of course, are also always appreciated.)

Further, if you think the interface should change to a specific target state, an explicit explanation of that target state in this issue would be helpful.

jpswensen commented 1 month ago

I was having the same issue. My target was based on looking up to 20 steps into the future, which means I somehow needed to ignore the final 20 steps of the encoder_targets in __getitem__ in order to avoid data leakage. I initially knew something was wrong because I was getting unreasonably high accuracies on the problem I was tackling, which caused me to go digging and see that the targets were being used as intermediate supervision at the encoder output (I remembered reading this in the TFT paper, but had then forgotten it).

For MSELoss problems (and similar), you can set them to NaN, as the PyTorch loss function is smart enough to know that those values shouldn't factor into the loss. For my CrossEntropy case, you can set them to -100, which corresponds to the PyTorch CrossEntropyLoss 'ignore_index' parameter. This causes the loss to ignore the positions where the target is -100.
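
As a standalone illustration of the ignore_index behaviour (plain PyTorch, not specific to pytorch-forecasting; the numbers are made up):

import torch
import torch.nn as nn

logits = torch.randn(5, 3)                     # 5 positions, 3 classes
targets = torch.tensor([0, 2, 1, -100, -100])  # last two positions masked out

loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
loss = loss_fn(logits, targets)  # averaged over the three unmasked positions only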

I couldn't find a built-in way of doing this kind of masking easily. My solution was to just add a line at Line 1662 of timeseries.py (right before the return), where I set:

encoder_targets[-20:] = -100 # (or NaN for MSELoss problems)

Here I replace "-20" with whatever my lookahead window was when computing my target value. In this manner, I think I am ensuring that I don't have data leakage.

This is a very hacky solution, so I tried to see if there was a way to add a mask or lookahead window by subclassing TimeSeriesDataSet. I got it working through the TimeSeriesDataSet constructor, but then couldn't figure out why it didn't also work when using the from_dataset class method. I really need this to work with from_dataset for my validation and test sets, so that they use the same normalization statistics derived from the training set (see the standard pattern sketched below).

I need to dig in more to find a permanent solution, and could potentially make a pull request once I get it sorted out.
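
For context, the standard pytorch-forecasting pattern for reusing a training set's encoders, scalers, and normalization statistics is from_dataset; the dataframe names below are hypothetical:

from pytorch_forecasting import TimeSeriesDataSet

# validation/test sets built this way inherit the training set's normalization statistics
validation = TimeSeriesDataSet.from_dataset(training, val_df, stop_randomization=True)
test = TimeSeriesDataSet.from_dataset(training, test_df, stop_randomization=True)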

jpswensen commented 1 month ago

Followup: Here is the derived TimeSeriesDataSet class that I came up with to do the masking I need. It seems to be working as I wanted. I'm sure there are more compact ways of handling the parameters with *args and **kwargs. This allows my binary training targets to look into the future while ensuring there is no data leakage through the intermediate encoder_targets.

The one thing I hadn't realized until I was working on this is that the TFT can have variable encoder series lengths. This means that if I don't set min_encoder_length to be larger than encoder_mask_len, it is possible for all encoder_targets to be masked off. I will probably add all three parameters (min_encoder_length, max_encoder_length, and encoder_mask_len) to my Optuna parameter optimization search.

# Imports assumed by this snippet (not shown in the original comment)
import warnings
from copy import deepcopy
from typing import Any, Dict, List, Tuple, Union

import numpy as np
import pandas as pd

from pytorch_forecasting import TimeSeriesDataSet


class MaskedEncoderTimeSeriesDataSet(TimeSeriesDataSet):
    def __init__(self, 
        encoder_mask_len,
        data: pd.DataFrame,
        time_idx: str,
        target: Union[str, List[str]],
        group_ids: List[str],
        weight: Union[str, None] = None,
        max_encoder_length: int = 30,
        min_encoder_length: int = None,
        min_prediction_idx: int = None,
        min_prediction_length: int = None,
        max_prediction_length: int = 1,
        static_categoricals: List[str] = [],
        static_reals: List[str] = [],
        time_varying_known_categoricals: List[str] = [],
        time_varying_known_reals: List[str] = [],
        time_varying_unknown_categoricals: List[str] = [],
        time_varying_unknown_reals: List[str] = [],
        variable_groups: Dict[str, List[int]] = {},
        constant_fill_strategy: Dict[str, Union[str, float, int, bool]] = {},
        allow_missing_timesteps: bool = False,
        lags: Dict[str, List[int]] = {},
        add_relative_time_idx: bool = False,
        add_target_scales: bool = False,
        add_encoder_length: Union[bool, str] = "auto",
        target_normalizer: Union[None, str, List[str], Tuple[str]] = "auto",
        categorical_encoders: Dict[str, str] = {},
        scalers: Dict[str, Union[str, None]] = {},
        randomize_length: Union[None, Tuple[float, float], bool] = False,
        predict_mode: bool = False,):

        # Save the new parameter
        self.encoder_mask_len = encoder_mask_len

        # Call the parent class's __init__ method with the remaining arguments
        super().__init__(data, 
                         time_idx, 
                         target, 
                         group_ids, 
                         weight, 
                         max_encoder_length,
                         min_encoder_length,
                         min_prediction_idx,
                         min_prediction_length,
                         max_prediction_length,
                         static_categoricals,
                         static_reals,
                         time_varying_known_categoricals,
                         time_varying_known_reals,
                         time_varying_unknown_categoricals,
                         time_varying_unknown_reals,
                         variable_groups,
                         constant_fill_strategy,
                         allow_missing_timesteps,
                         lags,
                         add_relative_time_idx,
                         add_target_scales,
                         add_encoder_length,
                         target_normalizer,
                         categorical_encoders,
                         scalers,
                         randomize_length,
                         predict_mode)

    def __getitem__(self, idx: int):
        """
        Fetch a window of data, mask the encoder target for the last ``encoder_mask_len`` steps,
        and return the original structure with the masked encoder target.

        Args:
            idx (int): Index position for fetching a sample.

        Returns:
            Tuple: (
                dict: Containing x_cat, x_cont, encoder_length, decoder_length, encoder_target, etc.,
                Tuple: target and weight
            )
        """
        # Call the parent class to get the standard input/output structure
        x, (target, weight) = super().__getitem__(idx)

        # Mask the last encoder_mask_len steps in the encoder target
        encoder_target = x['encoder_target'].clone()  # Clone to avoid modifying the original data

        # encoder_target[-self.encoder_mask_len:] = np.nan  # For MSELoss
        # print(f'Encoder target before masking: {encoder_target}')
        encoder_target[-self.encoder_mask_len:] = -100    # For CrossEntropyLoss
        # print(f'Encoder target after masking: {encoder_target}')

        # Update the x dictionary to include the modified encoder target
        x['encoder_target'] = encoder_target

        # Return the updated x dictionary and the original target, weight tuple
        return x, (target, weight)

    @classmethod
    def from_dataset(
        cls, encoder_mask_len, dataset, data: pd.DataFrame, stop_randomization: bool = False, predict: bool = False, **update_kwargs
    ):
        """
        Generate dataset with different underlying data but same variable encoders and scalers, etc.

        Calls :py:meth:`~from_parameters` under the hood.

        Args:
            encoder_mask_len: the length at the end of the encoder_target that should be ignored
            dataset (TimeSeriesDataSet): dataset from which to copy parameters
            data (pd.DataFrame): data from which new dataset will be generated
            stop_randomization (bool, optional): If to stop randomizing encoder and decoder lengths,
                e.g. useful for validation set. Defaults to False.
            predict (bool, optional): If to predict the decoder length on the last entries in the
                time index (i.e. one prediction per group only). Defaults to False.
            **update_kwargs: keyword arguments overriding parameters in the original dataset

        Returns:
            TimeSeriesDataSet: new dataset
        """
        return cls.from_parameters(
            encoder_mask_len, dataset.get_parameters(), data, stop_randomization=stop_randomization, predict=predict, **update_kwargs
        )

    @classmethod
    def from_parameters(
        cls,
        encoder_mask_len,
        parameters: Dict[str, Any],
        data: pd.DataFrame,
        stop_randomization: bool = None,
        predict: bool = False,
        **update_kwargs,
    ):
        """
        Generate dataset with different underlying data but same variable encoders and scalers, etc.

        Args:
            encoder_mask_len: the length at the end of the encoder_target that should be ignored
            parameters (Dict[str, Any]): dataset parameters which to use for the new dataset
            data (pd.DataFrame): data from which new dataset will be generated
            stop_randomization (bool, optional): If to stop randomizing encoder and decoder lengths,
                e.g. useful for validation set. Defaults to False.
            predict (bool, optional): If to predict the decoder length on the last entries in the
                time index (i.e. one prediction per group only). Defaults to False.
            **update_kwargs: keyword arguments overriding parameters

        Returns:
            TimeSeriesDataSet: new dataset
        """
        parameters = deepcopy(parameters)
        if predict:
            if stop_randomization is None:
                stop_randomization = True
            elif not stop_randomization:
                warnings.warn(
                    "If predicting, no randomization should be possible - setting stop_randomization=True", UserWarning
                )
                stop_randomization = True
            parameters["min_prediction_length"] = parameters["max_prediction_length"]
            parameters["predict_mode"] = True
        elif stop_randomization is None:
            stop_randomization = False

        if stop_randomization:
            parameters["randomize_length"] = None
        parameters.update(update_kwargs)

        # Remove 'encoder_mask_len' from parameters if present, to avoid passing it twice
        parameters.pop("encoder_mask_len", None)

        # Create new dataset instance
        new = cls(encoder_mask_len, data, **parameters)
        return new
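
A hypothetical usage sketch of the class above (dataframes and column names are placeholders); note that min_encoder_length is kept larger than encoder_mask_len so that some unmasked encoder targets always remain:

training = MaskedEncoderTimeSeriesDataSet(
    encoder_mask_len=20,
    data=train_df,
    time_idx="time_idx",
    target="label",
    group_ids=["series_id"],
    max_encoder_length=60,
    min_encoder_length=30,  # larger than encoder_mask_len
    max_prediction_length=1,
    time_varying_known_reals=["time_idx"],
)

# validation reuses the training set's encoders and scalers via the overridden from_dataset
validation = MaskedEncoderTimeSeriesDataSet.from_dataset(
    20, training, val_df, stop_randomization=True
)
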
fkiraly commented 1 month ago

@jpswensen, this is nice! Would you be able to contribute this in a pull request, possibly with a test case, so we can see whether it works and also does not break anything? That would be great!