unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[BUG] TFTModel predicts nan values when MapeLoss function is used #2517

Open akepa opened 2 weeks ago

akepa commented 2 weeks ago

Describe the bug

When MapeLoss is used as the loss function with a TFTModel (via the loss_fn parameter), the training output shows val_loss and train_loss = 0:

from darts.utils.losses import MapeLoss

model = TFTModel(
        ...
        loss_fn=MapeLoss(),
        ...
    )
Epoch 4: 100%
 1/1 [00:00<00:00, 11.02it/s, train_loss=0.000, val_loss=0.000]

Then, when we try to get predictions with that model, the prediction method returns an array of nan values:

array([[[nan]],

       [[nan]],

       [[nan]]]

There is no issue when any other loss function (e.g. MSELoss) is used.

To Reproduce

It can be reproduced with the following code. The dataset is also attached: input_example.csv

import pandas as pd
import torch
from pytorch_lightning.callbacks import Callback, EarlyStopping
from darts import TimeSeries
from darts.models import TFTModel
from darts.utils.losses import MapeLoss
from torch.nn import MSELoss

# Retrieve target series
df = pd.read_csv('input_example.csv')
s = TimeSeries.from_dataframe(df, 'date', 'target')
test = s[-3:]
val = s[-18:-3]
train = s[:-18]

# Build and train the model
early_stopper = EarlyStopping("val_loss", min_delta=0.001, patience=10, verbose=True)
callbacks = [early_stopper]

model = TFTModel(
        input_chunk_length=12,
        output_chunk_length=3,
        batch_size=64,
        n_epochs=5,
        add_relative_index=True,
        add_encoders=None,
        loss_fn=MapeLoss(),  # replace with MSELoss() to compare
        likelihood=None,
        random_state=42,
        pl_trainer_kwargs={"accelerator": "gpu", "devices": [0], "callbacks": callbacks},
        save_checkpoints=True, 
        model_name="my_model",
        force_reset=True
    )

model.fit(series=train, val_series=val, verbose=True)

best_model = model.load_from_checkpoint(model_name="my_model", best=True, work_dir='darts_logs')

best_model.predict(n=3, num_samples=1, series=train.append(val))

Expected behavior

The prediction output should be an array of float values, not an array of nans.

System (please complete the following information):

Additional context

I've tried to understand where the nan values are coming from. I've modified MapeLoss (https://github.com/unit8co/darts/blob/master/darts/utils/losses.py#L96) to print the values of its two parameters:

    def forward(self, inpt, tgt):
        print(f'TGT: {tgt}')
        print(f'INPT: {inpt}')
        return torch.mean(torch.abs(_divide_no_nan(tgt - inpt, tgt)))

It seems that from the second call onwards, the inpt parameter already arrives as an array of nan values.
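
To pin down where the nans first appear, one option (a sketch, assuming the fitted model exposes its underlying PyTorch module as model.model, as darts torch models do) is to scan the network's parameters right after fit:

import torch

# After model.fit(...): list parameter tensors that already contain nan.
nan_params = [
    name for name, p in model.model.named_parameters() if torch.isnan(p).any()
]
print(f"{len(nan_params)} parameter tensors contain nan:", nan_params[:5])

If the list is non-empty after the first epoch, the nan predictions come from corrupted weights rather than from the prediction step itself.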

dennisbader commented 2 weeks ago

Hi @akepa, you have a 0. in your data, which is most likely the issue here.
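
A quick way to check this on the repro data (a minimal sketch reusing the train series from the code above) is:

import numpy as np

# MAPE is undefined for targets that are exactly zero.
print("zero targets in train:", int(np.sum(train.values() == 0)))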

akepa commented 2 weeks ago

Thank you very much for the quick response. Indeed, the data was scaled with the default MinMaxScaler, and if I replace the zero value with a positive number, the problem disappears.

Is this case supposed to work? If not, should it be specified somewhere in the documentation?
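
For reference, one way to avoid zeros introduced by scaling (a sketch, not part of the original report; it assumes wrapping a scikit-learn MinMaxScaler with a strictly positive feature_range in darts' Scaler) would be:

from darts.dataprocessing.transformers import Scaler
from sklearn.preprocessing import MinMaxScaler

# Scale into a strictly positive range so no target becomes exactly 0.
scaler = Scaler(MinMaxScaler(feature_range=(0.01, 1.0)))
train_scaled = scaler.fit_transform(train)
val_scaled = scaler.transform(val)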

madtoinou commented 5 days ago

According to the documentation, NaN and inf values are replaced by 0 when using MapeLoss; as soon as the model forecasts nan, the loss becomes equal to 0. This "zeroing" might also affect back-propagation and cause some of the model's weights to become nan, leading to nan predictions (to be confirmed).
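
To illustrate both effects, here is a minimal sketch that mimics MapeLoss by emulating _divide_no_nan with torch.nan_to_num (the darts implementation may differ in detail):

import torch

def mape_like(inpt, tgt):
    # nan/inf produced by the division are replaced by 0, as in MapeLoss.
    ratio = torch.nan_to_num((tgt - inpt) / tgt, nan=0.0, posinf=0.0, neginf=0.0)
    return torch.mean(torch.abs(ratio))

tgt = torch.tensor([0.0, 2.0])

# A zero target: its term is inf, zeroed out, so it never penalises the model.
print(mape_like(torch.tensor([1.5, 1.8]), tgt))   # tensor(0.0500)

# Once the network outputs nan, every term is nan, zeroed out, and the loss is 0,
# matching the train_loss=0.000 / val_loss=0.000 seen in the training logs.
print(mape_like(torch.tensor([float("nan"), float("nan")]), tgt))  # tensor(0.)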

By definition, MAPE is not expected to work with a dataset containing zeros. I don't think adding a sentence to its docstring reminding users to avoid combining the MinMaxScaler with this loss is relevant, as the zeros can have various origins/causes.