unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[QUESTION] Training Loss Much Lower Than Validation Loss in TSMixerModel: Need Help Understanding Why #2558

Open erl61 opened 3 hours ago

erl61 commented 3 hours ago

Issue

I am training a TSMixerModel to forecast multivariate time series. The model performs well overall, but the training loss is consistently much lower than the validation loss (sometimes by orders of magnitude).

I have already tried different loss functions (MAELoss, MapeLoss), and the issue persists. However, when I forecast using this model, I don’t observe signs of overfitting, and the model predictions look good.

Callback

I use the following callback to log the losses:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import Callback


class LossLogger(Callback):
    """Records the epoch-level training and validation losses."""

    def __init__(self):
        super().__init__()
        self.train_loss = []
        self.val_loss = []

    # called automatically at the end of each training epoch
    def on_train_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
        self.train_loss.append(float(trainer.callback_metrics["train_loss"]))

    # called at the end of each validation run; the sanity-check run is skipped
    def on_validation_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
        if not trainer.sanity_checking:
            self.val_loss.append(float(trainer.callback_metrics["val_loss"]))


loss_logger = LossLogger()

Model

This is how I initialize the model:

import torch
from darts.models import TSMixerModel
from darts.utils.callbacks import TFMProgressBar
from darts.utils.likelihood_models import QuantileRegression

progress_bar = TFMProgressBar(enable_sanity_check_bar=False, enable_validation_bar=False)

limit_train_batches = 50
limit_val_batches = 50
max_epochs = 30
batch_size = 64

model_tsm = TSMixerModel(
    input_chunk_length=49,
    output_chunk_length=130,
    use_reversible_instance_norm=True,
    optimizer_kwargs={"lr": 1e-4},
    nr_epochs_val_period=1,  # validate after every epoch
    pl_trainer_kwargs={
        "gradient_clip_val": 1,
        "max_epochs": max_epochs,
        "limit_train_batches": limit_train_batches,
        "limit_val_batches": limit_val_batches,
        "accelerator": "auto",
        "callbacks": [progress_bar, loss_logger],
    },
    lr_scheduler_cls=torch.optim.lr_scheduler.ExponentialLR,
    lr_scheduler_kwargs={"gamma": 0.999},
    likelihood=QuantileRegression(),  # with a likelihood set, loss_fn is ignored
    loss_fn=None,
    save_checkpoints=True,
    force_reset=True,
    batch_size=batch_size,
    random_state=42,
    add_encoders={"cyclic": {"future": ["month", "day", "weekday", "quarter", "dayofyear", "week"]}},
    use_static_covariates=True,
    model_name="tsm",
)
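Because likelihood=QuantileRegression() is set, the Darts documentation for torch-based models states that loss_fn is ignored, so the logged train_loss and val_loss should be a quantile (pinball) loss; trying MAELoss or MapeLoss would only take effect in runs where likelihood was also removed. The NumPy sketch below is a generic pinball loss averaged over a few hypothetical quantiles, not Darts' exact implementation (its reduction over quantiles may differ). It illustrates that the loss is expressed in the units of the target, so the same relative error on a series 1000 times larger gives a loss 1000 times larger.

```python
import numpy as np


def pinball_loss(y_true, y_pred_q, quantiles):
    """Mean pinball (quantile) loss.

    y_true:   shape (n,)
    y_pred_q: shape (n, len(quantiles)), one column per quantile forecast
    """
    y_true = np.asarray(y_true, dtype=float)[:, None]     # (n, 1) for broadcasting
    q = np.asarray(quantiles, dtype=float)[None, :]       # (1, k)
    errors = y_true - np.asarray(y_pred_q, dtype=float)   # (n, k)
    # under-prediction (error > 0) is weighted by q, over-prediction by (1 - q)
    losses = np.maximum(q * errors, (q - 1.0) * errors)
    return float(losses.mean())


quantiles = [0.1, 0.5, 0.9]  # hypothetical quantiles for illustration
y = np.array([10.0, 12.0, 11.0])
# hypothetical forecast: every quantile over-predicts by 10% of the target
pred = np.repeat((y * 1.1)[:, None], len(quantiles), axis=1)

small = pinball_loss(y, pred, quantiles)
large = pinball_loss(y * 1000, pred * 1000, quantiles)  # same relative error, 1000x scale
print(small, large, large / small)
```

The ratio between the two losses only reflects the scale of the data, which is why comparing absolute loss values across differently scaled series (or time periods) can be misleading.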

Loss curves

Here are the loss curves plotted after training:

import matplotlib.pyplot as plt
import pandas as pd

# read the losses from the callback instance itself; indexing
# model_tsm.trainer.callbacks[1] depends on the order in which Lightning stores callbacks
loss_df = pd.DataFrame({
    "epoch": range(len(loss_logger.train_loss)),
    "train_loss": loss_logger.train_loss,
    "val_loss": loss_logger.val_loss,
})

last = loss_df.iloc[-1]
plt.plot(loss_df["epoch"], loss_df["train_loss"], color="blue", label=f"train loss: {last['train_loss']}")
plt.plot(loss_df["epoch"], loss_df["val_loss"], color="orange", label=f"val loss: {last['val_loss']}")

plt.gcf().set_size_inches(10, 5)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()

[Image: plotted training and validation loss curves]

Data

I create my series with TimeSeries.from_group_dataframe(), which returns a list of TimeSeries (one per group):

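from darts import TimeSeries

# returns a list of TimeSeries; the group column values are attached to each series as static covariates
ts_df = TimeSeries.from_group_dataframe(df, group_cols=['group1', 'group1', 'group1'],
                                        time_col='ds', value_cols='y', freq='D')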
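Since from_group_dataframe() yields one series per group, TSMixerModel is trained as a global model on windows drawn from all of them, and with limit_train_batches=50 and limit_val_batches=50 each epoch only sees a subset of those windows. If the groups differ greatly in magnitude, or if the validation data is the later part of each series and the level drifts upward over time, an absolute loss can differ by orders of magnitude between the two sets without the model overfitting. use_reversible_instance_norm=True normalizes each input window, but as far as I understand the outputs are mapped back to the original scale, so the loss is likely still computed in the target's units; per-series scaling (for example with Darts' Scaler transformer fit on the list of series) may be worth trying. The pandas sketch below builds a hypothetical synthetic frame (the groups "a", "b", "c", their levels, and the trend are all made up) and compares each group's mean absolute level in an earlier "train" part against a later "val" part.

```python
import numpy as np
import pandas as pd

# hypothetical long-format data with the same layout as in the issue:
# one row per (group, day), target in column "y"
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=200, freq="D")
frames = []
for group, level in [("a", 1.0), ("b", 100.0), ("c", 10_000.0)]:
    trend = np.linspace(1.0, 3.0, len(dates))  # level roughly triples over time
    y = level * trend * (1 + 0.05 * rng.standard_normal(len(dates)))
    frames.append(pd.DataFrame({"group1": group, "ds": dates, "y": y}))
demo_df = pd.concat(frames, ignore_index=True)

# split each group in time: first 80% as "train", last 20% as "val"
cutoff = dates[int(len(dates) * 0.8)]
demo_df["split"] = np.where(demo_df["ds"] < cutoff, "train", "val")

# mean absolute level per group and split; large ratios between groups or
# between splits mean that absolute losses are not directly comparable
demo_df["abs_y"] = demo_df["y"].abs()
levels = demo_df.groupby(["group1", "split"])["abs_y"].mean().unstack("split")
levels["val_over_train"] = levels["val"] / levels["train"]
print(levels)
```

Running the same groupby on the real df would show whether a few large groups, or an upward drift into the validation period, could account for the gap.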

Question

Why is my training loss significantly lower than the validation loss, sometimes by orders of magnitude? Could it be related to how the data is structured as a list of time series? Is this expected behavior in this scenario, or could there be an issue with scaling or loss calculation?
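One way to narrow this down is to break the validation error out per series instead of relying only on the aggregated val_loss. The sketch below assumes you already have, for each series, the actual validation values and a point forecast (for example the median of a probabilistic forecast) as NumPy arrays; the function name per_series_errors, the dictionary keys, and the numbers are hypothetical. Reporting both MAE and a scale-free relative MAE (MAE divided by the mean absolute actual value) shows whether a few large-magnitude series dominate the absolute loss.

```python
import numpy as np


def per_series_errors(actuals, forecasts):
    """Return (name, MAE, relative MAE) per series, sorted by MAE descending.

    actuals, forecasts: dicts mapping a series name to 1-D arrays of equal length.
    Relative MAE = MAE / mean(|actual|), which does not depend on the series' scale.
    """
    rows = []
    for name, y in actuals.items():
        y = np.asarray(y, dtype=float)
        y_hat = np.asarray(forecasts[name], dtype=float)
        mae = float(np.mean(np.abs(y - y_hat)))
        scale = float(np.mean(np.abs(y)))
        rel = mae / scale if scale > 0 else float("nan")
        rows.append((name, mae, rel))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# hypothetical example: both series are forecast with the same 5% error
actuals = {"small": np.array([1.0, 2.0, 3.0]), "large": np.array([1000.0, 2000.0, 3000.0])}
forecasts = {name: values * 1.05 for name, values in actuals.items()}
rows = per_series_errors(actuals, forecasts)
for name, mae, rel in rows:
    print(f"{name}: MAE={mae:.3f}, relative MAE={rel:.3f}")
```

Here both series have the same relative error, yet the large one contributes a 1000 times larger MAE, so an aggregate absolute loss is driven almost entirely by it.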

I appreciate any help or insights!

Thanks!

dennisbader commented 2 hours ago

Hi @erl61, could you provide a minimal reproducible example that includes the model training (and any data processing), as well as the series you pass to fit() and predict()?