Issue
I am training a TSMixerModel to forecast multivariate time series. The model performs well overall, but I notice that the training loss is consistently much lower than the validation loss (sometimes by orders of magnitude).
I have already tried different loss functions (MAELoss, MapeLoss), and the issue persists. However, when I forecast using this model, I don’t observe signs of overfitting, and the model predictions look good.
Callback
I use the following setup for logging the losses:
```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import Callback


class LossLogger(Callback):
    def __init__(self):
        self.train_loss = []
        self.val_loss = []

    # automatically called at the end of each training epoch
    def on_train_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
        self.train_loss.append(float(trainer.callback_metrics["train_loss"]))

    # automatically called at the end of each validation epoch
    def on_validation_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
        # skip Lightning's sanity-check run so the list stays aligned with epochs
        if not trainer.sanity_checking:
            self.val_loss.append(float(trainer.callback_metrics["val_loss"]))


loss_logger = LossLogger()
```
Question
Why is my training loss significantly lower than the validation loss, sometimes by orders of magnitude? Could it be related to how the data is structured as a list of time series? Is this expected behavior in this scenario, or could there be an issue with scaling or loss calculation?
Hi @erl61, could you provide a minimal reproducible example, including the model training (and, if relevant, the data processing) and the series you pass to fit and predict?
Model
This is how I initialize the model:
Loss curves
Here are the plotted loss curves after training:
Data
I create my multivariate time series using from_group_dataframe() as follows:
I appreciate any help or insights!
Thanks!