jdb78 / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

MultiHorizonMetric weighted loss not working #942

Open clianga opened 2 years ago

clianga commented 2 years ago

Expected behavior

I'm trying to apply a weight to each sample (and its loss) to remove the COVID effect. Say I have 100 time series from 2016 to 2022, and I don't want the model to update its parameters based on data from March 1, 2020 to July 30, 2020. So I created a weight column in my pandas DataFrame and set the weight to 0 for dates between March 1, 2020 and July 30, 2020, and to 1 elsewhere, for all 100 time series. The code that creates the dataset is below:

from pytorch_forecasting import RecurrentNetwork, TimeSeriesDataSet
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from pytorch_lightning.loggers import TensorBoardLogger

context_length = 365
prediction_length = 60
training_cutoff = tot_data["date"].max() - prediction_length

training = TimeSeriesDataSet(
    tot_data[lambda x: x.date <= training_cutoff],
    group_ids= ['combined_group'],
    target='contact',
    weight = 'weight',
    time_idx='date',
    static_categoricals = ['marketplace_id', 'pg_rollup', 'order_channel'],
    time_varying_known_reals = numeric_col,
    time_varying_unknown_reals=["contact"],
    lags = {'contact': list(range(1, 31))},
    min_encoder_length = context_length,
    max_encoder_length = context_length*2,
    min_prediction_length = prediction_length,
    max_prediction_length = prediction_length,
)

# create validation set (predict=True) which means to predict the last max_prediction_length points in time
# for each series
validation = TimeSeriesDataSet.from_dataset(training, tot_data, predict=True, stop_randomization=True)

# create dataloaders for model
batch_size = 128  # set this between 32 to 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10)

trainx, trainy = next(iter(train_dataloader))
display(trainy[0].size())
testx, testy = next(iter(val_dataloader))
display(testy[0].size())

I checked the tensor sizes for training and validation, and they match. Then I ran the following code

# Weighted loss
from pytorch_forecasting.metrics import MultiHorizonMetric
class MAE(MultiHorizonMetric):
    def loss(self, y_pred, target):
        loss = (self.to_prediction(y_pred) - target).abs()
        return loss

import pytorch_lightning as pl
# res.suggestion()

# configure network and trainer
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
lr_logger = LearningRateMonitor()  # log the learning rate
logger = TensorBoardLogger("lightning_logs")  # logging results to a tensorboard

trainer = pl.Trainer(
    max_epochs= 200,
    auto_lr_find = True,
    auto_scale_batch_size = True,
    gpus=1,
#     weights_summary="top",
    gradient_clip_val=0.1,
    limit_train_batches=30,  # comment in for training, running validation every 30 batches
    # fast_dev_run=True,  # comment in to check that network or dataset has no serious bugs
    callbacks=[lr_logger, early_stop_callback],
    logger=logger,
)

lstm_net = RecurrentNetwork.from_dataset(
    dataset = training,
    cell_type = 'LSTM', 
    hidden_size = 128, 
    rnn_layers = 2, 
    dropout = 0.1,
    loss = MAE(), 
)
print(f"Number of parameters in network: {lstm_net.size()/1e3:.1f}k")

to create a model, and it ran without error. However, when I try to train the model with this code:

# fit network
trainer.fit(
    lstm_net,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
)

It produces the following error:

~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pytorch_forecasting/metrics.py in update(self, y_pred, target, encoder_target, encoder_lengths)
    854         # weight samples
    855         if weight is not None:
--> 856             losses = losses * weight.unsqueeze(-1)
    857
    858         self._update_losses_and_lengths(losses, lengths)

RuntimeError: The size of tensor a (119) must match the size of tensor b (60) at non-singleton dimension 1

Can you tell me how to fix it? Or is there another way to apply weights to the losses, so that different samples get different weights? Thank you!
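
One possible reading of the traceback, assuming the custom `loss` returns a 2-D tensor of shape (batch, prediction_length) and the collated weight has that same shape: `weight.unsqueeze(-1)` makes the weight 3-D, and broadcasting then lines up the batch dimension (119) against the horizon (60). The sketch below reproduces such a mismatch with plain PyTorch tensors. The shapes are assumptions inferred from the error message, not taken from the library source.

```python
import torch

batch_size, horizon = 119, 60  # assumed from the numbers in the error message

losses = torch.rand(batch_size, horizon)  # what a 2-D loss() would return (assumption)
weight = torch.ones(batch_size, horizon)  # per-time-step sample weights (assumption)

try:
    # (119, 60) * (119, 60, 1): broadcasting pairs 119 with 60 and fails
    losses * weight.unsqueeze(-1)
    broadcast_failed = False
except RuntimeError as err:
    broadcast_failed = True
    print(err)

# With matching shapes, the element-wise product works as intended.
weighted = losses * weight
print(weighted.shape)
```

If this is indeed what happens, the weights themselves are fine and the problem lies in how the loss and weight shapes are combined inside `update`.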

fnavruzov commented 2 years ago

I've encountered the same RuntimeError with a shape mismatch when using AggregationMetric(some_metric) as the loss. As far as I understand, it was caused by a shape mismatch between the predictions and the actuals.

Can you check whether the original MAE() works for you? If not, as a quick workaround I suggest averaging over the last axis in the loss method:

def loss(self, y_pred, target):
    y_pred = self.to_prediction(y_pred)
    if y_pred.ndim == 3:
        # maybe some other checks
        y_pred = y_pred.mean(axis=-1)
    loss = (y_pred - target).abs()
    return loss

Hope this helps you solve the issue.
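
To see what the averaging workaround does to the shapes, here is a small sketch with dummy tensors. It assumes the prediction comes back with a trailing output dimension of size 1, which is a guess. The sizes are hypothetical.

```python
import torch

batch_size, horizon = 4, 60  # hypothetical sizes

y_pred = torch.rand(batch_size, horizon, 1)  # assumed 3-D prediction
target = torch.rand(batch_size, horizon)     # 2-D target

if y_pred.ndim == 3:
    # collapse the trailing dimension so y_pred matches target
    y_pred = y_pred.mean(dim=-1)

loss = (y_pred - target).abs()
print(loss.shape)
```

When the trailing dimension has size 1, averaging over it gives the same values as `squeeze(-1)`. With more than one output per time step, the mean changes the prediction itself, so it is worth checking which case applies.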