sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

How to solve the overfitting problem for the TFT model? #1218

Open heng94 opened 1 year ago

heng94 commented 1 year ago

Problem statement

I am using the TFT model to predict electricity consumption. During training, the training loss decreases while the validation loss goes up, which means the model overfits.

I tried reducing the learning rate, batch size, hidden size, and hidden_continuous_size, among other things, but none of it solved the overfitting problem.

[plot: training loss decreases over epochs while validation loss increases]

Can anyone give me some suggestions?

Thank you very much!

Here is the code:

import os

import pandas as pd
import torch.nn as nn
import pytorch_lightning as pl
import wandb
from pytorch_lightning.callbacks import LearningRateMonitor
from pytorch_lightning.loggers import TensorBoardLogger

from pytorch_forecasting import TemporalFusionTransformer, TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer
from pytorch_forecasting.metrics import MAE, MAPE, QuantileLoss, RMSE, SMAPE

df0 = pd.read_csv(cfg.file_name, header=0)  # read_csv already returns a DataFrame

# cast calendar and grouping columns to categoricals for the embedding layers
for col in ['week', 'month', 'season', 'holiday', 'workday', 'group']:
    df0[col] = df0[col].astype(str).astype('category')

time_varying_known_categories = ['week', 'month', 'holiday', 'workday', 'season']

time_varying_known_reals = ['time_idx', 'inweek', 'groundpressre', 'temparature', 'rain', 'snow', 'damp']

time_varying_unknown_reals = ['data', 'seapressure',  'groundtem', 'feeltemp', 'evaporation', 'prevaporation', 
                              'nwind', 'swind', 'cloudheight', 'lcloud', 'mcloud', 'hcloud', 'tcloud', 
                              'sunintensity', 'tsuninensity', 'uvintensity']

training_cutoff = 9191
val_cutoff = 10343
# 11496 rows in total: 0-9191 for training, 9192-10343 for validation, 10344-11496 for test
dataset = TimeSeriesDataSet(
    df0[lambda x: x.time_idx <= training_cutoff],
    group_ids=['group'],
    target='data',
    time_idx='time_idx',
    min_encoder_length=cfg.max_prediction_length // 2,
    max_encoder_length=168,  # 7*24
    min_prediction_length=1,
    max_prediction_length=24,
    time_varying_unknown_reals=time_varying_unknown_reals,
    time_varying_known_categoricals=time_varying_known_categories,
    time_varying_known_reals=time_varying_known_reals,
    static_categoricals=['group'],
    add_target_scales=True,
    add_relative_time_idx=True,
    target_normalizer=GroupNormalizer(groups=["group"], transformation="softplus"),
)
train_dataloader = dataset.to_dataloader(
  train=True, 
  batch_size=512, 
  num_workers=16, 
  pin_memory=True, 
  drop_last=False
)

test_dataset = TimeSeriesDataSet.from_dataset(
  dataset,
  df0[lambda x: x.time_idx > val_cutoff], 
  stop_randomization=True
)
test_dataloader = test_dataset.to_dataloader(
  train=False, 
  batch_size=512, 
  num_workers=16, 
  pin_memory=True, 
  drop_last=False
)
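
The validation split that produces the val loss in the plots is not in the snippet; here is a minimal sketch of what I need, assuming the cutoffs above and that keyword arguments to from_dataset override the stored dataset parameters:

# sketch: validation windows start right after the training cutoff, while the
# encoder can still reach back into the training range
val_dataset = TimeSeriesDataSet.from_dataset(
    dataset,
    df0[lambda x: x.time_idx <= val_cutoff],
    min_prediction_idx=training_cutoff + 1,
    stop_randomization=True,
)
val_dataloader = val_dataset.to_dataloader(
    train=False,
    batch_size=512,
    num_workers=16,
    pin_memory=True,
)

The same min_prediction_idx idea would also preserve encoder history at the test boundary; filtering to time_idx > val_cutoff discards the context needed by the first test windows.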

lr_logger = LearningRateMonitor()
logger = TensorBoardLogger(os.path.join(cfg.save_path, wandb.run.name, "lightning_logs"))
trainer = pl.Trainer(
    max_epochs=50,
    accelerator='gpu',
    devices=[0],
    enable_model_summary=False,
    gradient_clip_val=0.01,
    callbacks=[lr_logger],
    logger=logger,
)

tft = TemporalFusionTransformer.from_dataset(
    dataset,
    learning_rate=0.0008,
    hidden_size=64,
    lstm_layers=2,
    attention_head_size=3,
    dropout=0.4,
    hidden_continuous_size=8,
    output_size=7,  # one output per quantile of QuantileLoss
    loss=QuantileLoss(),
    logging_metrics=nn.ModuleList([
      SMAPE(reduction='sqrt-mean'), 
      MAE(reduction='mean'), 
      RMSE(reduction='sqrt-mean'), 
      MAPE(reduction='sqrt-mean')
    ]),
    reduce_on_plateau_patience=4,
    weight_decay=1e-3,
    optimizer='adamw',
)
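
The fit call is not shown above; presumably something like this sketch, pairing the train loader with the validation loader sketched earlier (keyword names match recent Lightning versions):

# sketch of the missing training call
trainer.fit(
    tft,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
)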
sairamtvv commented 1 year ago

Did you also try changing max_encoder_length? I mean, reducing max_encoder_length; sometimes that helped reduce the overfitting. Just FYI, I get plots similar to the ones you have shown: the validation loss increases drastically after a few epochs.
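
For example, something like this sketch (it assumes keyword arguments to from_dataset override the stored parameters):

# sketch: rebuild the training set with a one-day encoder instead of one week
short_dataset = TimeSeriesDataSet.from_dataset(
    dataset,
    df0[lambda x: x.time_idx <= training_cutoff],
    max_encoder_length=24,
)
short_train_dataloader = short_dataset.to_dataloader(train=True, batch_size=512)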

heng94 commented 1 year ago

@sairamtvv Thanks for your reply! Setting max_encoder_length to 7*24 is what the papers I read do; it seems to be a typical setting for electricity consumption prediction. I am not sure whether reducing it will work, but I will try it and post the results here. Thank you!

heng94 commented 1 year ago

Setting max_encoder_length to 24 does not work either. Here are the plots:

[plot: validation loss falls from about 0.85 to 0.55, then rises again in later epochs]

I still need to find other ways.

sairamtvv commented 1 year ago

In the plot you showed above, at least the validation loss has gone down from 0.85 to 0.55. Please correct me in case I am wrong.

heng94 commented 1 year ago

Yes, but it rises again afterwards, and the result is still not good. Is there any way to set the sliding-window stride to more than 1, e.g. windows 1-192, 25-217, 49-241? Thank you!
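
What I have in mind is a sampler like the sketch below. StridedSampler is a name I made up; the sketch assumes that extra keyword arguments to to_dataloader are forwarded to torch's DataLoader and that the dataset's window index is ordered by decoder start time:

import torch
from torch.utils.data import Sampler

class StridedSampler(Sampler):
    """Yield every stride-th window so consecutive samples start 24 steps apart."""

    def __init__(self, data_source, stride=24, shuffle=True):
        self.indices = list(range(0, len(data_source), stride))
        self.shuffle = shuffle

    def __iter__(self):
        if self.shuffle:
            order = torch.randperm(len(self.indices)).tolist()
        else:
            order = range(len(self.indices))
        return (self.indices[i] for i in order)

    def __len__(self):
        return len(self.indices)

# shuffle must be turned off on the DataLoader when a sampler is supplied
strided_loader = dataset.to_dataloader(
    train=True,
    batch_size=512,
    shuffle=False,
    sampler=StridedSampler(dataset, stride=24),
)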

sairamtvv commented 1 year ago

@heng94 did you figure out what the problem could be?

sairamtvv commented 1 year ago

https://www.freecodecamp.org/news/handling-overfitting-in-deep-learning-models/
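
In Lightning terms, the first remedy from that article, stopping as soon as the validation loss stops improving, is a callback; a minimal sketch (the patience and min_delta values are arbitrary):

from pytorch_lightning.callbacks import EarlyStopping

# sketch: stop training once val_loss has not improved for 5 epochs
early_stop = EarlyStopping(monitor='val_loss', min_delta=1e-4, patience=5, mode='min')

trainer = pl.Trainer(
    max_epochs=50,
    accelerator='gpu',
    devices=[0],
    gradient_clip_val=0.01,
    callbacks=[early_stop, lr_logger],
    logger=logger,
)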