sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

Optuna hyperparameters don't work #1275

Open Serge9744 opened 1 year ago

Serge9744 commented 1 year ago

Hi,

The optimization didn't work, so I just added the line `metrics_callback.on_validation_end(trainer)` after line 209.

I also modified the class:

    class MetricsCallback(Callback):
        """PyTorch Lightning metric callback."""

        def __init__(self):
            super().__init__()
            self.metrics = []

        def on_validation_end(self, trainer):
            self.metrics.append(trainer.callback_metrics)

I also removed the unused `pl_module` argument from the `on_validation_end` signature.
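
For context, this is roughly how the callback feeds the trial's objective value after the patch; an illustrative sketch following the manual-call pattern described above, not the library's exact internals:

    import optuna
    import pytorch_lightning as pl

    def objective(trial: optuna.Trial) -> float:
        metrics_callback = MetricsCallback()
        trainer = pl.Trainer(max_epochs=10)
        # model, train_dataloader and val_dataloader are assumed to exist in scope.
        trainer.fit(model, train_dataloader, val_dataloader)
        # Invoke the callback by hand (as done after line 209) so the metrics get recorded.
        metrics_callback.on_validation_end(trainer)
        # Return the recorded validation loss as the value Optuna minimizes.
        return metrics_callback.metrics[-1]["val_loss"].item()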

I see some odd behaviour though: the best trial's val_loss from the study doesn't match the val_loss I get when I train with the same hyperparameters outside the optimization.

For example, during the experiments I have:

    Q_Loss = QuantileLoss([0.05, 0.5, 0.95])

    checkpoint_callback = ModelCheckpoint(
        dirpath=".",
        filename="best-checkpoint",
        save_top_k=1,
        verbose=True,
        monitor="val_loss",
        mode="min",
    )
    early_stop_callback = EarlyStopping(
        monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min"
    )

    from pytorch_forecasting.models.temporal_fusion_transformer.tuning import optimize_hyperparameters

    # create study
    study = optimize_hyperparameters(
        train_dataloader,
        val_dataloader,
        model_path="optuna_test",
        n_trials=200,
        max_epochs=200,
        gradient_clip_val_range=(0.01, 1.0),
        hidden_size_range=(8, 512),
        hidden_continuous_size_range=(8, 512),
        attention_head_size_range=(1, 4),
        lstm_layers_range=(1, 8),
        learning_rate_range=(1e-6, 0.1),
        dropout_range=(0.1, 0.3),
        trainer_kwargs=dict(
            limit_train_batches=30,
            callbacks=[early_stop_callback, checkpoint_callback],
        ),
        output_size=3,  # three quantiles, matching Q_Loss
        loss=Q_Loss,
        reduce_on_plateau_patience=4,
        use_learning_rate_finder=False,  # use Optuna to find the learning rate instead of the in-built finder
    )

    Trial 125 finished with value: 3930406.75 and parameters: {'gradient_clip_val': 0.15578493164317458, 'hidden_size': 69, 'lstm_layers': 5, 'dropout': 0.2761985832861955, 'hidden_continuous_size': 32, 'attention_head_size': 4, 'learning_rate': 0.07180574537769648}. Best is trial 125 with value: 3930406.75.
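
In the retraining below, `best_params` would be read off the study, e.g.:

    # Hyperparameters of the winning trial (standard Optuna API).
    best_params = study.best_trial.params
    print(study.best_value)  # best val_loss seen during the search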

However, when I train with the same hyperparameters:

    early_stop_callback = EarlyStopping(
        monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min"
    )
    lr_logger = LearningRateMonitor()  # log the learning rate
    logger = TensorBoardLogger("lightning_logs")  # log results to TensorBoard

    trainer = pl.Trainer(
        max_epochs=200,
        gpus=0,
        enable_model_summary=True,
        gradient_clip_val=best_params['gradient_clip_val'],
        limit_train_batches=10,  # comment in for training, running validation every 30 batches
        # fast_dev_run=True,  # comment in to check that the network or dataset has no serious bugs
        callbacks=[early_stop_callback],
        logger=logger,
        log_every_n_steps=2,
    )

    tft = TemporalFusionTransformer.from_dataset(
        training,
        # not meaningful for finding the learning rate but otherwise very important
        learning_rate=best_params['learning_rate'],
        hidden_size=best_params['hidden_size'],  # most important hyperparameter apart from learning rate
        # number of attention heads; set to up to 4 for large datasets
        attention_head_size=best_params['attention_head_size'],
        lstm_layers=best_params['lstm_layers'],
        dropout=best_params['dropout'],  # values between 0.1 and 0.3 are good
        hidden_continuous_size=best_params['hidden_continuous_size'],  # set to <= hidden_size
        output_size=3,  # three quantiles, matching Q_Loss
        loss=Q_Loss,
        log_interval=10,
        # reduce learning rate if no improvement in validation loss after x epochs
        reduce_on_plateau_patience=4,
    )
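
The fit call itself isn't shown above; presumably it is the standard Lightning one, reusing the study's dataloaders:

    # Fit with the tuned hyperparameters (train_dataloader and val_dataloader assumed in scope).
    trainer.fit(tft, train_dataloader, val_dataloader)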

I get:

    Epoch 27: 100% 39/39 [00:12<00:00, 3.23it/s, loss=2.76e+06, v_num=29, train_loss_step=2.89e+6, val_loss=1.06e+7, train_loss_epoch=2.78e+6]

sairamtvv commented 1 year ago

Your loss looks very big. Is your normalization done properly? Just a side comment.

Serge9744 commented 1 year ago

Hi,

Yes, for the exogenous variables I used:

    from sklearn.preprocessing import StandardScaler

    sc = StandardScaler()
    df_train[[col for col in exog_var if col != 'crisis']] = sc.fit_transform(
        df_train[[col for col in exog_var if col != 'crisis']]
    )

Then for the target, normalization is done directly in the TimeSeriesDataSet. `constant` is the name of a column with 1 in every row, as there are no specific groups. Maybe I got it wrong:

    training = TimeSeriesDataSet(
        df_train.loc[:, [endog_var] + ["time_index", "constant"] + exog_var],
        time_idx="time_index",
        target=endog_var,
        group_ids=["constant"],
        min_encoder_length=max_encoder_length,
        max_encoder_length=max_encoder_length,
        min_prediction_length=max_prediction_length,
        max_prediction_length=max_prediction_length,
        time_varying_unknown_categoricals=["crisis"],
        time_varying_unknown_reals=[endog_var] + [col for col in exog_var if col != 'crisis'],
        target_normalizer=TorchNormalizer(),
        add_relative_time_idx=True,
        add_target_scales=True,
        add_encoder_length=True,
    )
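
The dataloaders passed to `optimize_hyperparameters` are then built from this dataset in the usual way; a sketch, assuming a validation frame `df_val` and a batch size of 64:

    # Validation set reuses the training dataset's encoders and scalers (df_val assumed).
    validation = TimeSeriesDataSet.from_dataset(training, df_val, predict=True, stop_randomization=True)

    train_dataloader = training.to_dataloader(train=True, batch_size=64, num_workers=0)
    val_dataloader = validation.to_dataloader(train=False, batch_size=64, num_workers=0)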

Metaming commented 1 year ago

Hi @Serge9744, can you share how you retrieve the best model with the optimal hyperparameters after the hyperparameter study? Can it be retrieved from `study`?
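
For what it's worth, the Optuna study stores each trial's parameters and objective value rather than the fitted model, so the best hyperparameters can be read from the study while the model itself has to be retrained with them or reloaded from a checkpoint saved under `model_path` during the search. A minimal sketch, with a hypothetical checkpoint path:

    # The study records hyperparameters and losses, not fitted models.
    best_params = study.best_trial.params

    # Re-create the model: either retrain with best_params, or load a checkpoint
    # written under model_path during the search (path below is hypothetical).
    best_tft = TemporalFusionTransformer.load_from_checkpoint(
        "optuna_test/trial_125/epoch=26.ckpt"
    )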