sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

Adding early stopping callbacks to the hyperparameter optimization of TFT #1193

Open Metaming opened 1 year ago

Metaming commented 1 year ago

Expected behavior

I use optimize_hyperparameters from pytorch_forecasting.models.temporal_fusion_transformer.tuning, following the tutorial example from the documentation. I tested it with the same single set of hyperparameters I used when running a simple trainer.fit, expecting it to finish training in a similar amount of time (~30 min).
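For context, the tutorial call has roughly this shape (a sketch; for my single-hyperparameter-set test the ranges were pinned to single values, and `train_dataloader`/`val_dataloader` come from the TimeSeriesDataSet setup, which is not shown here):

```python
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import optimize_hyperparameters

# train_dataloader / val_dataloader are built from a TimeSeriesDataSet as in the tutorial
study = optimize_hyperparameters(
    train_dataloader,
    val_dataloader,
    model_path="optuna_test",
    n_trials=200,
    max_epochs=50,
    gradient_clip_val_range=(0.01, 1.0),
    hidden_size_range=(8, 128),
    hidden_continuous_size_range=(8, 128),
    attention_head_size_range=(1, 4),
    learning_rate_range=(0.001, 0.1),
    dropout_range=(0.1, 0.3),
    trainer_kwargs=dict(limit_train_batches=30),
    reduce_on_plateau_patience=4,
    use_learning_rate_finder=False,
)
```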

Actual behavior

But optimize_hyperparameters took much longer (a couple of hours). I then looked into the source code, temporal_fusion_transformer/tuning.py, and noticed that the dictionary default_trainer_kwargs in the function objective does not include an early stopping callback in its callbacks item. I wonder if this is the reason the hyperparameter optimization took so much longer.

I then tried adding early stopping via the trainer_kwargs argument of optimize_hyperparameters. Training did run and finish much faster, but at the end it failed with an error at `return metrics_callback.metrics[-1]["val_loss"].item()`.
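What I tried looks roughly like this (a sketch; the EarlyStopping settings are illustrative, and the import path depends on the installed Lightning version):

```python
from pytorch_lightning.callbacks import EarlyStopping  # newer versions: from lightning.pytorch.callbacks import EarlyStopping

# illustrative settings; monitor matches the metric the tuner reports
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=5, mode="min")

study = optimize_hyperparameters(
    train_dataloader,
    val_dataloader,
    model_path="optuna_test",
    # this replaces the default callbacks list instead of extending it (see below)
    trainer_kwargs=dict(callbacks=[early_stop_callback]),
)
```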

I think this is because, in the source code, default_trainer_kwargs is updated via default_trainer_kwargs.update(trainer_kwargs). Since callbacks is a key in default_trainer_kwargs, this update replaces the entire default callbacks list ([metrics_callback, learning_rate_callback, checkpoint_callback, PyTorchLightningPruningCallback(trial, monitor="val_loss")]). So if a user only wants to add an early stopping callback by passing trainer_kwargs=dict(callbacks=[early_stop_callback]), all of the default callbacks are removed. That is exactly my case: because I only passed trainer_kwargs=dict(callbacks=[early_stop_callback]), the default metrics_callback was removed by the update. However, metrics_callback is defined inside tuning.py and reads trainer.callback_metrics, and since the trainer is also created internally in tuning.py, it is not clear to me how to re-add it properly from outside. A minimal illustration of the update behaviour follows.
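A minimal, self-contained illustration of that `dict.update()` behaviour (plain strings stand in for the real Lightning callback objects):

```python
# defaults as built inside objective() in tuning.py (callbacks as placeholder strings)
default_trainer_kwargs = dict(
    gradient_clip_val=0.1,
    callbacks=[
        "metrics_callback",
        "learning_rate_callback",
        "checkpoint_callback",
        "pruning_callback",
    ],
)
user_trainer_kwargs = dict(callbacks=["early_stop_callback"])

default_trainer_kwargs.update(user_trainer_kwargs)
print(default_trainer_kwargs["callbacks"])
# ['early_stop_callback'] -- the whole default list was replaced, so
# metrics_callback never runs, its .metrics list stays empty, and
# metrics_callback.metrics[-1]["val_loss"] fails at the end of the trial
```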

May I suggest either adding an early stopping callback to default_trainer_kwargs in the source, or adjusting the way the callbacks are merged so that user-supplied callbacks are appended to the defaults rather than replacing them? A possible merge is sketched below.
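A hedged sketch of the second option, assuming it is applied inside objective() in tuning.py where default_trainer_kwargs is built:

```python
# merge user callbacks into the defaults instead of letting update() replace them
user_kwargs = dict(trainer_kwargs)  # copy so the caller's dict is not mutated
default_trainer_kwargs["callbacks"].extend(user_kwargs.pop("callbacks", []))
default_trainer_kwargs.update(user_kwargs)  # remaining keys still override defaults
```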

As a temporary fix, I changed the source code to add an early stopping callback to default_trainer_kwargs, and it seems to work fine.
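Concretely, the patch amounts to appending an early stopping callback to the defaults inside objective() (a sketch; the parameter values are illustrative, not the actual upstream code):

```python
from pytorch_lightning.callbacks import EarlyStopping  # or lightning.pytorch.callbacks

# appended to default_trainer_kwargs["callbacks"] next to the existing
# metrics / learning-rate / checkpoint / pruning callbacks in tuning.py
default_trainer_kwargs["callbacks"].append(
    EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=3, mode="min")
)
```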

Code to reproduce the problem


grosestq commented 1 year ago

Hi,

unfortunately I can't see your code.

I faced a similar problem:

optimize_hyperparameters() will not use early stopping by default. If I add early stopping in the trainer_kwargs it works fine. The problem is that the val_loss value from that run is then used for all further trials, which leads to pruning right from the start, which is not wanted.

Furthermore, after the recent Optuna update the pruning of trials no longer seems to work, but that's a different issue...