sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

optimize_hyperparameters appears to hang – no output #1363

Open · jambudipa opened this issue 1 year ago

jambudipa commented 1 year ago

I am looking at the following tutorial:

https://pytorch-forecasting.readthedocs.io/en/latest/tutorials/stallion.html

And I am running the following hyperparameter optimization code:

# import used in the linked Stallion tutorial
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import optimize_hyperparameters

study = optimize_hyperparameters(
    train_dataloader,
    val_dataloader,
    model_path="optuna_test",
    n_trials=200,
    max_epochs=50,
    gradient_clip_val_range=(0.01, 1.0),
    hidden_size_range=(8, 128),
    hidden_continuous_size_range=(8, 128),
    attention_head_size_range=(1, 4),
    learning_rate_range=(0.001, 0.1),
    dropout_range=(0.1, 0.3),
    trainer_kwargs=dict(limit_train_batches=30),
    reduce_on_plateau_patience=4,
    verbose=True,
    use_learning_rate_finder=False,  # use Optuna to find ideal learning rate or use in-built learning rate finder
)

...but I am concerned it is not actually doing anything!

I am running this on macOS, and it seems to be having problems. Here is a selection of the console output:

FutureWarning: suggest_uniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float instead.

UserWarning: Attribute 'loss' is an instance of nn.Module and is already saved during checkpointing. It is recommended to ignore them using self.save_hyperparameters(ignore=['loss']).

UserWarning: Attribute 'logging_metrics' is an instance of nn.Module and is already saved during checkpointing. It is recommended to ignore them using self.save_hyperparameters(ignore=['logging_metrics']).

FutureWarning: suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.

But the last message persists and is the most concerning:

UserWarning: MPS: no support for int64 reduction ops, casting it to int32 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/ReduceOps.mm:144.)

Should I allow it to continue? Even with verbose=True, it is not really outputting anything other than these warnings. When I do manage to stop it running (which is difficult), I do get the results of one trial, so perhaps it is doing something after all!
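One way to confirm whether trials are actually completing is to turn up Optuna's own logging before calling optimize_hyperparameters; Optuna then prints one line per finished trial. A minimal sketch using standard Optuna logging calls (nothing specific to pytorch-forecasting):

import logging
import sys

import optuna

# Route Optuna's logger to stdout so each finished trial prints a
# "Trial N finished with value ..." line.
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
optuna.logging.set_verbosity(optuna.logging.INFO)

If the MPS warning itself is a concern, passing trainer_kwargs=dict(accelerator="cpu", limit_train_batches=30) should keep the trials off the MPS backend entirely, at the cost of slower epochs on Apple silicon.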

jambudipa commented 1 year ago

OK, drama over: I reduced the size of the dataset and it has completed 3 trials so far. I just have to wait a few days!
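For reference, a minimal sketch of shrinking the dataset this way, assuming the data DataFrame from the Stallion tutorial (the column names come from the tutorial; the number of kept series is illustrative):

# Illustrative: keep only a few agency/sku series before building the
# TimeSeriesDataSet, so each Optuna trial runs quickly.
keep_agencies = data["agency"].unique()[:5]
keep_skus = data["sku"].unique()[:5]
data = data[data["agency"].isin(keep_agencies) & data["sku"].isin(keep_skus)].copy()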

It doesn't like me setting num_workers though; it complains about some re-entry into main, which I don't quite get.
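That complaint usually comes from macOS starting DataLoader worker processes with the "spawn" method, which re-imports the script; the standard fix is to guard the top-level code. A minimal sketch, assuming the training/validation TimeSeriesDataSet objects and batch size from the tutorial:

def main():
    # Build the dataloaders with workers; the values here are illustrative.
    train_dataloader = training.to_dataloader(train=True, batch_size=128, num_workers=2)
    val_dataloader = validation.to_dataloader(train=False, batch_size=128, num_workers=2)
    study = optimize_hyperparameters(
        train_dataloader,
        val_dataloader,
        model_path="optuna_test",
        n_trials=200,
        max_epochs=50,
    )

# Guarding the entry point stops the "spawn" start method from re-running
# this code in every worker process when num_workers > 0.
if __name__ == "__main__":
    main()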

manitadayon commented 1 year ago

num_workers greater than 0 gives some runtime error, but that should not stop your training or hyperparameter tuning. It actually works: you can try num_workers 0 and 3 and notice the speed difference, but yes, beyond a certain value there is no further speedup.
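A minimal sketch of how the two settings could be timed, assuming the training TimeSeriesDataSet from the tutorial (batch size and worker counts are illustrative):

import time

for workers in (0, 3):
    loader = training.to_dataloader(train=True, batch_size=128, num_workers=workers)
    start = time.time()
    for _ in loader:  # one full pass over the training batches
        pass
    print(f"num_workers={workers}: {time.time() - start:.1f}s per pass")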