mala-project / mala

Materials Learning Algorithms. A framework for machine learning materials properties from first-principles data.
https://mala-project.github.io/mala/
BSD 3-Clause "New" or "Revised" License

Zero validation data loss during hyperparameter optimization #492

Closed nerkulec closed 3 months ago

nerkulec commented 1 year ago

When running TPE hyperparameter optimization with

parameters.hyperparameters.hyper_opt_method = "optuna"
parameters.running.after_before_training_metric = "band_energy"
...

the validation data loss reported during training is always 0:

Epoch 0: validation data loss: 0.000e+00, training data loss: 8.739e-07
Time for epoch[s]: 1121.9449763298035
training time: 744.3423397541046
Epoch 1: validation data loss: 0.000e+00, training data loss: 4.167e-07
Validation accuracy has not improved enough.
Time for epoch[s]: 1127.3465538024902
training time: 744.2401514053345
Epoch 2: validation data loss: 0.000e+00, training data loss: 3.374e-07
Validation accuracy has not improved enough.
...

This causes training to always terminate after early_stopping_epochs epochs, since the validation loss never improves.
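The termination behavior follows from how patience-based early stopping interacts with a stuck metric. Below is a minimal sketch of that logic (a hypothetical illustration, not MALA's actual implementation): a validation loss pinned at 0.0 "improves" exactly once, from infinity to zero, and then never again, so the patience counter always runs out after early_stopping_epochs epochs.

```python
def epochs_until_stop(val_losses, early_stopping_epochs, threshold=0.0):
    """Hypothetical patience-based early stopping: return the number of
    epochs run before training terminates (or all of them)."""
    best = float("inf")
    patience = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - threshold:
            # An improvement resets the patience counter.
            best = loss
            patience = 0
        else:
            patience += 1
            if patience >= early_stopping_epochs:
                return epoch + 1  # training stops here
    return len(val_losses)

# A validation loss that is always 0.0 improves once (0.0 < inf) and then
# never again, so training stops after 1 + early_stopping_epochs epochs:
print(epochs_until_stop([0.0, 0.0, 0.0, 0.0, 0.0], early_stopping_epochs=3))  # → 4
```

This matches the log above: with the loss stuck at 0.000e+00, "Validation accuracy has not improved enough" is printed every epoch until the patience budget is exhausted.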

RandomDefaultUser commented 11 months ago

Is this still a problem? If so, which data/training script was used?

The parameters.running.after_before_training_metric should not affect what is happening here, since it is only evaluated before and after training. If the error occurs within the training loop, it must be related to the during_training metric. I tested this with my data and an example script on my machine and saw no problems, so could you provide additional information on how to reproduce the error?