Open q-55555 opened 4 years ago
I was just investigating this. It appears that the model saved in epoch_0 is actually the best model and that best_step is simply not being recorded properly. To be clear, I believe tuner.get_best_models(1)[0] returns the best, fully trained model from the sweep.
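For anyone landing here, a minimal sketch of what I mean (the toy data and search space are made up, and the import path may be `kerastuner` on older versions):

```python
import numpy as np
import keras_tuner as kt
from tensorflow import keras

# Hypothetical toy data just so the sketch runs end to end.
x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")

def build_model(hp):
    model = keras.Sequential([
        keras.layers.Dense(hp.Int("units", 32, 128, step=32), activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

tuner = kt.BayesianOptimization(build_model, objective="val_loss", max_trials=3)
tuner.search(x, y, epochs=5, validation_split=0.2)

# Reloads the single checkpoint kept for the best trial, i.e. what I believe
# is the best, fully trained model from the sweep.
best_model = tuner.get_best_models(num_models=1)[0]
```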
I'm fairly sure this is a bug. Checkpoints should be saved after every epoch, but for some reason the tuner's on_epoch_end is always called with epoch = 0, while other callbacks receive the correct epoch number. So the issue seems to be somewhere in the tuner code; I just couldn't figure out where.
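One quick way to see the mismatch (just a sketch; `EpochProbe` is a name I made up) is to pass your own callback to `tuner.search` and compare the epoch it reports against the checkpoint names the tuner writes:

```python
from tensorflow import keras

class EpochProbe(keras.callbacks.Callback):
    # Prints the epoch index Keras passes to user callbacks; compare it with
    # the "epoch_0"-only checkpoints the tuner saves.
    def on_epoch_end(self, epoch, logs=None):
        print(f"user callback sees epoch={epoch}")

# Callbacks given to search() are passed through to model.fit() for each trial:
# tuner.search(x, y, epochs=5, callbacks=[EpochProbe()])
```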
Actually, I see now that this is intentional: the BayesianOptimization tuner overrides the base tuner's checkpoint behavior and keeps only the best checkpoint.
Is there a way to override this so that a checkpoint is saved after every epoch? That seems like the expected behavior to me.
Currently, you cannot override to checkpoint every epoch. However, you can use TensorBoard to visualize the learning curve.
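For example (a sketch reusing the `tuner`/`x`/`y` names from the snippet above; the log directory is arbitrary), callbacks passed to `search()` are forwarded to `model.fit()` for every trial, so the per-epoch learning curves show up in TensorBoard:

```python
from tensorflow import keras

tensorboard_cb = keras.callbacks.TensorBoard(log_dir="tb_logs")
tuner.search(x, y, epochs=5, validation_split=0.2, callbacks=[tensorboard_cb])
# Then inspect the learning curves with: tensorboard --logdir tb_logs
```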
Hi @ysgit, I am running into the same issue. Do you happen to have a link to the source saying that the checkpoint is overwritten with the best model?
I have noticed that only "epoch_0" is saved in the checkpoints. Is this normal, or is there something I need to do to save the best epoch (and not only epoch_0)?
My point is that I would like to save the best epoch during training, but calling "trial.best_step" after running the trial always returns None.
Thank you!
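For reference, this is how I'm reading it (a sketch; assumes a finished `tuner` like the one above):

```python
# Inspect the finished trials via the oracle; best_step comes back as None here.
for trial in tuner.oracle.get_best_trials(num_trials=3):
    print(trial.trial_id, trial.score, trial.best_step)
```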