Golbstein closed this issue 3 years ago
This is actually a larger issue: any callback that keeps internal state
loses that state when training is resumed.
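One way to keep such state across runs is to write it to a small file next to the weights and read it back before building the callbacks. The sketch below is only an illustration: the function names, the JSON format, and the `best`/`epoch` keys are assumptions, not part of Keras.

```python
import json
import os


def save_callback_state(path, state):
    """Write a small dict of callback state (e.g. {"best": 0.6}) to disk."""
    with open(path, "w") as f:
        json.dump(state, f)


def load_callback_state(path, default=None):
    """Return the saved state dict, or `default` if no file exists yet."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)
```

On resume, the loaded `best` value could be assigned to the checkpoint callback before training starts.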
I tried the following workaround, and it looks good so far. Save the best loss from your last training run, then use it to initialize best in the ModelCheckpoint callback:
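from keras.callbacks import ModelCheckpoint  # or tensorflow.keras.callbacks

class ModelCheckpointWrapper(ModelCheckpoint):
    def __init__(self, best_init=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Seed the best score from the previous run instead of the default.
        if best_init is not None:
            self.best = best_init

checkpointer = ModelCheckpointWrapper(best_init=BEST_LOSS, filepath=saved_model, verbose=1, save_best_only=True)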
I use CSVLogger to save my training results, and read the log back when resuming, like this:
import csv
import os
from keras import callbacks

# Defaults for a fresh run, when no log file exists yet.
EPOCH_INIT = 0
BEST_LOSS = None
if os.path.exists(saved_log):
    with open(saved_log) as f:
        reader = csv.DictReader(f)
        rows = [row for row in reader]
    EPOCH_INIT = int(rows[-1]['epoch']) + 1
    BEST_LOSS = float(rows[-1]['val_loss'])
    print('Resume training from EPOCH_INIT {0}, BEST_LOSS {1}, BEST_ACC {2}'.format(EPOCH_INIT, BEST_LOSS, rows[-1]['val_acc']))
csvlogger = callbacks.CSVLogger(saved_log, separator=',', append=True)
model.fit_generator(
    train_generator,
    ......
    callbacks=[checkpointer, csvlogger],
    initial_epoch=EPOCH_INIT)
@cloudseasail Thanks for your code. But I think BEST_LOSS in your code
may not be the true best loss: float(rows[-1]['val_loss'])
is just the last loss value saved in the CSV log file, not the lowest one.
We can select the actual best loss
with the following code:
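import os
import pandas as pd

if os.path.exists(saved_log):
    df = pd.read_csv(saved_log)
    # .iloc[-1] is positional; df.epoch[-1] is a label lookup and raises KeyError here.
    EPOCH_INIT = int(df.epoch.iloc[-1]) + 1
    # Take the minimum over the whole column, not just the last row.
    BEST_LOSS = float(df.val_loss.min())
    print('Retraining from EPOCH_INIT {0}, BEST_LOSS {1}'.format(EPOCH_INIT, BEST_LOSS))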
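The same idea can be written without pandas and made to work for metrics where higher is better, such as IOU. This is a minimal sketch, assuming the log was written by CSVLogger with a 0-based `epoch` column. The function name `resume_state` and the `val_iou` column used in the usage note are hypothetical.

```python
import csv
import os


def resume_state(log_path, monitor="val_loss", mode="min"):
    """Return (initial_epoch, best) read from a CSVLogger-style file.

    Returns (0, None) when the log is missing or has no rows.
    """
    if not os.path.exists(log_path):
        return 0, None
    with open(log_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return 0, None
    values = [float(row[monitor]) for row in rows]
    # Scan every logged epoch, not only the last one.
    best = min(values) if mode == "min" else max(values)
    # The log stores 0-based epoch indices, so resume at last + 1.
    initial_epoch = int(rows[-1]["epoch"]) + 1
    return initial_epoch, best
```

For a metric to maximize, a call such as `resume_state(saved_log, monitor="val_iou", mode="max")` would return the highest logged value instead of the lowest.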
Hey there,
I'm training my model with the following callback:
I've trained the model for 50 epochs and saved the weights with the best IOU score.
Now I want to load the model with these weights and keep monitoring with the checkpoint, starting from my best IOU score.
Example: in epoch 47 I got the best IOU score of 0.6. When I initialize the model with the weights from this epoch and start training, it saves the weights of the first epoch even though the IOU decreased to 0.58, because it thinks the score has improved from -inf.
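The behaviour described here can be reproduced with a few lines of plain Python. The class below is a simplified stand-in for the "save best only" bookkeeping, not the Keras source; the name `BestTracker` and its interface are assumptions made for illustration.

```python
import math


class BestTracker:
    """Simplified stand-in for a save-best-only checkpoint's bookkeeping."""

    def __init__(self, mode="max", best=None):
        self.mode = mode
        if best is None:
            # A fresh tracker knows nothing, so any score counts as an improvement.
            best = -math.inf if mode == "max" else math.inf
        self.best = best

    def update(self, score):
        """Return True (and record the score) if it beats the best so far."""
        improved = score > self.best if self.mode == "max" else score < self.best
        if improved:
            self.best = score
        return improved


fresh = BestTracker(mode="max")             # state lost on resume
seeded = BestTracker(mode="max", best=0.6)  # best IOU from epoch 47

print(fresh.update(0.58))   # True: 0.58 "improves" on -inf, so weights are overwritten
print(seeded.update(0.58))  # False: 0.58 does not beat 0.6
```

Seeding the tracker with the previous best is what prevents the first resumed epoch from overwriting the better weights.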