HamDan1999 closed this issue 2 years ago
I currently have three ".h5" files and one ".json" file.
Technically, you can get it back by setting basic_model=None, passing the initial_epoch to tt.train, and setting the remaining epochs in sch.
import os
import losses, train, models
data_basic_path = '/datasets/ms1m-retinaface-t1'
data_path = data_basic_path + '_112x112_folders'
eval_paths = [os.path.join(data_basic_path, ii) for ii in ['lfw.bin', 'cfp_fp.bin', 'agedb_30.bin']]
# basic_model = models.buildin_models("ghostnet", dropout=0, emb_shape=512, output_layer='GDC', bn_momentum=0.9, bn_epsilon=1e-5)
# basic_model = models.add_l2_regularizer_2_model(basic_model, weight_decay=5e-4, apply_to_batch_normal=False)
# basic_model = models.replace_ReLU_with_PReLU(basic_model)
basic_model = None # >>>> 1st: set basic_model as None
tt = train.Train(data_path, eval_paths=eval_paths,
        save_path='TT_ghostnet_prelu_GDC_arc_emb512_dr0_sgd_l2_5e4_bs1024_ms1m_bnm09_bne1e5_cos16_batch_fixed.h5',
        basic_model=basic_model, model=None, lr_base=0.1, lr_decay=0.5, lr_decay_steps=16, lr_min=1e-5,
        batch_size=1024, random_status=0, eval_freq=2000, output_weight_decay=1)
# optimizer = keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
sch = [
    # {"loss": losses.ArcfaceLoss(scale=32), "epoch": 1, "optimizer": optimizer},
    {"loss": losses.ArcfaceLoss(scale=64), "epoch": 28},  # >>>> 2nd: set remaining epochs 50 - 22
]
tt.train(sch, 22)  # >>>> 3rd: set initial_epoch 22
When basic_model=None and model=None, the script will try to reload the model from checkpoints/{save_path}, or you can specify the model explicitly as model="checkpoints/TT_ghostnet_prelu_GDC_arc_emb512_dr0_sgd_l2_5e4_bs1024_ms1m_bnm09_bne1e5_cos16_batch_fixed.h5".
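For clarity on the numbers in the comments above, the relationship between initial_epoch and the remaining-epoch count in sch is plain arithmetic; a minimal sketch (resume_schedule is a hypothetical helper for illustration, not part of this repo):

```python
def resume_schedule(total_epochs, initial_epoch):
    """Return (remaining, epochs) when resuming from a checkpoint.

    The sch entry carries the *remaining* count (total - initial),
    while initial_epoch tells the trainer where epoch numbering and
    LR decay should pick up.
    """
    remaining = total_epochs - initial_epoch
    return remaining, list(range(initial_epoch, total_epochs))

# The run in this thread: 50 epochs total, interrupted after epoch 22.
remaining, epochs = resume_schedule(50, 22)
print(remaining)               # 28 -> the "epoch" value in sch
print(epochs[0], epochs[-1])   # 22 49 -> epochs actually executed
```

So {"epoch": 28} plus tt.train(sch, 22) runs epochs 22 through 49, completing the original 50-epoch plan.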
Hi. I just tried the above script and it is training; I will report back once I get validation results.
Thanks a lot for your help.
It is working, thanks for your help.
Hey, I hope you are doing well.
My code crashed on the 22nd epoch while running one of your models shown below:
import os
import losses, train, models
from tensorflow import keras

data_basic_path = '/datasets/ms1m-retinaface-t1'
data_path = data_basic_path + '_112x112_folders'
eval_paths = [os.path.join(data_basic_path, ii) for ii in ['lfw.bin', 'cfp_fp.bin', 'agedb_30.bin']]

basic_model = models.buildin_models("ghostnet", dropout=0, emb_shape=512, output_layer='GDC', bn_momentum=0.9, bn_epsilon=1e-5)
basic_model = models.add_l2_regularizer_2_model(basic_model, weight_decay=5e-4, apply_to_batch_normal=False)
basic_model = models.replace_ReLU_with_PReLU(basic_model)

tt = train.Train(data_path, eval_paths=eval_paths,
        save_path='TT_ghostnet_prelu_GDC_arc_emb512_dr0_sgd_l2_5e4_bs1024_ms1m_bnm09_bne1e5_cos16_batch_fixed.h5',
        basic_model=basic_model, model=None, lr_base=0.1, lr_decay=0.5, lr_decay_steps=16, lr_min=1e-5,
        batch_size=1024, random_status=0, eval_freq=2000, output_weight_decay=1)

optimizer = keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
sch = [
    {"loss": losses.ArcfaceLoss(scale=32), "epoch": 1, "optimizer": optimizer},
    {"loss": losses.ArcfaceLoss(scale=64), "epoch": 48},
]
tt.train(sch, 0)
How can I resume training from the previously saved model? Can you show me or write an example of resuming training? (Note: I tried the "Restore training from break point" section, but it did not work for me; it seems I messed something up.)