qazs closed this issue 6 years ago
Hi, I think there's some confusion. You can save/load your models after a number of epochs or after a number of updates.
If you do the former, your saved models will have the suffix _epoch_N. If you do the latter, the suffix will be update_N. You'll need to load the models according to how you saved them.
Thanks, but how do you load the model?
I'm using this command from the documentation: python main.py
As you almost did:
RELOAD = 10
RELOAD_EPOCH = True
REBUILD_DATASET = False
I'm using this config:
RELOAD = 10
RELOAD_EPOCH = True
REBUILD_DATASET = False
but I didn't get any update_N_weights.h5 file. Isn't it supposed to generate an update_N_weights.h5 file?
Can you please show me the result of ls trained_models/CnTrans_encn_AttentionRNNEncoderDecoder_src_emb_32_bidir_True_enc_LSTM_32_dec_ConditionalLSTM_32_deepout_linear_trg_emb_32_Adam_0.001/* ?
Here you go, I ran 1 epoch for testing:
Config:
RELOAD = 0
RELOAD_EPOCH = True/False
REBUILD_DATASET = False/True
config.pkl epoch_1_structure_init.json epoch_1_weights_next.h5
epoch_1.h5 epoch_1_structure_next.json tensorboard_logs/
epoch_1_Model_Wrapper.pkl epoch_1_weights_init.h5
(nmt-keras)
If I set the config as below and run again, I get an error because the update_N_weights.h5 file is missing.
RELOAD = 1
RELOAD_EPOCH = False
REBUILD_DATASET = False
If you want to load the model for epoch 1, you should switch the RELOAD_EPOCH option to True.
@lvapeab, so to reload the Nth epoch, one should set RELOAD = N?
Yes, and RELOAD_EPOCH = True.
If one is training from scratch, should it be RELOAD=0 and RELOAD_EPOCH=True or False?
If RELOAD=0, the RELOAD_EPOCH option doesn't matter.
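To summarize the rules discussed so far, here is a rough sketch of which checkpoint name a given pair of settings points at, plus a check for whether a matching file is actually on disk. The helpers expected_checkpoint_prefix and checkpoint_exists are hypothetical names made up for illustration, not part of nmt-keras, and the epoch_N / update_N naming is taken only from the file listing and messages in this thread.

```python
import os


def expected_checkpoint_prefix(reload, reload_epoch):
    """Return the file-name prefix a reload would look for (hypothetical helper).

    Returns None when RELOAD is 0, i.e. training starts from scratch and
    RELOAD_EPOCH is ignored.
    """
    if reload == 0:
        return None
    kind = "epoch" if reload_epoch else "update"
    return "{}_{}".format(kind, reload)


def checkpoint_exists(store_path, prefix):
    """Check whether any file in store_path belongs to the given checkpoint.

    Matches "epoch_1.h5" or "epoch_1_weights_init.h5" for prefix "epoch_1",
    but not "epoch_10.h5".
    """
    for name in os.listdir(store_path):
        if name.startswith(prefix + "_") or name.startswith(prefix + "."):
            return True
    return False
```

With the listing above (only epoch_1 files), RELOAD = 1 with RELOAD_EPOCH = False maps to update_1, which does not exist, while RELOAD = 1 with RELOAD_EPOCH = True maps to epoch_1, which does.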
Hi, I was trying to resume training on my trained model (epoch 10) and I get the error below. It seems to be looking for an update_weights file. How do I create this update file? I used the default config settings when training for the first time.
Error:
My resumed training config:
If I set RELOAD_EPOCH = True it would work, but then I have to increment my RELOAD value every time, in this case setting it to a value > 10.
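One possible way to avoid bumping RELOAD by hand is to look up the highest saved epoch in the model folder and use that number. This is only a sketch: latest_saved_epoch is a hypothetical helper, not something nmt-keras provides, and it assumes checkpoints are stored as files whose names start with epoch_N, as in the listing earlier in this thread.

```python
import os
import re

# Matches names such as "epoch_1.h5" or "epoch_10_weights_init.h5".
EPOCH_FILE = re.compile(r"^epoch_(\d+)(?:[_.]|$)")


def latest_saved_epoch(store_path):
    """Return the highest N among files named epoch_N..., or 0 if there are none."""
    epochs = []
    for name in os.listdir(store_path):
        match = EPOCH_FILE.match(name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs, default=0)
```

The result could then be used as the RELOAD value together with RELOAD_EPOCH = True. A return value of 0 lines up with RELOAD = 0, i.e. training from scratch.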