yanx27 / EverybodyDanceNow_reproduce_pytorch

Everybody dance now reproduced in pytorch

MIT License

607 stars, 172 forks

How to be sure that training continue from last training after break the training? #69

Open talatccan opened 5 years ago

talatccan commented 5 years ago

Hi,

I'm trying to continue training from the last saved model after interrupting training. First I started training, saved the first epoch in checkpoints, and stopped it. Afterwards I set load_pretrain = './checkpoints/target/' in ./src/config/train_opt.py and started training again, but it started from epoch 1 as before. I expected it to continue from epoch 2.

How can I be sure it's continuing from the last saved epoch?

andrewhani14 commented 4 years ago

Same issue, need help.

zibozzb commented 4 years ago

Same issue. Did you guys figure it out?

iluvrachel commented 4 years ago

So, the problem is here. If you print out those values, you will see that this line does not work during training, so I just assigned pretrained_path manually right before this statement, and it eventually worked.

zibozzb commented 4 years ago

Thank you for your reply. I tried to print them out; however, the value of "pretrained_path" is "./checkpoints/target/", which is correct, I guess. The issue is that training starts from epoch 1 rather than from the results of the last training, so I am not sure whether it actually continues training. In addition, a new log file (in ./checkpoints/target/logs) is created rather than the previous log file being updated.

ShawnDong98 commented 3 years ago

In the train opt file, there are two 'load_pretrain' args; you should delete one.
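If the options file sets load_pretrain with plain module-level assignments (an assumption, I have not checked how the file is laid out), the later assignment silently overrides the earlier one. A rough sketch of how one might spot such duplicates; the helper name duplicate_assignments and the example source are made up for illustration and are not taken from the repo:

```python
import ast
from collections import Counter


def duplicate_assignments(source):
    """Return names assigned more than once at module level (hypothetical helper)."""
    tree = ast.parse(source)
    counts = Counter()
    for node in tree.body:
        # Only simple top-level assignments such as `name = value` are counted.
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    counts[target.id] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up stand-in for an options file; the second assignment wins at runtime.
example = """
load_pretrain = './checkpoints/target/'
niter = 10
load_pretrain = ''
"""

dupes = duplicate_assignments(example)
print(dupes)
```

To try it on a real file, read the file's text and pass it to the helper instead of the example string.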

In addition, although the loaded model is correct, the printed log still starts from 1. Maybe you can change the 'start_epoch' variable in 'train_pose2vid.py'.
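For reference, a minimal sketch of the general idea of storing the finished epoch with the checkpoint and deriving start_epoch from it on restart. This is not the repo's code: it uses pickle and a dummy state dict so it runs without PyTorch, and the function names and the latest.pkl file name are assumptions for illustration. In a real PyTorch script the save and load steps would typically go through torch.save and torch.load.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, epoch, state):
    """Store the finished epoch number next to the (dummy) model state."""
    with open(path, "wb") as f:
        pickle.dump({"epoch": epoch, "state": state}, f)


def load_start_epoch(path):
    """Return (start_epoch, state); start at epoch 1 when no checkpoint exists."""
    if not os.path.isfile(path):
        return 1, None
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["epoch"] + 1, ckpt["state"]


def train(path, total_epochs, stop_after=None):
    """Run epochs from the resumed start_epoch; return the epochs actually run."""
    start_epoch, state = load_start_epoch(path)
    state = state or {"steps": 0}
    ran = []
    for epoch in range(start_epoch, total_epochs + 1):
        state["steps"] += 1  # stand-in for one real training epoch
        save_checkpoint(path, epoch, state)
        ran.append(epoch)
        if stop_after is not None and epoch == stop_after:
            break  # simulate interrupting the run
    return ran


ckpt_path = os.path.join(tempfile.mkdtemp(), "latest.pkl")
first_run = train(ckpt_path, total_epochs=3, stop_after=1)  # interrupted after epoch 1
second_run = train(ckpt_path, total_epochs=3)               # resumes at epoch 2
print(first_run, second_run)
```

The second call picks up at epoch 2 because the epoch counter is read back from the checkpoint rather than hard-coded, which is also what makes the printed log continue instead of restarting at 1.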