victoresque / pytorch-template

PyTorch deep learning projects made easy.
MIT License
4.7k stars 1.08k forks

Latest checkpoint #95

Closed tqdungctuct closed 3 years ago

tqdungctuct commented 3 years ago

I want to create a "latest" checkpoint after every epoch, in addition to the checkpoints controlled by the save period in the config file. For example, I have already set the save period to 10 epochs, but I still want a latest checkpoint to be written after every epoch. Can you guide me on how to do that? Thank you.

MohamedA95 commented 3 years ago

Hi, what do you mean by creating a latest checkpoint for every epoch? Do you want to save a checkpoint after every epoch, or do you want to save only after the last epoch of the whole training run?

tqdungctuct commented 3 years ago

My idea is that the latest checkpoint will be overwritten each time the next epoch finishes. If I change the save period to 1, the problem is that many checkpoint files are created and they take up a lot of space. I also still want to keep a checkpoint every 10 epochs, for example (configurable through the save period in the config).

MohamedA95 commented 3 years ago

Sorry, I still do not get it. What do you mean by the latest checkpoint being overwritten after the next epoch is done? By default, the template saves every save_period epochs, as defined in the config. For example, if you train for 100 epochs with a save_period of 10, it will save at epochs {10,20,30,40,50,60,70,80,90,100}. Could you show me an example of how often you want to save, in terms of epoch numbers?
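To make the schedule described above concrete, here is a minimal sketch. It assumes a checkpoint is written whenever the epoch number is divisible by save_period, which matches the example in the comment but is not copied from the template's source. The function name periodic_save_epochs is hypothetical.

```python
def periodic_save_epochs(total_epochs, save_period):
    """Return the epochs at which a periodic checkpoint would be written.

    Assumption: epochs are numbered from 1 and a checkpoint is saved
    whenever epoch % save_period == 0.
    """
    return [epoch for epoch in range(1, total_epochs + 1)
            if epoch % save_period == 0]


print(periodic_save_epochs(100, 10))
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```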

tqdungctuct commented 3 years ago

For example, after epoch 1 completes, it will create latest.pth and model_best.pth. After epoch 2, a new latest.pth will be created and overwrite the old one, and the same goes for model_best.pth, and so on. After epoch 10, checkpoint_epoch10.pth will be created, along with latest.pth and model_best.pth.
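The behaviour requested here could be sketched roughly as follows. This is an illustration under stated assumptions, not the template's actual code: the function save_checkpoints, its parameters, and the pickle-based stand-in writer are all hypothetical, and model_best.pth is written only when the caller passes is_best=True. In real training code, torch.save could be passed as save_fn, since it takes the object first and the destination path second.

```python
import pickle
from pathlib import Path


def _pickle_save(state, path):
    # Stand-in writer so this sketch runs without PyTorch installed.
    # In a real training loop, pass save_fn=torch.save instead.
    with open(path, "wb") as f:
        pickle.dump(state, f)


def save_checkpoints(state, epoch, checkpoint_dir, save_period=10,
                     is_best=False, save_fn=_pickle_save):
    """Hypothetical helper: write the rolling, periodic, and best checkpoints.

    Returns the names of the files written for this epoch.
    """
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    written = []

    # latest.pth is overwritten after every epoch, so only one copy exists.
    latest = checkpoint_dir / "latest.pth"
    save_fn(state, latest)
    written.append(latest.name)

    # A numbered checkpoint is kept only every save_period epochs.
    if epoch % save_period == 0:
        periodic = checkpoint_dir / f"checkpoint_epoch{epoch}.pth"
        save_fn(state, periodic)
        written.append(periodic.name)

    # model_best.pth is overwritten whenever the caller reports a new best.
    if is_best:
        best = checkpoint_dir / "model_best.pth"
        save_fn(state, best)
        written.append(best.name)

    return written
```

Called once at the end of each epoch, this keeps disk usage bounded: one latest.pth, one model_best.pth, and one numbered file per save_period epochs.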

SunQpark commented 3 years ago

I implemented this feature a while ago in my local fork (or the latest version of the hydra-DDP branch) of this project. You can check the code here.

tqdungctuct commented 3 years ago

Thanks @SunQpark. I'm using DeblurGAN by fourson (https://github.com/fourson/DeblurGAN-pytorch), which is built on this template, so I have to make the change in this template. Can you show me how to implement this function in this template? Thank you.

SunQpark commented 3 years ago

@tqdungctuct The checkpoint-saving logic is basically identical in this template and in my version. Maybe you can modify (this part) of your code to match my version above.

tqdungctuct commented 3 years ago

Thanks @SunQpark. I will try this.