Closed tqdungctuct closed 3 years ago
Hi, what do you mean by "create latest checkpoint for every epoch"? Do you want to save a checkpoint after every epoch, or only after the last epoch of the whole training run?
My idea is that the latest checkpoint gets overwritten each time the next epoch finishes. If I change the save period to 1, the problem is that many checkpoint files are created and they take up a lot of space. I also want to keep a checkpoint every 10 epochs, for example (with the save period set through the config).
Sorry, I still do not get it. What do you mean by the latest checkpoint being overwritten after the next epoch is done? By default, the template saves every save_period epochs, as defined in the config. For example, if you train for 100 epochs with a save_period of 10, it saves at epochs {10,20,30,40,50,60,70,80,90,100}. Could you show me an example of how often you want to save, in terms of epoch numbers?
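As a rough illustration of the default schedule described above, here is a minimal sketch. It assumes epochs are counted from 1 and that a checkpoint is written whenever the epoch number is a multiple of save_period; `default_save_epochs` is a hypothetical helper name, not a function from the template.

```python
def default_save_epochs(total_epochs, save_period):
    """Return the epochs at which a periodic checkpoint would be written.

    Assumption: epochs start at 1 and a save happens when
    epoch % save_period == 0.
    """
    return [e for e in range(1, total_epochs + 1) if e % save_period == 0]


if __name__ == "__main__":
    # 100 epochs with save_period=10, as in the example above
    print(default_save_epochs(100, 10))
```

With save_period set to 1 the same helper returns every epoch from 1 to 100, which is the "many files" situation mentioned earlier in the thread.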
For example, after epoch 1 completes, it creates latest.pth and model_best.pth. After epoch 2, a new latest.pth is created and overwrites the old one, and the same goes for model_best.pth, and so on. After epoch 10, checkpoint_epoch10.pth is created, along with latest.pth and model_best.pth.
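The scheme described above could be sketched roughly as follows. This is not the template's actual code: `save_checkpoints`, its parameters, and the `is_best` flag are hypothetical names chosen for illustration, and the default `save_fn` uses pickle only so the sketch runs without PyTorch. In a PyTorch project you would presumably pass `torch.save` as `save_fn`.

```python
import os
import pickle


def save_checkpoints(state, epoch, checkpoint_dir, save_period, is_best, save_fn=None):
    """Write latest.pth every epoch, plus periodic and best checkpoints.

    Returns the list of file names written for this epoch.
    All names here are illustrative, not taken from the template.
    """
    if save_fn is None:
        # Stand-in writer so the sketch has no PyTorch dependency.
        def save_fn(obj, path):
            with open(path, "wb") as f:
                pickle.dump(obj, f)

    # latest.pth is rewritten after every epoch, so only one copy exists.
    names = ["latest.pth"]
    # Keep a numbered checkpoint only every save_period epochs.
    if epoch % save_period == 0:
        names.append(f"checkpoint_epoch{epoch}.pth")
    # Overwrite model_best.pth whenever this epoch is the best so far.
    if is_best:
        names.append("model_best.pth")

    os.makedirs(checkpoint_dir, exist_ok=True)
    for name in names:
        save_fn(state, os.path.join(checkpoint_dir, name))
    return names
```

Calling this once at the end of each epoch keeps disk usage bounded: latest.pth and model_best.pth are single files that get overwritten, and only the numbered checkpoints accumulate.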
I implemented this feature a while ago in my local fork (or the latest version of the hydra-DDP branch) of this project. You can check the code here.
Thanks @SunQpark. I'm using DeblurGAN by fourson (https://github.com/fourson/DeblurGAN-pytorch), which uses this template, so I have to make the change in this template. Can you show me how to implement this function in this template? Thank you.
@tqdungctuct The checkpoint saving logic is basically identical in this template and in my version. Maybe you can modify (this part) of your code to match my version above.
Thanks @SunQpark. I will try to do this.
I want to create a latest checkpoint after every epoch, in addition to the checkpoints saved at the save period set in the config file. For example, I have already set the save period to every 10 epochs, but I still want to create a latest checkpoint for every epoch. Can you guide me on how to do that? Thank you.