yxgeee / FD-GAN

[NeurIPS-2018] FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification.
https://yxgeee.github.io/projects/fdgan.html

'resume' in train.py #7

Closed: QingzeYin closed this issue 5 years ago

QingzeYin commented 5 years ago

Hi, I found that --resume exists only in baseline.py, not in train.py. When running stage 3, I got an error saying that load_checkpoint does not exist in train.py, so I added an import of load_checkpoint to train.py. However, I still cannot use --resume to load checkpoints. Where should I add the load-from-checkpoint code in train.py: before the for epoch loop, or after it? The screenshot below shows the load-from-checkpoint code you wrote in baseline.py. Can I use this code in train.py?

[Screenshot from 2018-11-08 15-53-46: the load-from-checkpoint code in baseline.py]

yxgeee commented 5 years ago

I load the pretrained model here: https://github.com/yxgeee/FD-GAN/blob/master/fdgan/model.py#L60

QingzeYin commented 5 years ago

But I cannot continue training without resume. So do you think there is no need to add resume to train.py?

yxgeee commented 5 years ago

I have not tried to continue training, since I run the code end-to-end in each stage. I am not sure whether resuming training affects the final performance; maybe you can try it.

You can add resume following baseline.py, but note that you should also save the optimizer's state_dict, since it affects training.
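A minimal sketch of the advice above for a single network and optimizer, loosely modeled on the PyTorch ImageNet example. The helper names save_training_state and resume_training_state and the checkpoint keys are hypothetical, not code from baseline.py or FD-GAN, and are distinct from any load_checkpoint/save_checkpoint helpers already in the repo.

```python
import os

import torch
import torch.nn as nn


def save_training_state(path, model, optimizer, epoch):
    """Hypothetical helper: save weights, optimizer state, and the finished epoch."""
    torch.save({
        'epoch': epoch,
        'state_dict': model.state_dict(),
        # Optimizer state holds momentum buffers / Adam moments; dropping it
        # means the resumed run does not continue the same optimization.
        'optimizer': optimizer.state_dict(),
    }, path)


def resume_training_state(path, model, optimizer):
    """Hypothetical helper: restore model and optimizer in place.

    Returns the epoch to start from (0 if no checkpoint exists).
    The optimizer must be built over the same parameters, in the same order,
    as when the checkpoint was saved.
    """
    if not os.path.isfile(path):
        return 0  # no checkpoint: train from scratch
    checkpoint = torch.load(path, map_location='cpu')
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    return checkpoint['epoch'] + 1


if __name__ == '__main__':
    model = nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    ckpt = 'toy_checkpoint.pth.tar'  # hypothetical path
    start_epoch = resume_training_state(ckpt, model, optimizer)
    for epoch in range(start_epoch, 3):
        loss = model(torch.randn(8, 4)).pow(2).mean()  # toy training step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        save_training_state(ckpt, model, optimizer, epoch)
```

In this sketch the resume call goes once before the for epoch loop, and the state is saved at the end of each epoch, so a restarted run picks up at the next epoch with both the weights and the optimizer state restored.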

QingzeYin commented 5 years ago

I wanted to copy the code from the screenshot into train.py, but I could not find the optimizer there. Would you mind telling me where the optimizer's state_dict is?

yxgeee commented 5 years ago

Directly copying the resume code from baseline.py is not suitable, since FD-GAN has four models. Please refer carefully to https://github.com/yxgeee/FD-GAN/blob/master/fdgan/model.py. Note that the model in train.py is a class, not a PyTorch Module.

For how to save and load the optimizer's state dict, you can refer to https://github.com/pytorch/examples/blob/master/imagenet/main.py#L192
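Because the model in train.py is a plain class rather than an nn.Module, it has no state_dict() of its own, so each network and optimizer has to be saved and restored individually. A minimal sketch under that assumption follows; ToyGANWrapper and all of its attribute names (net_E, net_G, net_D, optimizer_G, optimizer_D) are hypothetical stand-ins with toy layers, not the actual attributes in fdgan/model.py, which should be checked before adapting this.

```python
import torch
import torch.nn as nn


class ToyGANWrapper:
    """Hypothetical stand-in for a multi-network model class (not an nn.Module)."""

    def __init__(self):
        self.net_E = nn.Linear(8, 4)  # toy "encoder"
        self.net_G = nn.Linear(4, 8)  # toy "generator"
        self.net_D = nn.Linear(8, 1)  # toy "discriminator"
        self.optimizer_G = torch.optim.Adam(
            list(self.net_E.parameters()) + list(self.net_G.parameters()), lr=1e-4)
        self.optimizer_D = torch.optim.Adam(self.net_D.parameters(), lr=1e-4)

    def _stateful(self):
        # List every member with a state_dict explicitly, since the wrapper
        # itself cannot collect them the way an nn.Module would.
        return {
            'net_E': self.net_E, 'net_G': self.net_G, 'net_D': self.net_D,
            'optimizer_G': self.optimizer_G, 'optimizer_D': self.optimizer_D,
        }

    def save(self, path, epoch):
        """Write all network and optimizer states into one checkpoint file."""
        state = {name: obj.state_dict() for name, obj in self._stateful().items()}
        state['epoch'] = epoch
        torch.save(state, path)

    def load(self, path):
        """Restore every network and optimizer; return the next epoch to run."""
        state = torch.load(path, map_location='cpu')
        for name, obj in self._stateful().items():
            obj.load_state_dict(state[name])
        return state['epoch'] + 1
```

One design note: if the networks are wrapped in nn.DataParallel, their state_dict keys carry a "module." prefix, so saving and loading with the same wrapping keeps the keys consistent.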

QingzeYin commented 5 years ago

Thanks for your answers. I am going to try it :D

iskenderkahramanoglu commented 4 years ago

Hi, I want to resume my training from the last checkpoint. Did you solve this problem? If so, could you explain how? Thanks. @QingzeYin @yxgeee

QingzeYin commented 4 years ago

Sorry, I can't help you with this.