YoojLee opened this issue 2 years ago
These are two separate questions: (1) should we optimize G and D jointly or separately? (2) If we optimize them separately, do we need to compute gradients for D while updating G?
For (2): as long as we don't call optimizer_D.step(), any gradients computed for D will never be applied in an update. Therefore, we set requires_grad to False in Line 185 to skip that unnecessary gradient computation.
For (1), most authors optimize them separately, following the original paper's practice.
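To illustrate the separate-update pattern described above, here is a minimal sketch with toy linear networks standing in for G and D (the names `G`, `D`, `opt_G`, `opt_D`, and the toy losses are illustrative, not the repo's actual code):

```python
# Sketch of alternating G/D updates, freezing D's parameters during the
# G step so autograd skips computing their gradients (a speed-up).
import torch
import torch.nn as nn

G = nn.Linear(4, 4)          # stand-in generator
D = nn.Linear(4, 1)          # stand-in discriminator
opt_G = torch.optim.SGD(G.parameters(), lr=0.1)
opt_D = torch.optim.SGD(D.parameters(), lr=0.1)

real = torch.randn(8, 4)

# --- update G: freeze D so its gradients are not computed ---
for p in D.parameters():
    p.requires_grad = False
opt_G.zero_grad()
loss_G = -D(G(real)).mean()   # toy generator loss
loss_G.backward()             # grads still flow *through* D into G
opt_G.step()

# --- update D: re-enable its gradients ---
for p in D.parameters():
    p.requires_grad = True
opt_D.zero_grad()
# detach() blocks gradients from flowing back into G during the D step
loss_D = D(G(real).detach()).mean() - D(real).mean()
loss_D.backward()
opt_D.step()
```

Note that even with D frozen, gradients still propagate through D's operations into G; `requires_grad = False` only skips accumulating `.grad` on D's own parameters.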
@junyanz Thank you for your reply! Still, I am wondering why we set requires_grad to False in Line 185. Is it mandatory, or does it serve another purpose, such as a speed-up?
https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/003efc4c8819de47ff11b5a0af7ba09aee7f5fc1/models/cycle_gan_model.py#L185
Thanks for the nice work! I am quite confused that freezing D while optimizing G is just a speed-up (according to a reply in a previous issue on this topic). I thought freezing D was important, since G and D should be isolated from each other during optimization. Does it really have nothing to do with the "performance" of training? I would like to confirm that the code I mentioned was written merely to speed up training.
Thanks!
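On the correctness question: a quick way to convince yourself is to run a G update *without* freezing D and check that D's weights are untouched, because only `opt_G.step()` is called. A minimal sketch with toy networks (names are illustrative, not the repo's):

```python
# Gradients accumulate in D's .grad buffers during the G update, but
# D's weights only change when its own optimizer steps, so freezing D
# affects speed, not the result of the update.
import torch
import torch.nn as nn

G, D = nn.Linear(4, 4), nn.Linear(4, 1)
opt_G = torch.optim.SGD(G.parameters(), lr=0.1)

before = [p.detach().clone() for p in D.parameters()]

# G update WITHOUT freezing D: gradients are computed for D too...
loss_G = -D(G(torch.randn(8, 4))).mean()
loss_G.backward()
opt_G.step()          # ...but only G's optimizer steps

# D's weights are unchanged (though its .grad buffers are populated).
unchanged = all(torch.equal(b, p.detach())
                for b, p in zip(before, D.parameters()))
print(unchanged)      # True
```

The only caveat is that those stale gradients in D would pollute the next D update if they were not cleared, which is why the standard pattern zeroes D's gradients (or freezes D) before its own step.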