Open nuneslu opened 6 years ago
I would like to know this too.
I think the idea is that discrimination is an easier task than generation. Using SGD slows down the discriminator's optimization, allowing the generator, trained with the faster Adam, to "catch up". In my experience SGD updates are very uniform since the learning rate stays mostly fixed, whereas Adam adapts it per parameter.
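A minimal PyTorch sketch of the pairing being discussed: plain SGD on the discriminator and Adam on the generator. The toy models, learning rates, and batch size are illustrative assumptions, not values from this thread.

```python
import torch
import torch.nn as nn

# Hypothetical toy networks, just to show the optimizer pairing.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
D = nn.Sequential(nn.Linear(4, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1))

# The trick: fixed-lr SGD (no per-parameter adaptation) for D,
# adaptive Adam for G. Learning rates here are assumptions.
opt_D = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))

bce = nn.BCEWithLogitsLoss()
z = torch.randn(32, 8)       # latent batch
real = torch.randn(32, 4)    # stand-in for real data

# Discriminator step (generator output detached so only D updates).
opt_D.zero_grad()
loss_D = bce(D(real), torch.ones(32, 1)) \
       + bce(D(G(z).detach()), torch.zeros(32, 1))
loss_D.backward()
opt_D.step()

# Generator step (non-saturating GAN loss).
opt_G.zero_grad()
loss_G = bce(D(G(z)), torch.ones(32, 1))
loss_G.backward()
opt_G.step()
```

The only change from a standard setup is the `opt_D` line; everything else in the training loop stays the same.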
I'm thinking the same thing too. I used SGD on the discriminator while training a WGAN and got better results (compared to using Adam for both). I would like to know the rationale behind this trick; is there any paper you can recommend?
It's not an issue but a question. I would like to know why it is better to train the discriminator with SGD than to train both parts with Adam. I've been trying to improve the results of my GAN, and before testing this I would like to understand why it works better!