jalola / improved-wgan-pytorch

Improved WGAN in Pytorch

One and Mone #23

Closed: NUS-Tim closed this issue 3 years ago

NUS-Tim commented 3 years ago

Sorry to bother you, I am new, so I have a lot of questions... Why do we use gen_cost.backward(mone) rather than gen_cost.backward(one)? I think in WGAN it should be gen_cost.backward(one), referring to the code below. I'm not sure, because I also learned from other code but I can't understand this part. Is the code in https://github.com/NUS-Tim/Pytorch-WGAN/tree/master/models right? I think in the papers the WGAN critic loss is real loss - fake loss while WGAN-GP uses fake loss - real loss for D, but in the code above the loss is the same. Does that mean there is something wrong with the code?

jalola commented 3 years ago

You can check this issue; it explains why we use backward(mone): https://github.com/jalola/improved-wgan-pytorch/issues/1

NUS-Tim commented 3 years ago

Yes, thanks a lot. One last question: have you implemented the original WGAN? I find that for the G loss many people use .backward(one) in WGAN instead of .backward(mone), which is used in WGAN-GP, but when I check the papers, WGAN and WGAN-GP have a similar algorithm here...

jalola commented 3 years ago

Basically gen_cost.backward(one) and gen_cost.backward() are the same (for a scalar loss, backward() uses a gradient of 1 by default)

and

gen_cost = -gen_cost
gen_cost.backward(one)

is equivalent to gen_cost.backward(mone)
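
A quick way to convince yourself of that equivalence (a minimal sketch, assuming one = torch.tensor(1.0) and mone = -one, with a toy scalar cost instead of the real networks):

import torch

one = torch.tensor(1.0)           # assumed definitions of one/mone for this sketch
mone = -one

w = torch.tensor(2.0, requires_grad=True)
gen_cost = 3.0 * w                # toy scalar "generator cost"
gen_cost.backward(mone)           # accumulates -d(gen_cost)/dw
print(w.grad)                     # tensor(-3.)

w.grad = None                     # reset the accumulated gradient
neg_cost = -(3.0 * w)             # negate the cost instead
neg_cost.backward(one)            # same gradient as backward(mone) above
print(w.grad)                     # tensor(-3.)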

It is not really about backward(one) vs. backward(mone). The important idea is: after optimizer.step(), G tries to maximize gen_cost because we call backward(mone) (https://github.com/jalola/improved-wgan-pytorch/blob/master/train.py#L130), while D tries to minimize disc_fake (essentially the same quantity as gen_cost) because we call backward(one) for disc_fake (https://github.com/jalola/improved-wgan-pytorch/blob/master/train.py#L192). So there is a fight between G and D during the learning process.
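
For the bigger picture, here is a toy, self-contained sketch of those two opposing updates (not the actual train.py code: the linear layers, optimizer settings, and batch sizes are placeholders, and the real-sample term plus the WGAN-GP gradient penalty are left out):

import torch
import torch.nn as nn

netG = nn.Linear(8, 16)                       # stand-in generator: noise -> fake sample
netD = nn.Linear(16, 1)                       # stand-in critic: sample -> score
optG = torch.optim.Adam(netG.parameters(), lr=1e-4)
optD = torch.optim.Adam(netD.parameters(), lr=1e-4)

one = torch.tensor(1.0)
mone = -one
noise = torch.randn(4, 8)

# D step: minimize D(G(z)) w.r.t. D (real-sample loss and gradient penalty omitted)
optD.zero_grad()
disc_fake = netD(netG(noise).detach()).mean()
disc_fake.backward(one)                       # gradient of +disc_fake
optD.step()

# G step: maximize D(G(z)) w.r.t. G
optG.zero_grad()
gen_cost = netD(netG(noise)).mean()
gen_cost.backward(mone)                       # gradient of -gen_cost, so step() pushes gen_cost up
optG.step()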

jalola commented 3 years ago

Please feel free to reopen if you are still not clear about it :)