Open maximeraafat opened 2 years ago
I think it was just taken from some other article. You can see elements of WGAN-GP in this code, such as a simplified implementation of the gradient penalty. This code also supports multiple losses (the user can choose dual contrastive loss instead of hinge loss). It could be structured so that a single training-loop body works for several loss functions.
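To illustrate that last point, a unified training step could simply dispatch on the loss name. This is a hypothetical sketch, not the repo's actual code; `bce_loss`, `LOSSES`, and `discriminator_step` are invented names for illustration:

```python
import torch
import torch.nn.functional as F

def hinge_loss(fake, real):
    # Hinge loss under the repo's (inverted) convention: low score = real.
    return (F.relu(1 + real) + F.relu(1 - fake)).mean()

def bce_loss(fake, real):
    # Hypothetical logistic alternative under the same convention.
    return (F.softplus(real) + F.softplus(-fake)).mean()

# One registry, one training-loop body for every loss choice.
LOSSES = {'hinge': hinge_loss, 'bce': bce_loss}

def discriminator_step(loss_name, fake_scores, real_scores):
    return LOSSES[loss_name](fake_scores, real_scores)
```

The training loop then only looks up the chosen loss once instead of branching on every step.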
Hi,
Thanks for this amazing implementation! I have a question concerning the loss implementation, as it seems to differ from the original equations. The screenshot below shows the GAN loss as presented in the paper:
This makes sense to me. Since it is assumed that D outputs values between 0 and 1 (0 = fake, 1 = real):
in blue, we want D to output 1 (from the generator's perspective) for fake images: the equation follows directly.
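Since the screenshot does not render here, and purely from my own recollection of the FastGAN paper (this may not be an exact quote of its equations), the stated objectives are hinge-style:

```latex
\mathcal{L}_D = -\,\mathbb{E}_{x \sim I_{\mathrm{real}}}\big[\min(0,\,-1 + D(x))\big]
              \; -\,\mathbb{E}_{\hat{x} \sim G(z)}\big[\min(0,\,-1 - D(\hat{x}))\big]

\mathcal{L}_G = -\,\mathbb{E}_{z}\big[D(G(z))\big]
```

Under this reading, the blue generator term pushes D(G(z)) as high as possible, i.e. toward the "real" label.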
Now, the way the authors implement this in the code provided in the paper's supplementary materials is as follows (the colors match those in the picture above):
Except for the strange randomness involved (already explained in https://github.com/lucidrains/lightweight-gan/issues/11), their implementation is a one-to-one match with the paper's equations.
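For readers without access to the supplementary code: from my memory of the discussion in issue #11, the official discriminator loss looks roughly like the sketch below, where the hinge margin is randomized in U[0.8, 1.0] instead of being the constant 1. Note that -min(0, -1 + D(x)) = relu(1 - D(x)), which is why the relu form matches the paper's equations term for term when the margin is 1. Treat the function and tensor names as assumptions:

```python
import torch
import torch.nn.functional as F

def official_d_loss_sketch(real_pred, fake_pred):
    # Randomized margin in U[0.8, 1.0] (the "strange randomness" of issue #11);
    # with a constant margin of 1 this reduces exactly to the paper's hinge loss.
    margin_real = torch.rand_like(real_pred) * 0.2 + 0.8
    margin_fake = torch.rand_like(fake_pred) * 0.2 + 0.8
    loss_real = F.relu(margin_real - real_pred).mean()  # real scores pushed above the margin
    loss_fake = F.relu(margin_fake + fake_pred).mean()  # fake scores pushed below -margin
    return loss_real + loss_fake
```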
The way it is implemented in this repo, however, is quite different, and I do not understand why.
Let's start with the discriminator loss:
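For reference (the screenshot is not shown here), the repo's discriminator objective is, as far as I can tell, a plain hinge loss along these lines; the argument order is from memory, so treat this as a sketch rather than a verbatim quote:

```python
import torch
import torch.nn.functional as F

def hinge_loss(fake, real):
    # Inverted convention relative to the paper: the discriminator is
    # trained to push real scores below -1 and fake scores above +1.
    return (F.relu(1 + real) + F.relu(1 - fake)).mean()
```

So D is rewarded for assigning low scores to real images, the opposite sign convention from the paper's equations.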
For the generator loss:
in blue, you want the opposite of the green term: you want D to output values as small as possible for fake images.
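Concretely, the generator side then reduces to minimizing the raw discriminator output on fakes, something like the following (again a from-memory sketch, not a verbatim quote of the repo):

```python
import torch

def gen_hinge_loss(fake):
    # The generator minimizes D's score on fakes; under the inverted
    # convention above, a low score means "classified as real".
    return fake.mean()
```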
This implementation seems sensible and yields coherent results (as the examples demonstrate). It also seems to me that D is not limited to outputs between 0 and 1 but can take any real value (I might be wrong). I am simply wondering about this choice: could you perhaps elaborate on why you decided to implement the loss differently from the original paper?