junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Random noise z augmentation with cGAN #152

Closed aabobakr closed 6 years ago

aabobakr commented 6 years ago

Thanks for sharing this great work.

Conditional GANs (cGANs) learn a mapping from an observed image x and a random noise vector z to y: y = f(x, z). I am wondering how z is combined with the input x for the generator. In the code, x is passed to the generator's forward method as `self.fake_B = self.netG(self.real_A)`, and in the forward method there is no z.

junyanz commented 6 years ago

The current model does not take z as input. In both pix2pix and CycleGAN, we tried to add z to the generator but often found that z got ignored. So we decided to only take real_A as input.
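For concreteness, one common way to feed z into a conditional generator is to tile it into extra input channels and concatenate it with the conditioning image. This is only a hypothetical sketch of that idea (the repo's actual `netG` takes only `real_A`; `forward_with_z` and `nz` are illustrative names, not part of this codebase):

```python
import torch

# Hypothetical sketch: tile a noise vector z into extra input channels
# and concatenate it with the conditioning image before the generator.
# The actual repo's netG takes only real_A; names here are illustrative.
def forward_with_z(netG, real_A, nz=8):
    n, _, h, w = real_A.shape
    z = torch.randn(n, nz, 1, 1, device=real_A.device)
    z_map = z.expand(n, nz, h, w)  # broadcast z spatially
    # netG would need to accept (3 + nz) input channels for this to work
    return netG(torch.cat([real_A, z_map], dim=1))
```

As the maintainers note above, generators trained this way often learn to ignore these extra channels, which is what motivated BicycleGAN's explicit latent regularization.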

zzw1123 commented 6 years ago

@junyanz Could you please explain in detail how you found that z gets ignored? What did the results show? And since the noise may be ignored by G, why does the original conditional GAN perform well? Thank you very much!

phillipi commented 6 years ago

We tried a few ways of adding z to the nets, e.g., adding z to a latent state, concatenating with a latent state, applying dropout, etc. The output tended not to vary much as a function of z. You can see the effect of random dropout here: https://affinelayer.com/pixsrv/

Click the "pix2pix" button multiple times to see different random samples. In this implementation, the only noise is dropout (as in the pix2pix paper). Some minor details vary from click to click but overall not much changes.
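The "dropout as the only noise source" behavior can be reproduced in PyTorch by keeping `Dropout` modules in train mode while the rest of the network is in eval mode. A minimal sketch, assuming a standard `nn.Module` generator (`enable_test_time_dropout` is an illustrative name, not a function from this repo):

```python
import torch
import torch.nn as nn

# Sketch: keep Dropout layers sampling random masks at test time, while
# BatchNorm etc. stay in eval mode. Repeated calls on the same input then
# give slightly different outputs, like clicking "pix2pix" repeatedly.
def enable_test_time_dropout(net):
    net.eval()
    for m in net.modules():
        if isinstance(m, nn.Dropout):
            m.train()  # dropout keeps drawing random masks
    return net
```

Calling the generator twice on the same input after this will produce slightly different samples, which matches the minor click-to-click variation described above.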

Conditional GANs don't really need noise as long as the input you are conditioning on is sufficiently complex, so that it can kind of play the role of noise. Without noise, the mapping is deterministic, but that's often fine.

Here's a follow up paper that shows one way of getting z to actually have a substantial effect: https://junyanz.github.io/BicycleGAN/

zzw1123 commented 6 years ago

@phillipi Thanks, phillipi. I have tried the link you sent, and indeed there are no big differences... What's more, I have read the BicycleGAN paper, and two questions are confusing me:

1. Does BicycleGAN have any real-world applications? Or is it just an image-translation project full of fantasy and imagination?
2. Are all the images used for training paired? E.g., if domain A contains an image of a building, must its paired image in domain B be the same building shot from the same angle? If so, I think choosing a training set is time-consuming, right?

phillipi commented 6 years ago
  1. I think there are a bunch of applications. For example, an artist could sketch a shoe, BicycleGAN would present several possible colorizations, and the artist could choose the one they like most (in pix2pix, you only get a single choice). But I also think it's definitely a project full of fantasy and imagination :)
  2. Yeah, it's all paired training data. The name is a bit confusing, since it is actually applied to the pix2pix setting, not the CycleGAN setting.
zzw1123 commented 6 years ago

Thank you so much for the kind reply!

ahmed-fau commented 6 years ago

@phillipi excuse me I have one question regarding this issue:

Based on your statement "Without noise, the mapping is deterministic, but that's often fine.", what I understood is that the prior z is only useful if we need some sort of variety in the generated samples. However, if we just need to learn a direct mapping between two paired domains (e.g., an image and its semantic label map), then it is sufficient to ignore z ... is this correct?

If this intuition is true, why isn't that sufficient for CycleGAN, and why did you enforce cycle consistency on the generated signals?

Many thanks in advance

phillipi commented 6 years ago

@ahmed-fau Yeah you only need z if you want the translation function to output multiple possibilities for each input.

I'm not sure I understand the question about CycleGAN. In CycleGAN, we don't use z.

ahmed-fau commented 6 years ago

@phillipi I mean that the idea of CycleGAN is more or less similar to paired image-to-image translation, except that in CycleGAN the mapping is bidirectional ... so if we are interested in only a unidirectional mapping, then the two are similar (according to my understanding). Is there any difference between them in terms of latent-space mapping?

xyp8023 commented 5 years ago

> Without noise, the mapping is deterministic, but that's often fine

Hi, thanks for sharing, but I have a doubt about the deterministic mapping: I have tried to implement pix2pix on another dataset (where the input image is complex enough) without applying dropout at all, and the results look just as good. But without noise as input, can we even call it a generative model? It looks like a discriminative model (a U-Net-based autoencoder) plus the discriminator loss.

And if pix2pix without dropout outperforms a U-Net-based autoencoder, can we think of it this way: pix2pix without dropout is better because of the powerful discriminator loss? Is my understanding correct?

Thanks a lot!

phillipi commented 5 years ago

Yeah, the dropout doesn't really matter much for performance; it has a very minor effect. One could argue about whether or not deterministic mappings count as "generative models". It's true that such a model does not represent a distribution of outputs; instead, it just gives a single guess.

> And if pix2pix without dropout outperforms a U-Net-based autoencoder, can we think of it this way: pix2pix without dropout is better because of the powerful discriminator loss? Is my understanding correct?

Yep, I think that's a good way to think about it.
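That view can be made concrete: the pix2pix generator objective is an adversarial term plus an L1 reconstruction term (weighted by lambda_L1 = 100 in the paper). The sketch below assumes a discriminator that outputs raw logits; the tensor names are illustrative, not this repo's exact API:

```python
import torch
import torch.nn.functional as F

# Sketch of the pix2pix generator objective: an adversarial term plus
# an L1 reconstruction term (lambda_L1 = 100 in the paper). pred_fake
# is the discriminator's logit output on the generated image.
def generator_loss(pred_fake, fake_B, real_B, lambda_L1=100.0):
    # GAN term: push D's prediction on fakes toward "real" (label 1)
    loss_gan = F.binary_cross_entropy_with_logits(
        pred_fake, torch.ones_like(pred_fake))
    # L1 term: keep the output close to the paired ground truth
    loss_l1 = F.l1_loss(fake_B, real_B)
    return loss_gan + lambda_L1 * loss_l1
```

Dropping the GAN term reduces this to a plain U-Net regression loss, which is exactly the comparison discussed above.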