Question on conditional image generation

tackgeun commented 6 years ago

With the help of your advice, I reproduced the main part of the paper and succeed in training your model to ImageNet with specific classes, minibus, dog.

Now, I try to reproduce 6.8 conditional image generation in the paper to use this architecture for unsupervised/weakly-supervised object segmentation. But I had a hard time to find a trainable setup. Can you share a setting in the experiment?

[Encoder architecture]

Use same architecture with a discriminator.
If an image is given, encoder returns a vector with the dimension same as that of random noise vector (nz).
To generate background vector, I use Encoder(real image)=background vector
To generate foreground vector, I use Encoder(real image - generated image)=foreground vector

[Optimization method]

Original code (1) maximize log(D(x)) + log(1 - D(G(z))) with fake image then (2) maximize log(D(G(z))) with true image.
I use three-options in optimization (1) update auto-encoder + generator part additionally (2) update auto-encoder part with maximizing log(D(G(z)) step. (3) update only auto-encoder part additonally. But all variants are not working.

Thanks.

jwyang commented 6 years ago

Hi, @tackgeun ,

glad to know you have successfully get the results as the paper!

Regarding the conditional image generation,

[Encoder architecture]

Use same architecture with a discriminator.

yes, I also used the same architecture as the discriminator, except that the output layer is a fc layer.

If an image is given, encoder returns a vector with the dimension same as that of random noise vector (nz).

yes, exactly, but I put a batch normalization layer before the last fc layer.

To generate background vector, I use Encoder(real image)=background vector

yeah, that's what I did.

To generate foreground vector, I use Encoder(real image - generated image)=foreground vector

yes, that's what I did.

[Optimization method]

Original code (1) maximize log(D(x)) + log(1 - D(G(z))) with fake image then (2) maximize log(D(G(z))) with true image.
I use three-options in optimization (1) update auto-encoder + generator part additionally (2) update auto-encoder part with maximizing log(D(G(z)) step. (3) update only auto-encoder part additonally. But all variants are not working.

In my experiments, I used reconstruction loss (with weight 1e-3) plus the adversarial loss for training the generator, and the same loss as original GAN for training the discriminator.

Also, please remember to regularize the transformation parameters. The rotation should be minor or even void during training.

Hope these are helpful.

tackgeun commented 6 years ago

With your help, my code starts to generate an image. But reconstructed images are blurry compared to the paper. But I'm confusing that the encoder architecture and the way of updating generator weight.

Architecture

Do you mean that the last part of encoder network like this?

Discriminator CNN part - last conv - batch norm(dimension nz) - relu - fc

Optimization

The encoder is embedded within generator class. So I compute gradient for

adversarial loss for generator(random noise)
weighted reconstruction loss(encoded image).

Then, those gradients are accumulated to compute the gradient of the sum of all losses. Following the above description, the generator's weight is updated w.r.t (1) + (2) and the encoder's weight is updated w.r.t only (2).

Thank you.

jwyang commented 6 years ago

Hi, @tackgeun

With your help, my code starts to generate an image. But reconstructed images are blurry compared to the paper. But I'm confusing that the encoder architecture and the way of updating generator weight.

Architecture

Do you mean that the last part of encoder network like this?

Discriminator CNN part - last conv - batch norm(dimension nz) - relu - fc

I used Tanh() as the last layer for the encoder, to make the output in a reasonable range.

Optimization

The encoder is embedded within generator class. So I compute gradient for

adversarial loss for generator(random noise)
weighted reconstruction loss(encoded image).

Then, those gradients are accumulated to compute the gradient of the sum of all losses. Following the above description, the generator's weight is updated w.r.t (1) + (2) and the encoder's weight is updated w.r.t only (2).

Actually, I update both the generator and the encoder with (1) and (2). And the weight for reconstruction loss is much lower, 1e-3 to 1e-2.

Thank you.

tackgeun commented 6 years ago

@Hi, I just have returned from the ICCV... and my implementation still has problems.

Did you use Euclidean loss for reconstruction loss? I heard that some works (pix2pix, CycleGAN) insist that L1 loss generates more fine details...
Did you not use _sizeaverage option for reconstruction loss?
Did you additional generator for the auto-encoder part? or use the same generator with the encoder by weight sharing?
You said that you updated both the generator and the encoder with (1) and (2). How to update the encoder parameter using random noise? part (1). Did you use reconstruction loss on random noise?

Thank you for your advice.

jwyang / lr-gan.pytorch