Summary

Link

Autoencoding beyond pixels using a learned similarity metric Official implementation

They use DeepPy which seems the author's original deep learning framework..

Author/Institution

Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther Technical University of Denmark, University of Copenhagen, Twitter

What is this

Combine VAEs and GANs.

Propose to use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, replace element-wise errors with feature-wise errors.

Moreover, show that the network is able to disentangle factors of variation in the input data distribution and discover visual attributes in the high-level representation of the latent space.

Comparison with previous researches. What are the novelties/good points?

Why not just using VAE?
- Pixel-wise metric (e.g. MSE) is not appropriate for images. Discriminator brings another approach into here
Why only GAN doesn't make sense?
- GAN doesn't have encoding ability. GAN itself will not be enough for some purpose

Key points

Collapse the VAE decoder and the GAN generator into one
- Share parameters
- Train jointly
Replace element-wise reconstruction metric with a feature-wise etric expressed in the discriminator The loss is consists of these three different losses

$L_{prior}$: KL from VAE
$L^{Disl}{llike}$: reconstruction error expressed in the GAN discriminator
$L_GAN$: standard GAN loss function = $log(Dis(x)) + log(1-Dis(Gen(z)))$

Algorithm

VAEGAN_algorithm

How the author proved effectiveness of the proposal?

Conducted experiments with CelebA dataset and showed that the generative models trained with learned similarity measures produced better image samples than models trained with element-wise error measures.

Any discussions?

How is performance in terms of computational cost? How to determine when to finish GANs training? (maybe need to check the code)

What should I read next?

Note

How is performance. It may faster or computationaly less expensive compare to MSE?

nagataka / Read-a-Paper

Autoencoding beyond pixels using a learned similarity metric #31