
BEGAN: Boundary Equilibrium Generative Adversarial Networks #19


leo-p commented 7 years ago

https://arxiv.org/pdf/1703.10717.pdf

We propose a new equilibrium enforcing method paired with a loss derived from the Wasserstein distance for training auto-encoder based Generative Adversarial Networks. This method balances the generator and discriminator during training. Additionally, it provides a new approximate convergence measure, fast and stable training and high visual quality. We also derive a way of controlling the trade-off between image diversity and visual quality. We focus on the image generation task, setting a new milestone in visual quality, even at higher resolutions. This is achieved while using a relatively simple model architecture and a standard training procedure.

leo-p commented 7 years ago

Summary:

Inner workings:

They try to match the distribution of the errors (assumed to be normally distributed) instead of matching the distribution of the samples directly. To do this they compute the Wasserstein distance between the pixel-wise auto-encoder loss distributions of real and generated samples, defined as follows (a small code sketch of the auto-encoder loss is given after the list):

  1. Auto-encoder loss, for a sample v and auto-encoder D:

     L(v) = |v − D(v)|^η,  with η ∈ {1, 2}

  2. Wasserstein distance between two normal distributions μ1 = N(m1, C1) and μ2 = N(m2, C2):

     W(μ1, μ2)² = ||m1 − m2||² + trace(C1 + C2 − 2(C2^{1/2} C1 C2^{1/2})^{1/2})
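
A minimal sketch of the per-sample auto-encoder loss in PyTorch; the names `pixelwise_ae_loss` and `autoencoder`, and the averaging over pixels, are illustrative assumptions, not the paper's code:

```python
import torch

def pixelwise_ae_loss(autoencoder, v, eta=1):
    """Per-sample auto-encoder loss L(v) = |v - D(v)|^eta, with eta in {1, 2}."""
    recon = autoencoder(v)                        # D(v): reconstruction of the batch v
    err = (v - recon).abs().pow(eta)              # pixel-wise reconstruction error
    return err.flatten(start_dim=1).mean(dim=1)   # one scalar loss per sample
```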

They also introduce an equilibrium concept to account for the situation where G and D are not well balanced and the discriminator D wins easily. This is controlled by what they call the diversity ratio, which balances auto-encoding real images against discriminating real from generated images. It is defined as follows:

γ = E[L(G(z))] / E[L(x)],  with γ ∈ [0, 1]
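
As a small illustration (hypothetical helper name), the quantity the training procedure below drives towards zero is the gap between the current batch and the target equilibrium γ · E[L(x)] = E[L(G(z))]:

```python
def equilibrium_gap(loss_real, loss_fake, gamma=0.5):
    # gamma is the diversity ratio hyperparameter in [0, 1]; lower gamma pushes D
    # to focus on reconstructing real images (higher quality, lower diversity)
    return gamma * loss_real.mean() - loss_fake.mean()
```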

To maintain this balance they use standard SGD, but they introduce a variable k_t, initialized to 0, that controls how much weight the discriminator's loss puts on the generated samples. This removes the need to run x steps on D followed by y steps on G, or to pretrain one of the two.

L_D = L(x) − k_t · L(G(z_D))                    (minimized over θ_D)
L_G = L(G(z_G))                                 (minimized over θ_G)
k_{t+1} = k_t + λ_k · (γ · L(x) − L(G(z_G)))    (updated at every training step t)
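
A simplified sketch of one training step, assuming PyTorch and the `pixelwise_ae_loss` helper sketched above; the sequential D-then-G updates and the hyperparameter values are illustrative, not the paper's exact training code:

```python
def began_step(G, D, opt_G, opt_D, x_real, z_D, z_G, k, gamma=0.5, lambda_k=1e-3):
    # Discriminator: minimize L(x) - k_t * L(G(z_D))
    opt_D.zero_grad()
    loss_real = pixelwise_ae_loss(D, x_real).mean()
    loss_fake_D = pixelwise_ae_loss(D, G(z_D).detach()).mean()
    (loss_real - k * loss_fake_D).backward()
    opt_D.step()

    # Generator: minimize L(G(z_G))
    opt_G.zero_grad()
    loss_fake_G = pixelwise_ae_loss(D, G(z_G)).mean()
    loss_fake_G.backward()
    opt_G.step()

    # Balance variable: k_{t+1} = k_t + lambda_k * (gamma * L(x) - L(G(z_G))), clamped to [0, 1]
    k = min(max(k + lambda_k * (gamma * loss_real.item() - loss_fake_G.item()), 0.0), 1.0)
    return k, loss_real.item(), loss_fake_G.item()
```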

Finally, using the equilibrium concept, they derive a global convergence measure that can be used to determine when the network has reached its final state or whether the model has collapsed:

M_global = L(x) + |γ · L(x) − L(G(z_G))|
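
In code form (same assumed names as above), this is just the two batch losses and γ; lower values indicate better convergence, and it can be logged at every step:

```python
def convergence_measure(loss_real, loss_fake_G, gamma=0.5):
    # M_global = L(x) + |gamma * L(x) - L(G(z_G))|
    return loss_real + abs(gamma * loss_real - loss_fake_G)
```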

Architecture:

They kept the architecture deliberately simple in order to isolate the impact of the new equilibrium principle and loss. They use no batch normalization, no dropout, no transposed convolutions, and no exponential growth in the number of convolution filters.

(Figure from the paper: the generator/decoder and encoder network architecture.)
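
A rough PyTorch sketch of what a decoder respecting these constraints could look like (plain 3x3 convolutions with ELU, nearest-neighbour upsampling, a constant number of filters, no batch norm or dropout); the layer counts and sizes are illustrative, not the paper's exact configuration:

```python
import torch.nn as nn

def began_decoder(n_filters=64, hidden_dim=64, out_size=32):
    # Project the hidden vector to an 8x8 feature map, then repeatedly
    # convolve and upsample until the target image size is reached.
    layers = [nn.Linear(hidden_dim, n_filters * 8 * 8), nn.Unflatten(1, (n_filters, 8, 8))]
    size = 8
    while size < out_size:
        layers += [
            nn.Conv2d(n_filters, n_filters, 3, padding=1), nn.ELU(),
            nn.Conv2d(n_filters, n_filters, 3, padding=1), nn.ELU(),
            nn.Upsample(scale_factor=2, mode="nearest"),   # no transposed convolutions
        ]
        size *= 2
    layers += [
        nn.Conv2d(n_filters, n_filters, 3, padding=1), nn.ELU(),
        nn.Conv2d(n_filters, 3, 3, padding=1),             # final 3-channel image
    ]
    return nn.Sequential(*layers)
```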

Results:

They trained on images from 32x32 up to 256x256; at higher resolutions the images tend to lose sharpness, but the generated images are still of very high visual quality.

(Figure from the paper: generated sample images.)