leo-p / papers

Papers and their summary (in issue)

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space #28

leo-p opened this issue 7 years ago

leo-p commented 7 years ago

https://arxiv.org/pdf/1612.00005v2.pdf

Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting way to synthesize novel images by performing gradient ascent in the latent space of a generator network to maximize the activations of one or multiple neurons in a separate classifier network. In this paper we extend this method by introducing an additional prior on the latent code, improving both sample quality and sample diversity, leading to a state-of-the-art generative model that produces high quality images at higher resolutions (227x227) than previous generative models, and does so for all 1000 ImageNet categories. In addition, we provide a unified probabilistic interpretation of related activation maximization methods and call the general class of models "Plug and Play Generative Networks". PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw. We demonstrate the generation of images conditioned on a class (when C is an ImageNet or MIT Places classification network) and also conditioned on a caption (when C is an image captioning network). Our method also improves the state of the art of Multifaceted Feature Visualization, which generates the set of synthetic inputs that activate a neuron in order to better understand how deep neural networks operate. Finally, we show that our model performs reasonably well at the task of image inpainting. While image models are used in this paper, the approach is modality-agnostic and can be applied to many types of data.

leo-p commented 7 years ago

Summary:

Inner-workings:

The idea is to find an image that maximizes the probability for a given label by using a variant of a Markov Chain Monte Carlo (MCMC) sampler.

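$$x_{t+1} = x_t + \epsilon_1 \frac{\partial \log p(x_t)}{\partial x_t} + \epsilon_2 \frac{\partial \log p(y = y_c \mid x_t)}{\partial x_t} + \mathcal{N}(0, \epsilon_3^2)$$

Here the first (ε1) term keeps the sample on the manifold of natural images, so we don't just produce adversarial examples, and the second (ε2) term pushes the image toward one that the classifier assigns to the target label y_c. The ε3 term adds noise to encourage exploration.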

In practice, we start from a random initialization and iteratively update it so that the generated image better matches the label we're trying to generate.
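The conditional update can be sketched on a toy problem. This is a minimal sketch, not the paper's code: it assumes a hypothetical 2-D "image" x, a standard Gaussian prior p(x) = N(0, I) (so the prior gradient is simply −x) in place of a learned image prior, and a hypothetical logistic classifier with weights `W` and bias `B` in place of the condition network.

```python
import math
import random

# Hypothetical toy stand-ins, not the paper's networks:
#   prior p(x) = N(0, I), so grad log p(x) = -x
#   classifier p(y=1 | x) = sigmoid(W . x + B), with y_c = 1 as the target class
W = [2.0, -1.0]
B = 0.0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def prob_target(x: list[float]) -> float:
    """Classifier confidence in the target class y_c = 1."""
    return sigmoid(sum(w * xi for w, xi in zip(W, x)) + B)


def grad_log_prior(x: list[float]) -> list[float]:
    return [-xi for xi in x]


def grad_log_condition(x: list[float]) -> list[float]:
    # d/dx log sigmoid(W . x + B) = (1 - sigmoid(W . x + B)) * W
    scale = 1.0 - prob_target(x)
    return [scale * w for w in W]


def sample_conditional(x0, eps1, eps2, eps3, steps, seed=0):
    """Iterate x <- x + eps1*grad log p(x) + eps2*grad log p(y_c|x) + N(0, eps3^2)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        gp = grad_log_prior(x)
        gc = grad_log_condition(x)
        x = [xi + eps1 * a + eps2 * b + rng.gauss(0.0, eps3)
             for xi, a, b in zip(x, gp, gc)]
    return x


x0 = [-1.0, 1.0]  # a start where the classifier favors the other class
x_final = sample_conditional(x0, eps1=0.05, eps2=0.5, eps3=0.01, steps=200)
print(round(prob_target(x0), 3), round(prob_target(x_final), 3))
```

Setting `eps1` to 0 removes the pull of the prior, and x then keeps drifting along `W` to ever larger norms: a toy analogue of the unnatural, adversarial-looking inputs that the prior term is meant to prevent.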

MALA-approx:

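MALA-approx is the MCMC sampler used in the paper. It is based on the Metropolis-Adjusted Langevin Algorithm (MALA) and is defined iteratively as follows:

$$x_{t+1} = x_t + \epsilon_{12} \nabla \log p(x_t) + \mathcal{N}(0, \epsilon_3^2)$$

where ε12 is the step size along the gradient of the log-density, which moves x_t toward more probable images, and ε3 is the standard deviation of the Gaussian noise added at each step. Unlike exact MALA, the Metropolis-Hastings accept/reject step is dropped and the two step sizes are set independently, hence the "approx".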
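A minimal 1-D sketch of MALA-approx, assuming a hypothetical Gaussian target N(3, 1) whose log-density gradient is known in closed form. It shows the two regimes that independent step sizes allow: with ε3 = sqrt(2·ε12) the update is the standard unadjusted Langevin step, which samples approximately from p(x), and with ε3 = 0 it reduces to plain gradient ascent on log p(x).

```python
import math
import random


def mala_approx(grad_log_p, x0, eps12, eps3, steps, seed=0):
    """Return the chain x_0..x_steps of x <- x + eps12 * grad_log_p(x) + N(0, eps3^2)."""
    rng = random.Random(seed)
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x + eps12 * grad_log_p(x) + rng.gauss(0.0, eps3)
        xs.append(x)
    return xs


# Hypothetical 1-D target: p(x) = N(3, 1), so d/dx log p(x) = -(x - 3)
def grad_log_gauss(x: float) -> float:
    return -(x - 3.0)


eps12 = 0.05
# eps3 = sqrt(2 * eps12): unadjusted Langevin, approximate sampling from p(x)
chain = mala_approx(grad_log_gauss, 0.0, eps12, math.sqrt(2 * eps12), 5000)
burned = chain[1000:]  # discard burn-in
mean = sum(burned) / len(burned)

# eps3 = 0: deterministic gradient ascent, which converges to the mode x = 3
mode_run = mala_approx(grad_log_gauss, 0.0, eps12, 0.0, 500)
print(round(mean, 2), round(mode_run[-1], 6))
```

Because no accept/reject step corrects the discretization, the chain's stationary distribution is only close to p(x); that bias is the price of the "approx".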

Image prior:

They try several priors for the images:

  1. PPGN-x: p(x) is modeled directly in image space with a Denoising Auto-Encoder (DAE).
  2. DGN-AM: x is modeled through a latent code h, using a generator G trained with a GAN-style adversarial loss so that x = G(h); there is no learned prior on h.
  3. PPGN-h: adds a learned prior p(h) on the latent code, modeled with a DAE.
  4. Joint PPGN-h: to increase the expressivity of G, h is modeled by passing it through image space in the DAE (h → x → h), so that x and h are modeled jointly.
  5. Noiseless joint PPGN-h: same as the previous model, but without noise.
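The DAE-based priors supply the gradient needed by the ε1 term: the paper relies on the standard denoising-autoencoder result that (R(x) − x)/σ² approximates ∂log p(x)/∂x, where R is the DAE's reconstruction and σ² is the variance of its Gaussian corruption noise. The sketch below checks that relationship in a hypothetical 1-D case where the optimal DAE is known in closed form (data x ~ N(0, s2)). It illustrates the approximation only; it is not how the paper trains its DAEs.

```python
# Hypothetical 1-D setup: data x ~ N(0, s2), DAE trained with N(0, sigma2) corruption.
# The MSE-optimal denoiser is the posterior mean E[x | x_noisy] = x_noisy * s2 / (s2 + sigma2).


def optimal_dae(x: float, s2: float, sigma2: float) -> float:
    return x * s2 / (s2 + sigma2)


def dae_score(x: float, s2: float, sigma2: float) -> float:
    """DAE-based estimate of d/dx log p(x): (R(x) - x) / sigma2."""
    return (optimal_dae(x, s2, sigma2) - x) / sigma2


def gaussian_score(x: float, var: float) -> float:
    """Exact d/dx log N(x; 0, var)."""
    return -x / var


# Matches the score of the noise-smoothed density N(0, s2 + sigma2) ...
print(dae_score(1.5, 2.0, 0.1), gaussian_score(1.5, 2.1))
# ... and approaches the true data score N(0, s2) as the corruption noise shrinks.
print(dae_score(1.5, 2.0, 1e-6), gaussian_score(1.5, 2.0))
```

The same formula applies in latent space for the PPGN-h variants, with a DAE R_h on h instead of R on x.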

Conditioning:

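The paper mostly conditions on class labels, but because the condition network C is replaceable, other conditions work too: for example, using an image captioning network as C conditions generation on a caption.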
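The "plug and play" idea can be sketched by keeping the prior and the sampler fixed and swapping only the function that supplies ∂log p(y = y_c | x)/∂x. This is a minimal sketch under assumptions: the two condition networks are hypothetical linear-softmax classifiers (`W_A` with three classes, `W_B` with two), standing in for the ImageNet, Places, or captioning networks used in the paper, and the prior is a fixed N(0, I).

```python
import math
import random


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def make_linear_softmax_condition(weights, target):
    """Hypothetical condition network p(y|x) = softmax(W x); returns grad_x log p(y=target|x)."""
    def grad_log_condition(x):
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        p = softmax(logits)
        # d/dx log softmax_c(W x) = W_c - sum_k p_k W_k
        return [weights[target][j] - sum(p[k] * weights[k][j] for k in range(len(weights)))
                for j in range(len(x))]
    return grad_log_condition


def gaussian_prior_grad(x):
    return [-xi for xi in x]  # prior p(x) = N(0, I), shared by every condition network


def sample(grad_log_prior, grad_log_condition, x0, eps1=0.05, eps2=0.5, eps3=0.01,
           steps=300, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        gp, gc = grad_log_prior(x), grad_log_condition(x)
        x = [xi + eps1 * a + eps2 * b + rng.gauss(0.0, eps3) for xi, a, b in zip(x, gp, gc)]
    return x


# Two interchangeable (hypothetical) condition networks, plugged into the same sampler.
W_A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
W_B = [[0.0, 1.0], [0.0, -1.0]]
x_a = sample(gaussian_prior_grad, make_linear_softmax_condition(W_A, 0), [0.0, 0.0])
x_b = sample(gaussian_prior_grad, make_linear_softmax_condition(W_B, 1), [0.0, 0.0])
print([round(v, 2) for v in x_a], [round(v, 2) for v in x_b])
```

Only the closure passed as `grad_log_condition` changes between the two runs; the prior and the update rule are untouched.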


Architecture:

The final architecture, which is built on a pretrained classifier network, is shown below. Note that only the generator G and the discriminator D are trained; the pretrained networks stay fixed.

[Figure: final PPGN architecture built on a pretrained classifier]
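In the latent-space variants the chain runs on h rather than x, and the condition gradient reaches h through G by the chain rule. A minimal sketch of a PPGN-h-style step h ← h + ε1 (R_h(h) − h) + ε2 ∂log C_c(G(h))/∂h + N(0, ε3²), assuming hypothetical stand-ins: a linear "generator" x = A h, a logistic condition network with weights `V`, and the closed-form Gaussian denoiser as R_h. None of these are the paper's trained networks; during sampling every network is held fixed and only h changes.

```python
import math
import random

# Hypothetical toy stand-ins (not the paper's networks):
#   generator G: x = A h (2-D latent -> 3-D "image")
#   condition C: p(y=c | x) = sigmoid(V . x)
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [1.0, -1.0, 0.5]


def generate(h):
    return [sum(a * hj for a, hj in zip(row, h)) for row in A]


def prob_class(x):
    return 1.0 / (1.0 + math.exp(-sum(v * xi for v, xi in zip(V, x))))


def grad_h_log_condition(h):
    # Chain rule: d log C(G(h)) / dh = A^T (d log C / dx),
    # with d log sigmoid(V . x) / dx = (1 - sigmoid(V . x)) * V
    x = generate(h)
    gx = [(1.0 - prob_class(x)) * v for v in V]
    return [sum(A[i][j] * gx[i] for i in range(len(A))) for j in range(len(h))]


def dae_h(h, s2=1.0, sigma2=0.1):
    # Stand-in latent DAE R_h: optimal denoiser for h ~ N(0, s2 I) under N(0, sigma2 I) noise
    return [hj * s2 / (s2 + sigma2) for hj in h]


def ppgn_h_sample(h0, eps1=1.0, eps2=0.5, eps3=0.01, steps=300, seed=0):
    rng = random.Random(seed)
    h = list(h0)
    for _ in range(steps):
        r = dae_h(h)
        g = grad_h_log_condition(h)
        h = [hj + eps1 * (rj - hj) + eps2 * gj + rng.gauss(0.0, eps3)
             for hj, rj, gj in zip(h, r, g)]
    return h


h_final = ppgn_h_sample([0.0, 0.0])
print(round(prob_class(generate([0.0, 0.0])), 3), round(prob_class(generate(h_final)), 3))
```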

Results:

Different condition networks can be plugged in with minimal training of G and D. The model produces realistic and diverse images; see below for examples of 227x227 images generated for ImageNet classes.

[Figure: example 227x227 samples for ImageNet classes]