
Visual Attribute Transfer through Deep Image Analogy #24

Open leo-p opened 7 years ago

leo-p commented 7 years ago

https://arxiv.org/pdf/1705.01088.pdf

We propose a new technique for visual attribute transfer across images that may have very different appearance but have perceptually similar semantic structure. By visual attribute transfer, we mean transfer of visual information (such as color, tone, texture, and style) from one image to another. For example, one image could be that of a painting or a sketch while the other is a photo of a real scene, and both depict the same type of scene. Our technique finds semantically-meaningful dense correspondences between two input images. To accomplish this, it adapts the notion of "image analogy" with features extracted from a Deep Convolutional Neural Network for matching; we call our technique Deep Image Analogy. A coarse-to-fine strategy is used to compute the nearest-neighbor field for generating the results. We validate the effectiveness of our proposed method in a variety of cases, including style/texture transfer, color/style swap, sketch/painting to photo, and time lapse.

leo-p commented 7 years ago

Summary:

Inner workings:

Image analogy

An image analogy A:A′::B:B′ is a relation where:

- A′ relates to A in the same way as B′ relates to B;
- A and A′ (likewise B and B′) are in pixel-wise correspondence: same content, different visual attributes;
- A and B (likewise A′ and B′) share the same visual attributes but depict semantically similar rather than identical content.

In this paper only a source image A and an example image B′ are given; both A′ and B are latent images to be estimated.


Dense correspondence

To find dense correspondences between the two images, they use features from a pre-trained CNN (VGG-19), extracting the feature maps output by the ReLU activations at each scale (relu1_1 through relu5_1).
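For concreteness, here is a minimal sketch of that feature-extraction step, assuming PyTorch/torchvision; the layer indices for relu1_1 … relu5_1 come from torchvision's vgg19 layout, and the helper name is mine:

```python
import torch
import torchvision.models as models

# Indices of relu1_1 ... relu5_1 inside torchvision's vgg19().features
# (the five layers the paper matches on).
RELU_IDS = (1, 6, 11, 20, 29)

# torchvision >= 0.13; older versions take pretrained=True instead.
vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()

@torch.no_grad()
def extract_features(img):
    """Feature maps of `img` after relu1_1 ... relu5_1.

    img: (1, 3, H, W) tensor already normalized with ImageNet statistics.
    Returned finest-to-coarsest; reverse it for the coarse-to-fine pass.
    """
    feats, x = [], img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in RELU_IDS:
            feats.append(x)
    return feats
```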

The mapping is divided into two sub-mappings that are easier to compute: first a visual attribute transformation, then a spatial transformation.
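This decomposition is what makes the matching tractable: thanks to the latent images, each comparison happens between two images that already share the same visual attributes (A with B, and A′ with B′). A sketch of the resulting bidirectional patch distance on channel-normalized features follows; the helper and variable names here are mine, not the paper's:

```python
import torch
import torch.nn.functional as F

def normalize(feat):
    # Channel-wise L2 normalization of each spatial position, so matching
    # compares feature direction rather than magnitude.
    return feat / (feat.norm(dim=1, keepdim=True) + 1e-8)

def patch(feat, y, x, size=3):
    # (C, size, size) patch centered at (y, x), zero-padded at borders.
    pad = size // 2
    padded = F.pad(feat, (pad, pad, pad, pad))
    return padded[0, :, y:y + size, x:x + size]

def bidirectional_cost(FA, FAp, FB, FBp, p, q, size=3):
    """Cost of mapping position p in A to position q in B'.

    Compares A against B and A' against B' simultaneously: each pair
    shares the same attributes, so only content has to be matched.
    """
    FA, FAp, FB, FBp = map(normalize, (FA, FAp, FB, FBp))
    d1 = (patch(FA, *p, size) - patch(FB, *q, size)).pow(2).sum()
    d2 = (patch(FAp, *p, size) - patch(FBp, *q, size)).pow(2).sum()
    return d1 + d2
```

The NNF search then looks for, at each position p, the position q that minimizes this cost.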


Architecture:

The algorithm proceeds as follows:

  1. Compute the features of the two input images at each layer of the pre-trained CNN, and initialize the feature maps of the latent images at the coarsest layer.
  2. For the current layer, compute a forward and a reverse nearest-neighbor field (NNF, essentially an offset field mapping each position in one image to its best match in the other).
  3. Use this NNF together with the input features at the current layer to compute the features of the latent images.
  4. Upsample the NNF and use it as the initialization for the NNF of the next (finer) layer; a sketch of the full loop follows this list.
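A heavily simplified sketch of this coarse-to-fine loop, assuming both inputs have the same spatial size: brute-force per-pixel search stands in for the patch-based PatchMatch variant the paper actually uses, the reverse NNF and the latent B are dropped, and the blending weights are assumptions:

```python
import torch

def normalize(feat):
    # Channel-wise L2 normalization so matching compares feature direction.
    return feat / (feat.norm(dim=1, keepdim=True) + 1e-8)

def nnf_search(src, dst):
    """Forward NNF as flat indices into dst: for each spatial position of
    src, the most similar feature vector in dst (step 2, forward only)."""
    C = src.shape[1]
    s = normalize(src).view(C, -1)    # (C, H*W)
    d = normalize(dst).view(C, -1)
    return (s.t() @ d).argmax(dim=1)  # cosine-similarity argmax, (H*W,)

def warp(feat, nnf):
    # Rearrange feat by pulling each position's match through the NNF.
    C, H, W = feat.shape[1:]
    return feat.reshape(1, C, -1)[:, :, nnf].view(1, C, H, W)

def upsample_nnf(nnf, old_hw, new_hw):
    # Step 4: rescale the coarse offset field to the finer layer's size.
    H, W = old_hw
    nH, nW = new_hw
    gy, gx = nnf.view(H, W) // W, nnf.view(H, W) % W
    ys, xs = torch.arange(nH) * H // nH, torch.arange(nW) * W // nW
    fy, fx = gy[ys][:, xs] * nH // H, gx[ys][:, xs] * nW // W
    return (fy * nW + fx).view(-1)

def deep_image_analogy(feats_A, feats_Bp, alphas=(0.8, 0.7, 0.6, 0.1)):
    """feats_A / feats_Bp: per-layer (1, C, H, W) feature maps of the two
    inputs, coarsest first and of equal size (a simplifying assumption).
    Returns the finest-layer NNF mapping A to B'."""
    nnf, prev_hw = None, None
    for L, (FA, FBp) in enumerate(zip(feats_A, feats_Bp)):
        H, W = FA.shape[2:]
        if nnf is None:
            FAp = FA.clone()  # step 1: latent A' starts as A at the top
        else:
            nnf = upsample_nnf(nnf, prev_hw, (H, W))          # step 4
            # Step 3: latent A' blends A's content with B''s attributes
            # warped through the NNF (this alpha schedule is an assumption).
            a = alphas[min(L - 1, len(alphas) - 1)]
            FAp = a * FA + (1 - a) * warp(FBp, nnf)
        nnf = nnf_search(FAp, FBp)                            # step 2
        prev_hw = (H, W)
    return nnf
```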

Results:

Impressive quality on all types of visual transfer, but very slow (~3 min on a GPU per image).
