
Visual Attribute Transfer through Deep Image Analogy #24

Open leo-p opened 7 years ago

leo-p commented 7 years ago

https://arxiv.org/pdf/1705.01088.pdf

We propose a new technique for visual attribute transfer across images that may have very different appearance but have perceptually similar semantic structure. By visual attribute transfer, we mean transfer of visual information (such as color, tone, texture, and style) from one image to another. For example, one image could be that of a painting or a sketch while the other is a photo of a real scene, and both depict the same type of scene. Our technique finds semantically-meaningful dense correspondences between two input images. To accomplish this, it adapts the notion of "image analogy" with features extracted from a Deep Convolutional Neural Network for matching; we call our technique Deep Image Analogy. A coarse-to-fine strategy is used to compute the nearest-neighbor field for generating the results. We validate the effectiveness of our proposed method in a variety of cases, including style/texture transfer, color/style swap, sketch/painting to photo, and time lapse.

leo-p commented 7 years ago

Summary:

Inner workings:

Image analogy

An image analogy A:A′::B:B′ is a relation where:

- A′ relates to A in the same way as B′ relates to B;
- A and A′ (likewise B and B′) are in pixel-wise correspondence: same content, different visual attributes;
- A and B (likewise A′ and B′) share the same visual attributes but depict semantically similar rather than identical content.

In this paper only a source image A and an example image B′ are given; both A′ and B are latent images to be estimated.


Dense correspondence

To find dense correspondences between the two images, they use features from a pre-trained CNN (VGG-19), extracting the feature maps output by the ReLU activations at each scale (relu1_1 through relu5_1).
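For concreteness, here is a minimal sketch of that feature-extraction step, assuming PyTorch/torchvision; the layer indices for relu1_1 … relu5_1 come from torchvision's vgg19 layout, and the helper name is mine:

```python
import torch
import torchvision.models as models

# Indices of relu1_1 ... relu5_1 inside torchvision's vgg19().features
# (the five layers the paper matches on).
RELU_IDS = (1, 6, 11, 20, 29)

# torchvision >= 0.13; older versions take pretrained=True instead.
vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()

@torch.no_grad()
def extract_features(img):
    """Feature maps of `img` after relu1_1 ... relu5_1.

    img: (1, 3, H, W) tensor already normalized with ImageNet statistics.
    Returned finest-to-coarsest; reverse it for the coarse-to-fine pass.
    """
    feats, x = [], img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in RELU_IDS:
            feats.append(x)
    return feats
```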

The mapping is divided into two sub-mappings that are easier to compute: first a visual attribute transformation, then a spatial transformation.
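This decomposition is what makes the matching tractable: thanks to the latent images, each comparison happens between two images that already share the same visual attributes (A with B, and A′ with B′). A sketch of the resulting bidirectional patch distance on channel-normalized features follows; the helper and variable names here are mine, not the paper's:

```python
import torch
import torch.nn.functional as F

def normalize(feat):
    # Channel-wise L2 normalization of each spatial position, so matching
    # compares feature direction rather than magnitude.
    return feat / (feat.norm(dim=1, keepdim=True) + 1e-8)

def patch(feat, y, x, size=3):
    # (C, size, size) patch centered at (y, x), zero-padded at borders.
    pad = size // 2
    padded = F.pad(feat, (pad, pad, pad, pad))
    return padded[0, :, y:y + size, x:x + size]

def bidirectional_cost(FA, FAp, FB, FBp, p, q, size=3):
    """Cost of mapping position p in A to position q in B'.

    Compares A against B and A' against B' simultaneously: each pair
    shares the same attributes, so only content has to be matched.
    """
    FA, FAp, FB, FBp = map(normalize, (FA, FAp, FB, FBp))
    d1 = (patch(FA, *p, size) - patch(FB, *q, size)).pow(2).sum()
    d2 = (patch(FAp, *p, size) - patch(FBp, *q, size)).pow(2).sum()
    return d1 + d2
```

The NNF search then looks for, at each position p, the position q that minimizes this cost.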


Architecture:

The algorithm proceeds as follows:

  1. Compute the features of the two input images at each layer of the pre-trained CNN, and initialize the feature maps of the latent images at the coarsest layer.
  2. For the current layer, compute a forward and a reverse nearest-neighbor field (NNF, essentially an offset field mapping each position in one image to its best match in the other).
  3. Use this NNF together with the input features at the current layer to compute the features of the latent images.
  4. Upsample the NNF and use it as the initialization for the NNF of the next (finer) layer; a sketch of the full loop follows this list.
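A heavily simplified sketch of this coarse-to-fine loop, assuming both inputs have the same spatial size: brute-force per-pixel search stands in for the patch-based PatchMatch variant the paper actually uses, the reverse NNF and the latent B are dropped, and the blending weights are assumptions:

```python
import torch

def normalize(feat):
    # Channel-wise L2 normalization so matching compares feature direction.
    return feat / (feat.norm(dim=1, keepdim=True) + 1e-8)

def nnf_search(src, dst):
    """Forward NNF as flat indices into dst: for each spatial position of
    src, the most similar feature vector in dst (step 2, forward only)."""
    C = src.shape[1]
    s = normalize(src).view(C, -1)    # (C, H*W)
    d = normalize(dst).view(C, -1)
    return (s.t() @ d).argmax(dim=1)  # cosine-similarity argmax, (H*W,)

def warp(feat, nnf):
    # Rearrange feat by pulling each position's match through the NNF.
    C, H, W = feat.shape[1:]
    return feat.reshape(1, C, -1)[:, :, nnf].view(1, C, H, W)

def upsample_nnf(nnf, old_hw, new_hw):
    # Step 4: rescale the coarse offset field to the finer layer's size.
    H, W = old_hw
    nH, nW = new_hw
    gy, gx = nnf.view(H, W) // W, nnf.view(H, W) % W
    ys, xs = torch.arange(nH) * H // nH, torch.arange(nW) * W // nW
    fy, fx = gy[ys][:, xs] * nH // H, gx[ys][:, xs] * nW // W
    return (fy * nW + fx).view(-1)

def deep_image_analogy(feats_A, feats_Bp, alphas=(0.8, 0.7, 0.6, 0.1)):
    """feats_A / feats_Bp: per-layer (1, C, H, W) feature maps of the two
    inputs, coarsest first and of equal size (a simplifying assumption).
    Returns the finest-layer NNF mapping A to B'."""
    nnf, prev_hw = None, None
    for L, (FA, FBp) in enumerate(zip(feats_A, feats_Bp)):
        H, W = FA.shape[2:]
        if nnf is None:
            FAp = FA.clone()  # step 1: latent A' starts as A at the top
        else:
            nnf = upsample_nnf(nnf, prev_hw, (H, W))          # step 4
            # Step 3: latent A' blends A's content with B''s attributes
            # warped through the NNF (this alpha schedule is an assumption).
            a = alphas[min(L - 1, len(alphas) - 1)]
            FAp = a * FA + (1 - a) * warp(FBp, nnf)
        nnf = nnf_search(FAp, FBp)                            # step 2
        prev_hw = (H, W)
    return nnf
```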

Results:

Impressive quality on all types of visual transfer, but very slow (~3 min on a GPU per image).
