Basically two GANs, one for each domain, each with its own generator and discriminator, plus two additional losses (called cycle-consistency losses) to make sure that translating an image to the other domain and back yields an image close to the original.
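A minimal sketch of how the generator objective might be assembled (assuming PyTorch; the names `G_XtoY`, `G_YtoX`, `D_X`, `D_Y` and the toy conv layers are illustrative placeholders, not the paper's actual networks):

```python
import torch
import torch.nn.functional as F_nn

# Placeholder networks (the real CycleGAN uses ResNet-style generators and
# PatchGAN discriminators; simple convolutions stand in here for illustration).
G_XtoY = torch.nn.Conv2d(3, 3, 3, padding=1)  # generator X -> Y
G_YtoX = torch.nn.Conv2d(3, 3, 3, padding=1)  # generator Y -> X
D_X = torch.nn.Conv2d(3, 1, 3, padding=1)     # discriminator on domain X
D_Y = torch.nn.Conv2d(3, 1, 3, padding=1)     # discriminator on domain Y

lambda_cyc = 10.0  # weight of the cycle-consistency term

def generator_loss(real_x, real_y):
    # Translate each image into the other domain.
    fake_y = G_XtoY(real_x)
    fake_x = G_YtoX(real_y)

    # Adversarial terms: each generator tries to make its output look real
    # to the target-domain discriminator (least-squares GAN formulation).
    pred_y = D_Y(fake_y)
    pred_x = D_X(fake_x)
    adv = F_nn.mse_loss(pred_y, torch.ones_like(pred_y)) \
        + F_nn.mse_loss(pred_x, torch.ones_like(pred_x))

    # Cycle-consistency terms: translating there and back should recover
    # the original image, measured with a pixel-wise L1 norm.
    cyc = F_nn.l1_loss(G_YtoX(fake_y), real_x) \
        + F_nn.l1_loss(G_XtoY(fake_x), real_y)

    return adv + lambda_cyc * cyc

# One forward pass on dummy data (batch size 1, as in the paper's training).
x = torch.randn(1, 3, 256, 256)
y = torch.randn(1, 3, 256, 256)
loss = generator_loss(x, y)
```

The discriminators would be updated separately with the usual real/fake least-squares objective.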
For the consistency loss they use a pixel-wise L1 norm:
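From the paper, with generators G : X → Y and F : Y → X:

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
    \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \bigl[ \lVert F(G(x)) - x \rVert_1 \bigr]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)} \bigl[ \lVert G(F(y)) - y \rVert_1 \bigr]
```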
The network architecture is based on Perceptual Losses for Real-Time Style Transfer and Super-Resolution; code is available here. Training seems to employ several tricks and even uses a batch size of 1.
Very impressive, and the real key point is that you don't need paired images, which makes this trainable on any pair of domains that share the same underlying representation.
https://arxiv.org/pdf/1703.10593.pdf
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, and photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.