A GAN-based approach: one model to swap them all.
The table below shows our preliminary face-swapping results, each requiring one source face and at most five target face photos. Note that almost all of the identities, except Stephen Curry, are absent from our training data (a subset of VGGFace2). More translation results can be found here.
Our model is also capable of producing faces whose gaze direction, glasses, and hair occlusions are consistent with the given source face. However, it performs suboptimally when translating to Asian faces, possibly due to the limited representational capacity of the feature extractor.
Src.\Tar. | Andrej Karpathy | Andrew Y. Ng | Du Fu | Elon Musk | Emilia Clarke | Geoffrey Hinton | Stephen Curry | Yann LeCun | Yoshua Bengio |
---|---|---|---|---|---|---|---|---|---|
Andrej Karpathy | N/A | | | | | | | | |
Andrew Y. Ng | | N/A | | | | | | | |
Du Fu | | | N/A | | | | | | |
Elon Musk | | | | N/A | | | | | |
Emilia Clarke | | | | | N/A | | | | |
Geoffrey Hinton | | | | | | N/A | | | |
Stephen Curry | | | | | | | N/A | | |
Yann LeCun | | | | | | | | N/A | |
Yoshua Bengio | | | | | | | | | N/A |
The above image illustrates our generator, an encoder-decoder network, at test time. Our swap-them-all approach is essentially a GAN conditioned on latent embeddings extracted from a pre-trained face recognition model. SPADE and AdaIN modules are incorporated to inject semantic priors into the network.
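As a rough sketch of the conditioning mechanism, AdaIN replaces the per-channel statistics of a content feature map with statistics derived from the conditioning signal (here, the identity embedding). The function name, shapes, and the idea of predicting the scale/shift from the embedding via a small MLP are illustrative assumptions, not our exact implementation:

```python
import numpy as np

def adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive Instance Normalization (illustrative sketch).

    content:    (C, H, W) feature map from the decoder.
    style_mean: (C,) per-channel shift, e.g. predicted from the
                identity embedding by a small MLP (assumed here).
    style_std:  (C,) per-channel scale, predicted the same way.
    """
    # Normalize each channel of the content to zero mean, unit std.
    mu = content.mean(axis=(1, 2), keepdims=True)
    sigma = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - mu) / (sigma + eps)
    # Re-scale and re-shift with the style (identity) statistics.
    return style_std[:, None, None] * normalized + style_mean[:, None, None]
```

After this operation the feature map carries the source identity's statistics while retaining the target's spatial layout, which is the role AdaIN plays inside the generator.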
During the training phase, the input face A is heavily blurred, and we train the model with a reconstruction loss. We also introduce other objectives aimed at improving translation quality while preserving semantic consistency, such as a perceptual loss on the RGB output and a cosine-similarity loss on the latent embeddings.
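The cosine-similarity term on the latent embeddings can be sketched as follows; the function name and the exact weighting are assumptions for illustration, the only part taken from the text is that identity is compared via cosine similarity of recognition embeddings:

```python
import numpy as np

def cosine_identity_loss(emb_out, emb_src, eps=1e-8):
    """Identity-preservation loss sketch: 1 - cos(emb_out, emb_src).

    emb_out: recognition embedding of the generated face.
    emb_src: recognition embedding of the source face.
    Returns 0 when the embeddings point in the same direction,
    approaching 2 when they are opposed.
    """
    cos = np.dot(emb_out, emb_src) / (
        np.linalg.norm(emb_out) * np.linalg.norm(emb_src) + eps)
    return 1.0 - cos
```

In training, a term like this would be added to the reconstruction and perceptual losses so the generator is penalized whenever the face recognition model no longer sees the source identity in its output.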