shaoanlu / faceswap-GAN

A denoising autoencoder + adversarial losses and attention mechanisms for face swapping.

Why are dense layers needed? #141

Open · mrgloom opened this issue 5 years ago

mrgloom commented 5 years ago

Why are dense layers needed? Isn't it possible to use a fully convolutional encoder? https://camo.githubusercontent.com/be99bf7fb91abf85202d50fd89cd71ed4ed5ec61/68747470733a2f2f7777772e64726f70626f782e636f6d2f732f62343378386276357878626f3571302f656e635f6172636833645f726573697a6564322e6a70673f7261773d31

shaoanlu commented 5 years ago

I think that, theoretically, a fully convolutional network (FCN) should work as well.

However, when exploring FCN architectures, I sometimes found that the output images were almost identical to their inputs, which means the auto-encoder did not learn anything about reconstructing the warped faces. Hence, I suspect a bottleneck effect (which can be introduced by either dense layers or conv layers) is somehow (but not always) necessary for denoising auto-encoder (AE) based faceswap approaches. The difference between the warped input and its ground truth (GT) is relatively subtle, so we have to add some form of regularization.
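To make the contrast concrete, below is a minimal Keras sketch of the two encoder styles being discussed. The input size, filter counts, and latent dimension are hypothetical and are not the repo's actual architecture; the point is only to show where the dense bottleneck forces all spatial information through a fixed-size vector, whereas the fully convolutional variant hands spatial feature maps straight to the decoder.

```python
# Minimal sketch (Keras). Hypothetical 64x64x3 input and layer sizes,
# chosen for illustration only -- not the repo's actual encoder.
from tensorflow.keras import layers, Model

def encoder_with_dense_bottleneck(input_shape=(64, 64, 3), latent_dim=1024):
    """Encoder ending in Dense layers: features are squeezed through a
    fixed-size vector, forcing the AE to learn a compact face representation."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (128, 256, 512, 1024):
        x = layers.Conv2D(filters, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)            # 4x4x1024 feature map -> 16384-dim vector
    x = layers.Dense(latent_dim)(x)    # bottleneck vector
    x = layers.Dense(4 * 4 * 1024)(x)  # expand back for the decoder
    x = layers.Reshape((4, 4, 1024))(x)
    return Model(inp, x, name="encoder_dense")

def encoder_fully_conv(input_shape=(64, 64, 3)):
    """Fully convolutional encoder: spatial feature maps pass straight to the
    decoder, which makes it easier for the network to simply copy its input."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (128, 256, 512, 1024):
        x = layers.Conv2D(filters, 5, strides=2, padding="same", activation="relu")(x)
    return Model(inp, x, name="encoder_fcn")
```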

If we compare denoising-AE-based approaches to general image-to-image translation approaches such as the UNIT family and the pix2pix family, the latter are usually applied to tasks whose input and output belong to different domains, e.g., horses to zebras, dogs to cats, or segmentation maps to realistic photos. In such cases, we might not need bottleneck layers to regularize our models.

*Disclaimer: the above opinions are not based on any rigorous experiments, so take them with a grain of salt.