Open mrgloom opened 5 years ago

Why are dense layers needed? Isn't it possible to use a fully convolutional encoder?

https://camo.githubusercontent.com/be99bf7fb91abf85202d50fd89cd71ed4ed5ec61/68747470733a2f2f7777772e64726f70626f782e636f6d2f732f62343378386276357878626f3571302f656e635f6172636833645f726573697a6564322e6a70673f7261773d31
In theory, I think fully convolutional networks (FCNs) should work as well.

However, when exploring FCN architectures, I sometimes found that the output images were almost identical to their inputs, meaning the auto-encoder had not learned anything about reconstructing the warped faces. Hence, I suspect a bottleneck effect (which can be introduced by either dense layers or conv layers) is somehow (though not always) necessary for denoising auto-encoder (AE) based faceswap approaches. Because the difference between the warped input and its ground truth (GT) is relatively subtle, we have to add regularization.
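For illustration, here is a minimal Keras sketch of the two encoder styles being compared. This is my own toy example, not the actual architecture of this repo; the filter counts, input size, and latent size are made up.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Strided conv halves the spatial resolution at each step
    x = layers.Conv2D(filters, 5, strides=2, padding="same")(x)
    return layers.LeakyReLU(0.1)(x)

def bottleneck_encoder(input_shape=(64, 64, 3), latent_dim=512):
    """Conv stack followed by a Dense bottleneck: every pixel of the output must
    be re-synthesized from a fixed-size latent vector, which regularizes the AE."""
    inp = layers.Input(input_shape)
    x = inp
    for f in (128, 256, 512, 1024):          # 64 -> 32 -> 16 -> 8 -> 4
        x = conv_block(x, f)
    x = layers.Flatten()(x)
    x = layers.Dense(latent_dim)(x)          # the bottleneck
    x = layers.Dense(4 * 4 * 1024)(x)
    x = layers.Reshape((4, 4, 1024))(x)
    return Model(inp, x, name="bottleneck_encoder")

def fully_conv_encoder(input_shape=(64, 64, 3)):
    """No Dense layers: the output stays a spatial feature map, so low-level
    detail can pass straight through and the AE may drift toward an identity map."""
    inp = layers.Input(input_shape)
    x = inp
    for f in (128, 256, 512, 1024):
        x = conv_block(x, f)
    return Model(inp, x, name="fully_conv_encoder")
```

The point of the sketch is only the structural difference: in the first variant all information is forced through a `latent_dim`-sized vector, while in the second the 4x4 feature grid still retains per-location detail.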
If we compare denoising-AE-based approaches to general image-to-image translation approaches such as the UNIT and pix2pix families, the latter are usually applied to tasks whose input and output belong to different domains, e.g., horses to zebras, dogs to cats, or segmentation maps to realistic photos. In such cases, we might not need bottleneck layers to regularize our models.
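For contrast, a pix2pix-style generator typically looks more like the U-Net sketch below (again my own simplification with made-up filter counts, not any particular paper's exact network): skip connections carry spatial detail straight from encoder to decoder, so there is no dense bottleneck, and the model avoids collapsing to an identity map mainly because its input and output live in different domains.

```python
from tensorflow.keras import layers, Model

def unet_generator(input_shape=(64, 64, 3)):
    inp = layers.Input(input_shape)
    x, skips = inp, []
    for f in (64, 128, 256):                 # encoder: 64 -> 32 -> 16 -> 8
        x = layers.Conv2D(f, 4, strides=2, padding="same", activation="relu")(x)
        skips.append(x)
    skips.pop()                              # deepest feature feeds the decoder directly
    for f in (128, 64):                      # decoder: 8 -> 16 -> 32
        x = layers.Conv2DTranspose(f, 4, strides=2, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, skips.pop()])   # skip connection at matching resolution
    x = layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu")(x)  # 32 -> 64
    out = layers.Conv2D(3, 3, padding="same", activation="tanh")(x)
    return Model(inp, out, name="unet_generator")
```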
*Disclaimer: the above opinions are not based on any rigorous experiments, so take them with a grain of salt.