shaoanlu / faceswap-GAN

A denoising autoencoder + adversarial losses and attention mechanisms for face swapping.

Question about preview images #28

Open vinerz opened 6 years ago

vinerz commented 6 years ago

First of all, thank you so much for such a detailed notebook!

Could you explain a little bit more about the training previews?

I am a long-time developer, but my specialty is Node.js. With this new face-swapping hype, I decided to jump on the train for fun, but some of it is still a bit confusing to me. I've read some GAN papers and found this to be the most effective (and most fun!) path to face swapping.

First: Why are the faces in the output samples blue-ish? Is this the correct behavior? Second: I'm noticing some hard-edged pixels around the actors' noses, specifically around Tom Hiddleston's nose. Is this also supposed to happen? Third: The third column in the sample masks is empty. Is this correct?

Hardware: I am currently running the training script on this setup:

This is running ~1 iteration/sec. Current loss information:

Loss_DA: 0.001124 Loss_DB: 0.000390 Loss_GA: 0.008428 Loss_GB: 0.010443

Sample after ~1k iterations

_sample_faces

Sample after ~2.5k iterations

_sample_faces-3

Most recent sample after ~3.2k iterations with mask preview

_sample_faces-5 _sample_masks

vinerz commented 6 years ago

Update: For some reason, the third column decided to show up out of nowhere.

_sample_masks-2

shaoanlu commented 6 years ago
  1. The output of the generator has BGR channel ordering instead of RGB. Probably a cv2.cvtColor(...) call is missing before the display function. Which Jupyter notebook did you use?

For 2 and 3, they have a similar cause: the generator (autoencoder) is not trained enough. The generator outputs are so blurry that little backprop information from the discriminator can flow through the generator. In other words, the predicted mask will converge only after the autoencoder can produce good results.
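To illustrate the mask's role, here is a rough sketch of how an alpha-mask output is typically composited (my own simplification with made-up names, not the repo's exact code): until the reconstruction is sharp, the mask has little useful signal to converge toward.

```python
import numpy as np

def composite(raw_rgb, alpha_mask, warped_input):
    """Blend the autoencoder's reconstruction over the input using the
    predicted alpha mask (values in [0, 1]). Where the mask is 0, the
    output falls back to the input unchanged."""
    return alpha_mask * raw_rgb + (1.0 - alpha_mask) * warped_input

# With an all-zero mask (as from an untrained generator), the composite
# is just the warped input, so the preview's mask column looks empty.
inp = np.full((4, 4, 3), 0.5, dtype=np.float32)
recon = np.zeros_like(inp)
out = composite(recon, np.zeros((4, 4, 1), dtype=np.float32), inp)
```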

vinerz commented 6 years ago

Hey @shaoanlu! Thanks for the explanation 😄

Found the Smurfs issue: I had changed plt.imshow to cv2.imwrite in order to save the previews as JPEGs, and cv2.imwrite uses the infamous BGR channel order. I removed the channel conversion and it works like a charm now!

Here's a preview: _sample_faces-7 _sample_masks-3

About the training time:

I've also trained the same images with the regular deepfake algorithm, and after 14 hours of training I got some pretty decent results, as shown below:

sample_regular-5

What I've read about GAN training is that it takes a while, but once the network reaches a certain point, the generator converges and its accuracy increases exponentially. Is this correct?

shaoanlu commented 6 years ago

Yes, and it is also common to see weird artifacts in intermediate preview faces. To leverage the power of the discriminator, the generator first has to learn to reconstruct warped faces back to unwarped ones well (the reconstruction L1 loss should be small enough). Only then can the discriminator send an instructive signal to the generator for weight updates.
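That schedule can be sketched as a weighted generator loss (the weights and names below are illustrative assumptions, not the notebook's actual values):

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between reconstruction and target face."""
    return float(np.mean(np.abs(pred - target)))

def generator_loss(pred, target, disc_score, w_recon=1.0, w_adv=0.1):
    """Total generator loss: reconstruction L1 plus an adversarial term.
    While the L1 term is large (blurry outputs), it dominates the update;
    the adversarial term only becomes influential once reconstruction is
    already good. disc_score is the discriminator's probability that the
    generated face is real."""
    recon = l1_loss(pred, target)
    adv = -float(np.log(disc_score + 1e-8))  # non-saturating generator loss
    return w_recon * recon + w_adv * adv
```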