omertov / encoder4editing

Official implementation of "Designing an Encoder for StyleGAN Image Manipulation" (SIGGRAPH 2021) https://arxiv.org/abs/2102.02766
MIT License

about ffhq_encode performance #49

Closed LLSean closed 2 years ago

LLSean commented 3 years ago

I trained a model for ffhq_encode, but the performance is poor on some scenes; the background is especially difficult to reconstruct. What should I do to improve the performance? My training set is 5,000 images. Should I add more training data? Also, my loss is the ID loss. Should I use the MOCO loss instead?

omertov commented 3 years ago

Hi @LLSean! Regarding the MOCO loss vs. the ID loss: we have not tried MOCO-based training for the faces domain, but the identity loss should perform better since it is specific to faces, so you should probably stick with it.
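For context, here is a rough sketch of where those weights enter the reconstruction objective. The lambda names follow the training options (e.g. --id_lambda, --moco_lambda, --l2_lambda, --lpips_lambda); the loss callables and the function itself are placeholders, and the actual computation lives in training/coach.py, so treat this as illustration only.

```python
# Rough illustration of the weighted reconstruction loss; the real implementation
# is in training/coach.py. All loss callables passed in here are placeholders.
def reconstruction_loss(x, x_hat, latents, opts, losses):
    loss = opts.l2_lambda * losses["l2"](x_hat, x)          # pixel-wise similarity
    loss += opts.lpips_lambda * losses["lpips"](x_hat, x)   # perceptual similarity
    # Faces: ArcFace-based identity loss (--id_lambda).
    # Other domains: MOCO-based feature similarity (--moco_lambda) instead.
    loss += opts.id_lambda * losses["id"](x_hat, x)
    loss += opts.delta_norm_lambda * losses["delta_reg"](latents)  # keep per-layer offsets small
    return loss
```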

As for the results: the encoder aims to obtain meaningful embeddings in the pretrained FFHQ StyleGAN2 latent space, which will achieve good editing (and perceptual) results. Since the "well behaved" regions are probably not expressive enough to yield an exact reconstruction of your examples, you could:

  1. explore different hyperparameters (see the "distortion-perception" and "distortion-editability" tradeoffs in the paper)
  2. fine-tune the StyleGAN2 model [together with / before] training the encoder (something we did not explore, as the paper focuses on investigating the latent space of a pretrained StyleGAN2 and encoding into selected regions)
  3. use PTI to edit a specific image (a rough sketch of the idea follows below).
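Regarding option 3, a very rough sketch of the pivotal tuning idea: invert the image to a "pivot" latent, then briefly fine-tune the generator weights around it. The generator call below assumes the rosinality-style StyleGAN2 interface used in this repo, and `invert` / `lpips_loss` are placeholders; see the official PTI implementation for the real procedure.

```python
# Very rough sketch of pivotal tuning: keep the inverted "pivot" latent fixed and
# lightly fine-tune the generator weights so it reconstructs the target image.
# `generator` is assumed to be a rosinality-style StyleGAN2 Generator; `invert`
# and `lpips_loss` are placeholders for your inversion routine and perceptual loss.
import torch
import torch.nn.functional as F

def pivotal_tune(generator, invert, lpips_loss, target, steps=350, lr=3e-4):
    w_pivot = invert(target).detach()            # e.g. e4e output or SG2 optimization
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(steps):
        x_hat, _ = generator([w_pivot], input_is_latent=True, randomize_noise=False)
        loss = F.mse_loss(x_hat, target) + lpips_loss(x_hat, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator, w_pivot
```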

Hope it helps! Best, Omer

LLSean commented 3 years ago

@omertov Thanks for your reply; the information is very helpful. I will try method 1. The StyleGAN2 model was fine-tuned by me, and the training data was generated from this model. I also tried the pSp model for inversion; its reconstruction is better than e4e's, but the results are hard to edit. So I will continue to try more hyperparameters.

omertov commented 3 years ago

@LLSean I wonder whether the training images you used can be reproduced from the well-behaved regions of your new GAN. Two simple things you can try:

  1. Optimize a latent code for a training image and try editing it (for example, use the SG2 optimization process).
  2. Encode an image with the e4e you trained to obtain an initial latent code, then optimize it against the target image with a lower learning rate and fewer steps (see the sketch after this list).
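A minimal sketch of suggestion 2, assuming a loaded e4e network (`net`, exposing `net.encoder`, `net.latent_avg`, and the StyleGAN2 generator as `net.decoder`) and a placeholder `lpips_loss` for whatever perceptual loss you already use:

```python
# Minimal sketch: start from the e4e latent and refine it directly against the
# target image with a small number of low-learning-rate optimization steps.
# `net` and `lpips_loss` are placeholders for your own loaded e4e model and
# perceptual loss; check the model's forward pass for the exact latent handling.
import torch
import torch.nn.functional as F

def refine_latent(net, lpips_loss, target, steps=100, lr=5e-3):
    with torch.no_grad():
        # Initial code from the trained e4e encoder (mirrors the forward pass
        # when --start_from_latent_avg is used).
        w = net.encoder(target) + net.latent_avg
    w = w.detach().clone().requires_grad_(True)

    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_hat, _ = net.decoder([w], input_is_latent=True, randomize_noise=False)
        loss = F.mse_loss(x_hat, target) + lpips_loss(x_hat, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```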

As for the hyperparameters, I suggest playing with the delta regularization loss (--delta_norm_lambda), the progressive training (lower --progressive_start and --progressive_step_every values), and lastly with --w_discriminator_lambda.
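For reference, a hypothetical sweep over those settings; the flag names follow the e4e training options, but please verify them against options/train_options.py and the README training command, and treat the values below as starting points rather than recommendations.

```python
# Hypothetical hyperparameter sweep over the settings mentioned above.
# Flag names follow the e4e training options; verify before running.
import itertools
import subprocess

delta_lambdas = [2e-4, 5e-4, 1e-3]      # --delta_norm_lambda candidates
progressive_starts = [10000, 20000]     # --progressive_start candidates

for delta_lambda, prog_start in itertools.product(delta_lambdas, progressive_starts):
    subprocess.run([
        "python", "scripts/train.py",
        "--dataset_type", "ffhq_encode",
        "--exp_dir", f"experiments/delta{delta_lambda}_prog{prog_start}",
        "--delta_norm_lambda", str(delta_lambda),
        "--progressive_start", str(prog_start),
        "--progressive_step_every", "2000",
        "--w_discriminator_lambda", "0.1",
        "--id_lambda", "0.5",
        "--start_from_latent_avg",
        "--use_w_pool",
    ], check=True)
```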

Best, Omer

LLSean commented 3 years ago

@omertov Thanks very much. I tried some hyperparameters such as l2_lambda and lpips_lambda. In my experiments, l2_lambda mainly controls the background similarity, while lpips_lambda mainly controls the accessories and hairstyles on people. I will continue to follow your advice and read the paper about progressive training.