yuval-alaluf / restyle-encoder

Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" (ICCV 2021) https://arxiv.org/abs/2104.02699
https://yuval-alaluf.github.io/restyle-encoder/
MIT License

Is current operation correct? #27

Closed GLivshits closed 3 years ago

GLivshits commented 3 years ago

Hello again. I'm trying your code (except that I've chosen lucidrains' StyleGAN2 implementation) to invert fingerprints (a toy project, my first working GAN, publicly available data). The GAN works nicely, but with your code I only get the shape of the fingerprint right, while the pattern is completely unnatural. The maximum number of iterations I've tried is 25k. I use single-channel images (with slight modifications to your code). Example attached. What can you suggest to improve the quality? I use L2, the discriminator from the GAN, and LPIPS losses.
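For context, the single-channel adaptation amounts to something along these lines (a minimal sketch assuming torchvision transforms; the resize and normalization values are illustrative placeholders, not the actual modifications made here):

```python
# Hypothetical single-channel input pipeline for fingerprint images.
# The resize value and normalization stats are placeholders.
from torchvision import transforms

gray_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.Grayscale(num_output_channels=1),  # keep a single channel
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),  # map to roughly [-1, 1]
])
```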

yuval-alaluf commented 3 years ago

To get a better idea of your configuration, I have several questions:

  1. Can you please send me the full command you're running and maybe some randomly generated samples from your GAN?
  2. Are you still using StyleGAN2? Just a different implementation of it that you previously trained?
  3. Do you see an improvement over the course of the 25,000 training steps? Or were these the results you got from the beginning?
  4. If you made other modifications to the code, can you please clarify what changes you made?
GLivshits commented 3 years ago

  1. Images from the GAN are attached. The command: `python scripts/train_restyle_psp.py --encoder_type 'BackboneEncoder' --input_nc 2 --output_size 256 --learning_rate 0.0002 --batch_size 6 --lpips_lambda 0.8 --id_lambda 0.8 --l2_lambda 4 --w_norm_lambda 0.001`
  2. I've trained lucidrains' StyleGAN2 version and am using it now. I adapted its code to your ReStyle code; it generates the same images as before the rework.
  3. I see an improvement only for maybe the first 500 iterations; after that the loss just fluctuates at the same high level.
  4. Adaptation for grayscale, the encoder from https://arxiv.org/pdf/2104.07661.pdf, a discriminator loss with a discr_lambda, and some modifications to the visualization.
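For reference, the objective implied by these lambdas would combine the terms roughly as follows (a minimal sketch assuming PyTorch, the `lpips` package, and a discriminator that returns raw logits; the `discr_lambda` value, the non-saturating adversarial term, and the `id_loss_fn` hook are illustrative assumptions rather than the actual modified code):

```python
import torch.nn.functional as F
import lpips

# Lambdas from the command above; discr_lambda was not stated, so 0.1 is a placeholder.
lpips_lambda, id_lambda, l2_lambda, w_norm_lambda, discr_lambda = 0.8, 0.8, 4.0, 0.001, 0.1

lpips_loss_fn = lpips.LPIPS(net='alex')

def encoder_loss(x, y_hat, w, w_avg, d_logits_fake, id_loss_fn):
    """Weighted sum of the loss terms mentioned in the thread (sketch only)."""
    # Note: single-channel images would need .repeat(1, 3, 1, 1) before LPIPS.
    loss = l2_lambda * F.mse_loss(y_hat, x)
    loss = loss + lpips_lambda * lpips_loss_fn(y_hat, x).mean()
    loss = loss + id_lambda * id_loss_fn(y_hat, x)                     # identity/similarity term
    loss = loss + w_norm_lambda * (w - w_avg).pow(2).mean()            # latent regularizer
    loss = loss + discr_lambda * F.softplus(-d_logits_fake).mean()     # adversarial term (assumed non-saturating)
    return loss
```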

yuval-alaluf commented 3 years ago

So a couple of things:

  1. I am not familiar with the architecture from the paper you linked. But since you're not working on the facial domain you should make sure that your encoder is not initialized with weights from the ir_se50 model. Basically, in all of our other domains we use a ResNet34 backbone pre-trained on ImageNet, which I believe is what you want in your case.
  2. I see you're using the id_loss, but this is designed specifically for faces. You should instead use the MoCo-based loss (see the sketch after this list). More details on setting up the proper parameters can be found here: https://github.com/yuval-alaluf/restyle-encoder#additional-notes
  3. I see you made a lot of changes to the loss parameters. Is there a specific reason for this? Did the default parameters not work well? Also, if you are working on inversion, you don't really need the w_norm_loss. You should set its lambda value to 0.
  4. Is there a specific reason you use a discriminator loss? If you have a pre-trained StyleGAN, you don't really need to worry about a discriminator loss since the images should already be quite realistic. Adding a discriminator loss could even hurt the results (in my experience).
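To illustrate the idea behind point 2: the MoCo-based loss replaces face-recognition features with features from a domain-agnostic backbone and penalizes low cosine similarity between the input and the reconstruction. A minimal sketch, using a torchvision ResNet-34 as a stand-in for the MoCo backbone the loss is named after (not the repository's exact implementation):

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Placeholder feature extractor; the actual MoCo-based loss uses a
# self-supervised backbone, this ImageNet ResNet-34 is only illustrative.
backbone = models.resnet34(pretrained=True)
backbone.fc = torch.nn.Identity()  # keep pooled features, drop the classifier
backbone.eval()

def similarity_loss(x, y_hat):
    """Penalize low cosine similarity between backbone features (sketch).
    Single-channel inputs would need to be repeated to 3 channels first."""
    with torch.no_grad():
        feats_x = F.normalize(backbone(x), dim=-1)
    feats_y = F.normalize(backbone(y_hat), dim=-1)
    return (1 - (feats_x * feats_y).sum(dim=-1)).mean()
```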

Hope this helps. I do believe you should be able to get pretty good results in this domain. The challenge is capturing the finer details needed for an accurate reconstruction, but you should be able to get close.

GLivshits commented 3 years ago

So, I've found out that regularizing the W vector is very important, because under many additions its norm just blows up. I'm using a fingerprint recognition network instead of the face recognition one. But the thing is that the pattern of the generated images does not change much with more iterations. The network just tries to modify the shape in order to minimize the L2 loss.
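A minimal sketch of the kind of regularization being described, assuming the accumulated code `w` and the generator's average code `w_avg` are tensors of shape `(batch, n_latents, 512)`; the exact form used in the repository may differ:

```python
import torch

def w_norm_loss(w, w_avg):
    """Keep the accumulated latent close to the average latent so its
    norm does not blow up over the iterative refinement steps (sketch)."""
    return (w - w_avg).pow(2).sum(dim=-1).mean()

# Hypothetical usage inside the refinement loop:
# w = w + delta                                        # delta predicted by the encoder at this step
# loss = loss + w_norm_lambda * w_norm_loss(w, w_avg)  # regularize the accumulated code
```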