yuval-alaluf / restyle-encoder

Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" (ICCV 2021) https://arxiv.org/abs/2104.02699
https://yuval-alaluf.github.io/restyle-encoder/
MIT License

StyleGAN 3 support? #45

Closed · Kitty-sunray closed this 3 years ago

Kitty-sunray commented 3 years ago

Hello! Any chance of official NVLabs StyleGAN3 support? Specifically, it introduces new config-T and config-R models, which would be awesome to train ReStyle on! Also, there is a BSRGAN approach that seems to make everything better: it degrades the quality of input images in various realistic ways, which has already helped improve a few GANs. My assumption is that if ReStyle were trained to encode not only clean images but also degraded variants of them, the trained ReStyle would be less sensitive to image quality and more sensitive to the actual image content. Or am I totally wrong?
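To make the idea concrete, here is a minimal sketch of that kind of degradation augmentation using PIL; the specific operations and parameter ranges are illustrative assumptions, not taken from BSRGAN itself:

```python
import io
import random

from PIL import Image, ImageFilter

def degrade(img: Image.Image) -> Image.Image:
    """Apply a random, realistic quality degradation to a training image.

    Illustrative stand-in for a richer BSRGAN-style degradation pipeline;
    the operations and ranges below are assumptions.
    """
    # Random Gaussian blur.
    if random.random() < 0.5:
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 2.0)))

    # Downscale then re-upscale to simulate resampling artifacts.
    if random.random() < 0.5:
        w, h = img.size
        scale = random.uniform(0.25, 0.75)
        img = img.resize((int(w * scale), int(h * scale)), Image.BICUBIC)
        img = img.resize((w, h), Image.BICUBIC)

    # JPEG re-compression at a random quality to simulate compression noise.
    if random.random() < 0.5:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 90))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")

    return img
```

The encoder would then be asked to invert the degraded image while its reconstruction losses are computed against the clean original.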

yuval-alaluf commented 3 years ago

Hi @Kitty-sunray,

The BSRGAN approach you mentioned does seem interesting as a form of augmentation during training. It could be a nice experiment to try out.

Regarding SG3, I have begun playing around with encoding real images into SG3's latent space. However, it is not trivial to get a good encoder for SG3:

  1. For one, the generator consumes a lot more memory (from what I can tell), forcing a much smaller batch size during training (with SG2 we trained with a batch size of 8, while with SG3 I can only reach a batch size of 2). This results in much slower convergence (one common workaround, gradient accumulation, is sketched after this list).
  2. Since the architecture is different, the gradients back-propagated to the encoder behave differently than in SG2, so more analysis is needed on how best to train the encoder. This too leads to slower convergence.
  3. There has not yet been enough analysis to understand which layers control which image attributes. For example, in SG2 we had the distinction between coarse, medium, and fine layers; in SG3 this no longer seems to be the case. Therefore, techniques such as style mixing may not carry over to SG3 (a minimal SG2 style-mixing sketch also follows this list, for reference).
  4. Overall, the image quality of SG3 does not seem to be superior to that of SG2, although you do get some nicer animations.
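For point 1, one common workaround for a memory-bound batch size is gradient accumulation; here is a minimal sketch assuming a standard PyTorch training loop (the loss function, data loader, and model names are placeholders, not this repo's API):

```python
def train_epoch(encoder, generator, loader, optimizer, accum_steps=4):
    """Emulate an effective batch size of (micro_batch * accum_steps).

    With SG3 fitting only micro-batches of 2, accum_steps=4 approximates
    the effective batch size of 8 used with SG2. All names here are
    placeholders for the actual training components.
    """
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        loss = compute_loss(encoder, generator, batch)  # placeholder loss fn
        (loss / accum_steps).backward()  # scale so accumulated grads average
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

This trades proportionally more forward passes per optimizer step for gradient statistics closer to the SG2 setup.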
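And for point 3, the coarse/medium/fine split is exactly what makes style mixing simple in SG2. A minimal sketch for a 1024px SG2 generator (the latent names and surrounding setup are assumed; the synthesis call follows the official SG2 API):

```python
# w_a, w_b: two latents in W+ of shape [1, 18, 512] for a 1024px SG2 generator.
mixed = w_a.clone()
mixed[:, :4] = w_b[:, :4]   # coarse layers 0-3: take pose/structure from w_b
# layers 4-7 (medium) and 8-17 (fine) keep appearance and color from w_a
image = generator.synthesis(mixed)  # official SG2-style synthesis call
```

In SG3 no such clean per-layer semantic split is currently known, which is why this recipe may not carry over.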

Therefore, I am not planning on adding explicit support for SG3 in this repository until I am able to run proper experiments and get visually nice results.

If you are interested, plugging SG3 into ReStyle takes less than 10 minutes, and I will be happy to assist with any questions you have.
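For anyone curious what that swap roughly involves, here is a sketch of loading the official SG3 generator and running a ReStyle-style refinement loop around it. The loading calls follow the official NVLabs stylegan3 repo; the pickle path, the residual encoder, and the loop itself are illustrative assumptions:

```python
import torch
import dnnlib   # from the official NVLabs stylegan3 repo
import legacy   # from the official NVLabs stylegan3 repo

# Load a pre-trained SG3 generator (the pickle name here is illustrative).
with dnnlib.util.open_url('stylegan3-t-ffhq-1024x1024.pkl') as f:
    G = legacy.load_network_pkl(f)['G_ema'].cuda().eval()

# One possible ReStyle-style refinement loop: start from the average latent
# and let a (hypothetical) residual encoder update it from the current output.
w = G.mapping.w_avg[None, None].repeat(1, G.num_ws, 1)  # [1, num_ws, 512]
y = G.synthesis(w)                                      # initial reconstruction
for _ in range(5):
    # x: the real input image; encoder: an assumed residual encoder that
    # takes the 6-channel concatenation of input and current reconstruction.
    w = w + encoder(torch.cat([x, y], dim=1))
    y = G.synthesis(w)
```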

Kitty-sunray commented 3 years ago

10/10 response ;)

Re 3: Sure, style mixing and other latent-space manipulations of an encoded image seem to be the leading reason to encode, and without them SG3 support seems less useful. But I want to add that there are other reasons to encode images, for example encoding images that do not belong to the domain the GAN was originally trained on (though not bootstrapping), or repairing/inpainting image defects.

Re 4: OK, I get it. It is too early, and supporting SG3 seems unnecessary for now. Maybe SG3 will soon become the de facto standard, considering the cool tools it provides and the interesting papers that will appear around it; then it might be the right time to think about this again.

Re BSRGAN: I see its main value as the ability to imitate smartphone camera sensor noise (e.g. from low light), which is often found in real-world user-generated images. Blurring, re-upscaling, and JPEG compression are trivial, though still helpful. I have tried it, and it actually improved the quality of the models I generate.

Re implementing SG3 with ReStyle in 10 minutes: thank you so much for the offer to assist, but I am too new to neural networks to understand things at this level. That said, today I saw an update (and a published paper) about how "changing nearest-neighbor to Lanczos downsampling of dataset images improved quality so much!" and was fascinated to realize how big the gaps in basic image-processing knowledge can be, even for people working on GANs. So I am motivated by the thought that I may have a chance to get into the scene quicker than I expected. Do you have any YouTube ML teaching courses, by any chance?

Also, there are datasets available that are much, much bigger than FFHQ (which is just 70k images), ranging from 1M to 5M face images. If by any chance you have resources idling, I kindly propose loading them by re-training StyleGAN and ReStyle to obtain far better results than the current restyle-over-FFHQ is capable of.

yuval-alaluf commented 3 years ago

Yeah, I agree with you that SG3 will eventually become the de facto style-based generator :) It is definitely interesting to see how SG3 behaves on out-of-domain images and on other tasks such as inpainting. Hopefully, once I find some more time, I'll look into things like this.

Unfortunately, I don't have any YouTube courses I can recommend. My best advice is to play around with existing open-source implementations to see how these models work, and then try to implement some features on your own :)

Regarding training on a larger dataset: typically, training with more data does improve generalization. However, I am not sure how much of an improvement you'll see if you re-train ReStyle with more data. It could be worth a try though.