omertov / encoder4editing

Official implementation of "Designing an Encoder for StyleGAN Image Manipulation" (SIGGRAPH 2021) https://arxiv.org/abs/2102.02766
MIT License

Pretrained models with 512 or 256 resolutions? #10

Closed: sailor002 closed this issue 3 years ago

sailor002 commented 3 years ago

Hi, thanks for the excellent work. I see that you have a pretrained model for FFHQ 1024x1024 resolution. Do you have one for 512 or 256?

yuval-alaluf commented 3 years ago

Hi @sailor002, we do not have FFHQ models for 512 or 256 resolutions. However, may I ask why you would like to use a smaller-resolution generator? Our encoders are trained on inputs of size 256x256, but we generate images at 1024x1024 resolution. Therefore, you can feed in the smaller inputs and still get the benefits of a higher-resolution generator at little extra cost.

sailor002 commented 3 years ago

Thanks for the quick reply @yuval-alaluf. Actually, we are assessing the eligibility of "e4e" to be used in a production project where we plan to work with 512 resolution images. So anything above the necessary resolution will overuse resources and prolong execution times. By the way, my impression is that e4e outperforms idinvert (which we are currently experimenting with) in every aspect, even when a moderate post-optimization is applied to idinvert. So, I really appreciate your work.

So maybe we can retrain your model in lower resolutions. Do you think that we can benefit from your pretrained model by copying its available weights to the lower resolution model so that we don't start from scratch? Thanks...

omertov commented 3 years ago

Hi @sailor002!

The model consists of two networks: a pretrained StyleGAN generator (checkpoint taken from the official NVIDIA repository) and the e4e encoder.

If you have a lower-resolution StyleGAN2 generator, I would advise retraining the e4e encoder from scratch, since the latent space of the 1024x1024 generator and that of your generator are not the same.

sailor002 commented 3 years ago

By the way, the main reason I'm looking for a lower resolution is that the latent code produced by your model has a dimension of 18x512, which matches the official StyleGAN2 ffhq-1024 pretrained model. What we need is a latent code that is compatible with the official ffhq-512 model, which should have a 16x512 latent dimension. As far as I know, there is no way to use a latent code from one resolution with a StyleGAN2 model of a different resolution.
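
The 18 and 16 mentioned above follow from how a StyleGAN2 synthesis network is laid out: it consumes one style vector per layer, which gives 2·log2(resolution) − 2 vectors in total. A minimal sketch of that arithmetic is below. The function name `num_style_layers` is hypothetical and not part of the e4e or StyleGAN2 code, and the width of 512 is assumed from the standard FFHQ configuration.

```python
import math


def num_style_layers(resolution: int) -> int:
    """Count the per-layer style vectors (W+ rows) for a StyleGAN2
    generator with the given square output resolution.

    Hypothetical helper for illustration; assumes the standard
    architecture, which starts at 4x4 and doubles up to `resolution`.
    """
    if resolution < 4 or not math.log2(resolution).is_integer():
        raise ValueError("resolution must be a power of two >= 4")
    return 2 * int(math.log2(resolution)) - 2


# Assumed latent width of the standard FFHQ configuration.
LATENT_DIM = 512

for res in (256, 512, 1024):
    print(res, (num_style_layers(res), LATENT_DIM))
# 256 (14, 512)
# 512 (16, 512)
# 1024 (18, 512)
```

The shape mismatch is only the visible symptom. Even if the row counts were made to agree, the two generators were trained separately, so their latent spaces are not interchangeable.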

sailor002 commented 3 years ago

Thanks, @omertov. We'll probably go that way then.

Harsha-Musunuri commented 3 years ago

@sailor002 did you get a chance to train the 256x256 model?

li-car-fei commented 2 years ago

@sailor002 did you get a chance to train the 256x256 model?