Ideal Encoder Dataset Type

yuval-alaluf / stylegan3-editing

Official Implementation of "Third Time's the Charm? Image and Video Editing with StyleGAN3" (AIM ECCVW 2022) https://arxiv.org/abs/2201.13433

https://yuval-alaluf.github.io/stylegan3-editing/

MIT License


Ideal Encoder Dataset Type #39

Closed. rut00 closed this issue 2 years ago.

rut00 commented 2 years ago

Our objective is to edit an original face image of a person without glasses so that it shows the eyeglasses from our sample images. I have trained a conditional StyleGAN3 network on face images with two different eyeglass styles. Now, to train the encoder, what would be the ideal dataset type for getting good inferred images?

If you could share any guidance on this, it would be really helpful.

Thank you

yuval-alaluf commented 2 years ago

What do you mean by dataset type? Basically, this repo only has one dataset type (ffhq_encode), which works on human faces. Have you tried using the pretrained models we provided?
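For context: in encoder codebases of the pSp family, a dataset type is, as far as we can tell, a named config entry that tells the training script which image folders (and transforms) to use for source and target images. It does not describe the image content beyond that. The sketch below is hypothetical: the dictionary layout, the eyeglasses_encode key, and every path are placeholders, not this repo's actual config module.

```python
# Hypothetical sketch of a dataset-type registry in the style of pSp-family encoder configs.
# Key names, fields, and paths are placeholders for illustration, not copied from this repo.
DATASETS = {
    # Inversion setup: source and target are the same images (encode, then reconstruct).
    "ffhq_encode": {
        "train_source_root": "/path/to/ffhq",
        "train_target_root": "/path/to/ffhq",
        "test_source_root": "/path/to/celeba_test",
        "test_target_root": "/path/to/celeba_test",
    },
    # Hypothetical extra entry: the same inversion task, trained on faces wearing glasses.
    "eyeglasses_encode": {
        "train_source_root": "/path/to/eyeglasses/train",
        "train_target_root": "/path/to/eyeglasses/train",
        "test_source_root": "/path/to/eyeglasses/test",
        "test_target_root": "/path/to/eyeglasses/test",
    },
}


def get_dataset_config(dataset_type: str) -> dict:
    """Resolve a dataset-type string to its data roots, failing loudly on unknown names."""
    if dataset_type not in DATASETS:
        raise ValueError(f"Unknown dataset type {dataset_type!r}; options: {sorted(DATASETS)}")
    return DATASETS[dataset_type]


cfg = get_dataset_config("ffhq_encode")
print(cfg["train_source_root"])  # /path/to/ffhq
```

Under this layout, a new dataset type only changes which images the encoder learns to invert; the encoder's task stays the same.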

rut00 commented 2 years ago

> What do you mean by dataset type?

By dataset type, I mean a dataset in which every image shows a face wearing one of our specific eyeglass frames. So for our use case, eyeglass style transfer, we need faces wearing our eyeglass frames to train the encoder, right?

Also, for our use case, I assume we need to train our own conditional network. Please correct me if I am wrong.

yuval-alaluf commented 2 years ago

If you are simply trying to add or remove eyeglasses from face images, have you tried using the pretrained encoder and SG3 generator and editing along the InterFaceGAN eyeglasses direction? I think this could help solve your problem without needing to train anything.
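Editing along an InterFaceGAN direction is a linear shift of the inverted latent code, w_edit = w + alpha * n, where n is the unit normal of the attribute boundary and alpha sets the edit strength. Below is a minimal numpy sketch, assuming the inverted code and the boundary are already loaded as arrays. The shapes (16 W+ vectors of size 512) and the random stand-ins are assumptions, and rendering the edited codes with the generator is left out.

```python
import numpy as np


def edit_along_direction(latent: np.ndarray, direction: np.ndarray, factors) -> list:
    """Shift a latent code along a normalized semantic direction.

    latent:    W or W+ code, e.g. shape (num_ws, 512); a (512,) or (1, 512)
               direction broadcasts over every row.
    direction: boundary normal, e.g. an eyeglasses direction.
    factors:   edit strengths; 0 returns the original code. Which sign adds the
               attribute depends on how the boundary was trained.
    """
    unit = direction / np.linalg.norm(direction)
    return [latent + alpha * unit for alpha in factors]


rng = np.random.default_rng(0)
w = rng.standard_normal((16, 512))  # stand-in for an inverted image's W+ code (shape assumed)
n = rng.standard_normal((1, 512))   # stand-in for a learned eyeglasses boundary
edits = edit_along_direction(w, n, factors=[-3.0, 0.0, 3.0])
# Each edited code would then be passed through the generator to render an image.
```

Sweeping several factors, as above, is a cheap way to see where the edit starts to change attributes other than the glasses.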

rut00 commented 2 years ago

As you suggested, I tried editing the image using the pretrained encoder, my trained eyeglass boundaries, and the conditional SG3 generator, but in the results I am getting, other facial attributes are also added along with the eyeglass frame. Also, how do I add my own eyeglass frame?
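When an eyeglasses edit also changes other attributes, one technique from the InterFaceGAN paper is conditional manipulation: project the eyeglasses direction so that it is orthogonal to the directions of attributes that should stay fixed. The sketch below assumes boundaries for those other attributes are available; the age and gender boundaries here are hypothetical and drawn at random. It only removes correlation that is linear and captured by those boundaries, so it may reduce the leakage without eliminating it.

```python
import numpy as np


def condition_direction(primal: np.ndarray, *conditions: np.ndarray) -> np.ndarray:
    """Remove from `primal` its component along every condition direction.

    Moving along the returned unit direction leaves the linear scores of the
    condition attributes unchanged (InterFaceGAN-style conditional manipulation).
    """
    p = primal.reshape(-1).astype(float)
    if conditions:
        # Orthonormal basis for the span of the condition directions (reduced QR).
        c = np.stack([d.reshape(-1) for d in conditions], axis=1)  # (dim, k)
        q, _ = np.linalg.qr(c)
        p = p - q @ (q.T @ p)
    norm = np.linalg.norm(p)
    if norm < 1e-12:
        raise ValueError("primal direction lies almost entirely in the span of the conditions")
    return (p / norm).reshape(primal.shape)


rng = np.random.default_rng(0)
eyeglasses = rng.standard_normal((1, 512))  # stand-in for the trained eyeglasses boundary
age = rng.standard_normal((1, 512))         # hypothetical age boundary
gender = rng.standard_normal((1, 512))      # hypothetical gender boundary
n = condition_direction(eyeglasses, age, gender)
```

The conditioned direction n can then be used in place of the raw eyeglasses boundary when shifting the latent code.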

yuval-alaluf commented 2 years ago

If I were to approach this task, I would try using StyleGAN2 and StyleCLIP for editing the face image. I don't think that trying to train a conditional SG3 generator and an encoder is the way to go here. I am not really sure how you plan on approaching this task using SG3. You could try framing the problem as an image-to-image task from images without glasses to images with glasses, but this repo is not built for that. To add this, you'll need to introduce your own changes, so you're a bit on your own here.
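For reference, the simplest StyleCLIP variant, latent optimization, starts from an inverted source code w_s and optimizes a W+ code against a text prompt t (for example, "a face with glasses"), roughly as stated in the StyleCLIP paper. Here G is the pretrained StyleGAN2 generator, D_CLIP is the cosine distance between the CLIP embeddings of the generated image and the text, L_ID is an identity-preservation loss, and the lambda weights balance the terms; check the paper for the exact norms and weights. Because the edit is driven by text, it does not by itself impose one specific frame design. The paper also proposes a latent mapper and global directions, which are not shown here.

```latex
\begin{equation}
w^{*} = \underset{w \in \mathcal{W}+}{\arg\min}\;
  D_{\mathrm{CLIP}}\bigl(G(w), t\bigr)
  + \lambda_{L2} \, \lVert w - w_s \rVert_2
  + \lambda_{ID} \, L_{ID}(w)
\end{equation}
```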