orpatashnik / StyleCLIP

Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
MIT License

The choice of parameters #2

Closed weihaox closed 3 years ago

weihaox commented 3 years ago

Thank you for sharing.

I played with the Colab and obtained unpleasant results most of the time. It is hard to decide what value to use if different edits require different weights for the CLIP loss.
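For context, the kind of optimization I was running is roughly of the following form (a minimal sketch in PyTorch, not the repo's exact code; the generator `G`, the inverted latent `w_source`, and the value of `l2_lambda` are assumptions on my side). The CLIP loss pulls the image toward the text prompt, while the L2 term keeps the latent near the inversion, and it is this trade-off weight that seems to need per-edit tuning:

```python
# Minimal sketch of a CLIP-guided latent optimization loop (assumptions:
# a pretrained StyleGAN generator `G` with a `synthesis(w)` call, an inverted
# latent `w_source`, and the openai `clip` package; weights are illustrative).
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device, jit=False)
clip_model = clip_model.float()  # keep everything in fp32 so dtypes match

text_tokens = clip.tokenize(["a face with purple hair"]).to(device)
with torch.no_grad():
    text_features = clip_model.encode_text(text_tokens)

w = w_source.clone().detach().requires_grad_(True)  # start from the inversion
optimizer = torch.optim.Adam([w], lr=0.1)
l2_lambda = 0.008  # hypothetical value; different edits need different settings

for _ in range(300):
    img = G.synthesis(w)  # assumed generator API, output in [-1, 1]
    img_224 = F.interpolate(img, size=(224, 224), mode="bilinear", align_corners=False)
    image_features = clip_model.encode_image(img_224)
    clip_loss = 1 - F.cosine_similarity(image_features, text_features).mean()
    l2_loss = ((w - w_source) ** 2).mean()  # keep the edit close to the inversion
    loss = clip_loss + l2_lambda * l2_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```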

Then I tried to use the pretrained CLIP model in my own work, TediGAN.

[image: clip_results]

The obtained results are not sensitive to the weights.

[image: clip_results_cw]

Adding additional perceptual loss and image reconstruction loss may help stabilize the optimization process.
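A minimal sketch of what such a combined objective could look like (assuming the `lpips` package for the perceptual term and a real source image `img_real`; the weights are illustrative, not TediGAN's actual values):

```python
# Sketch of a combined objective: CLIP loss plus perceptual (LPIPS) and pixel
# reconstruction terms against the source image, which can damp large drifts.
# `img_gen` and `img_real` are assumed to be image batches in [-1, 1].
import torch
import torch.nn.functional as F
import lpips

device = "cuda" if torch.cuda.is_available() else "cpu"
perceptual_net = lpips.LPIPS(net="vgg").to(device)

def combined_loss(clip_loss, img_gen, img_real, w_percep=0.5, w_rec=1.0):
    percep = perceptual_net(img_gen, img_real).mean()  # LPIPS perceptual distance
    rec = F.mse_loss(img_gen, img_real)                # pixel-level reconstruction
    return clip_loss + w_percep * percep + w_rec * rec
```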

orpatashnik commented 3 years ago

Hi @weihaox ,

Thanks for your interest! I agree that parameter tuning may be a bit challenging at the moment; this is only an initial version of my project, and I hope to improve it. I was not aware of TediGAN; thanks for sharing, it looks interesting.

Anyway, regarding the results, which text prompt did you use? From the way the images look, I believe the issue may be the quality of the inversion. You may want to look at this paper, where we discuss the intimate relation between inversion and editing in StyleGAN's latent space. The encoder's code (from the paper) will hopefully be publicly available within the next few days, and I am fairly confident that it will improve your results.

weihaox commented 3 years ago

Thank you for your reply. I am looking forward to the new version.