yuval-alaluf / hyperstyle

Official Implementation for "HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing" (CVPR 2022) https://arxiv.org/abs/2111.15666
https://yuval-alaluf.github.io/hyperstyle/
MIT License

Style Clip editing on hyperstyle + PTI output using global directions. #41

Closed: jaingaurav1601 closed this issue 2 years ago

jaingaurav1601 commented 2 years ago

Hey, so HyperStyle saves its weight deltas when executed; afterward, I tune the inversion using PTI and then use StyleGAN to generate the output. The main issue I am facing is that StyleGAN loads the saved weights from HyperStyle, so the editing is applied to the HyperStyle inversion rather than the HyperStyle + PTI tuned inversion. Is there a way to use global directions only and save the weights after the PTI tuning has been performed?

yuval-alaluf commented 2 years ago

The official implementation of PTI (https://github.com/danielroich/PTI) uses a different implementation of StyleGAN2 than the one used here. I therefore recommend taking the StyleCLIP code here (https://github.com/yuval-alaluf/hyperstyle/tree/main/editing/styleclip) and copying it over to the PTI repo. You can then run PTI, save the generator, and use the tuned generator to edit with StyleCLIP. Specifically, rather than using the --weight_deltas_path flag, you can simply set the --stylegan_weights flag to the tuned generator. I hope this makes sense :)
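A minimal sketch of that idea, assuming this repo's rosinality-based StyleGAN2 Generator (the import path, checkpoint key, and file name below are assumptions, not verified code):

    import torch
    from models.stylegan2.model import Generator  # assumed import path in this repo

    ckpt = torch.load("pti_tuned_generator.pt", map_location="cpu")  # hypothetical path
    if isinstance(ckpt, torch.nn.Module):      # generator saved as a full model object
        generator = ckpt
    else:                                      # saved as a state dict / {"g_ema": ...}
        generator = Generator(1024, 512, 8)    # FFHQ config: size, style_dim, n_mlp
        generator.load_state_dict(ckpt.get("g_ema", ckpt))
    generator.eval()
    # Editing now runs on the tuned weights; no --weight_deltas_path is needed.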

jaingaurav1601 commented 2 years ago

We were hoping that HyperStyle, rather than e4e, would produce a better inversion. If possible, can you guide me on how to chain HyperStyle -> PTI -> StyleCLIP (global directions)?

yuval-alaluf commented 2 years ago

HyperStyle and PTI are similar in the sense that they both alter the StyleGAN weights, and both can start from the same inversion. If you're interested in performing PTI after HyperStyle, what I would do is the following (a rough code sketch follows the list):

  1. Run inference using HyperStyle.
  2. Save the tuned model to a .pt file.
  3. Load the .pt file to the PTI code and start tuning from this checkpoint.
  4. After tuning is complete, save the PTI-tuned generator to a new .pt file.
  5. Load the .pt file to the StyleCLIP code as your checkpoint and edit the image using the initial inversion used for both HyperStyle and PTI.
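
A hedged sketch of the flow above. The helper names run_hyperstyle_inference, fuse_deltas_into_generator, and run_pti_tuning are hypothetical placeholders, not functions from this repo or the PTI repo; only the torch.save/torch.load calls are meant literally.

    import torch

    # 1-2) HyperStyle inference yields a latent code plus per-layer weight
    #      deltas; bake the deltas into the generator and save the result.
    latent, weight_deltas = run_hyperstyle_inference(image)         # hypothetical
    tuned_g = fuse_deltas_into_generator(generator, weight_deltas)  # hypothetical (sketched later in this thread)
    torch.save(tuned_g, "hyperstyle_tuned.pt")

    # 3-4) PTI resumes from this checkpoint, optimizes the generator weights
    #      directly, and saves the tuned generator.
    pti_g = run_pti_tuning("hyperstyle_tuned.pt", image, latent)    # hypothetical
    torch.save(pti_g, "pti_tuned.pt")

    # 5) StyleCLIP then edits with the PTI-tuned generator and the same
    #    initial inversion; no weight deltas are passed at this stage.
    stylegan_model = torch.load("pti_tuned.pt")
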
jaingaurav1601 commented 2 years ago

I think we have already implemented exactly what you suggested, but as you can see in the code below:

    edited_image, _, _ = stylegan_model([edited_latent_code_s_batch], input_is_stylespace=True, randomize_noise=False, return_latents=True, weights_deltas=weight_deltas)

StyleGAN expects weight deltas, which are not returned by PTI. This is where we are stuck, since these weight deltas are provided by HyperStyle, not PTI.

yuval-alaluf commented 2 years ago

If you're running PTI, you can simply pass None for the weight deltas; they're None by default. The flow should be the following:

  1. Run inference using HyperStyle to obtain the weight deltas. Let's say this is stored in the variable weight_deltas.
  2. Fuse the weight deltas into the generator to obtain a new, modified generator of type Generator. This can be done with the script I attached below (note: this is somewhat old code that I haven't tested, but I hope the idea is clear in case it does not work): fuse_deltas_and_generator.txt (a sketch of this step appears after the list). At this point, you should have your new, modified Generator object. Save it using torch.save(modified_generator, "my_modified_generator.pt").
  3. Now it's time to run PTI, where your checkpoint path will point to "my_modified_generator.pt". I don't know what code you're using to run PTI, but you don't need the weight_deltas parameter in the Generator; you simply optimize the weights of your Generator object directly.
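
For reference, a minimal sketch of what such a fusing script might look like, assuming this repo's rosinality-based generator layout. The layer ordering below is an assumption (verify it against your generator's forward()); the attached fuse_deltas_and_generator.txt is the authoritative version.

    import torch

    @torch.no_grad()
    def fuse_deltas_into_generator(generator, weight_deltas):
        # Collect the modulated convs in the (assumed) order the generator's
        # forward pass consumes weight deltas.
        layers = [generator.conv1, generator.to_rgb1]
        for i in range(len(generator.to_rgbs)):
            layers += [generator.convs[2 * i],
                       generator.convs[2 * i + 1],
                       generator.to_rgbs[i]]
        for layer, delta in zip(layers, weight_deltas):
            if delta is None:
                continue  # the hypernetwork left this layer untouched
            # HyperStyle refines weights multiplicatively: w_hat = w * (1 + delta)
            layer.conv.weight.data *= (1 + delta)
        return generator

    modified_generator = fuse_deltas_into_generator(generator, weight_deltas)
    torch.save(modified_generator, "my_modified_generator.pt")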

I hope this helps clarify the steps.

P.S. I am a bit confused by the code you added above. It seems to be the StyleCLIP code. If your stylegan_model object is the output of PTI, you don't need to pass weight_deltas to the call; you can simply omit the argument, since your stylegan_model is already the fully optimized version obtained with PTI.
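Concretely, reusing the names from your snippet, the StyleCLIP call simply drops the argument (weights_deltas defaults to None in this repo's modified Generator):

    edited_image, _, _ = stylegan_model(
        [edited_latent_code_s_batch],
        input_is_stylespace=True,
        randomize_noise=False,
        return_latents=True,
    )  # weights_deltas omitted; the PTI tuning is already baked into the weights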

jaingaurav1601 commented 2 years ago

Yeah, we are using StyleCLIP to edit the image latent given by PTI. For more clarification, here are the inputs and outputs (images attached):

  - Input image
  - Background-removed image
  - HyperStyle-based inversion
  - HyperStyle -> PTI inversion (only tuning; the e4e inversion step removed)
  - StyleCLIP-edited image (using the weight deltas saved from the HyperStyle-based inversion)

Looking at the results, it is clear that the weight deltas saved from HyperStyle affect the edited image far more than the image latent we obtained from HyperStyle + PTI.

What we need is for StyleCLIP to use the weights from the PTI-tuned inversion and then edit the image. When we set weight_deltas to None, the output edited image is very bad.

yuval-alaluf commented 2 years ago

From what I can tell, the StyleCLIP-edited image is simply the edited image you get from HyperStyle. It is as if the generator saved from PTI was never saved and passed to StyleCLIP. I'm copying the images over here so they're easier to compare: [side-by-side comparison image] Based on what you sent, the order of the images is (HyperStyle, Edited, PTI). Notice that the HyperStyle and edited images are very similar (other than the hair that is edited), while the PTI image is a bit different (e.g., the ear on the right is slightly cut off). This leads me to believe that when editing with StyleCLIP, you're using the generator you got from HyperStyle instead of passing the generator you got from PTI.

Did you follow the flow I recommended above? If you follow the steps, you should be able to edit the images correctly after running both HyperStyle and PTI. As I mentioned, after you finish running HyperStyle and save the modified generator, you shouldn't use the weight deltas anymore (even when editing with StyleCLIP).