wty-ustc / HairCLIPv2

[ICCV 2023] HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending

The results on wild images are very poor, and it is very slow #3

Closed garbe-github-support closed 6 months ago

garbe-github-support commented 6 months ago

I followed the README for the various configurations and dealt with several bugs. The input image has been resized to 1024 × 1024.

Does the input image need to be a centered, front-facing portrait in order to be reconstructed?

Using your provided image (generating the NPZ file on the fly instead of using the preset one), the bald version of the original image can be reconstructed very well, like this: image

But if I use a new image, the newly created image is completely wrong and very ugly, like this: image

If my configuration is correct, then I just want to say that the paper is overly boastful and it wasted my time. The quality and speed of Stable Diffusion are much better than yours.

wty-ustc commented 6 months ago

These results look abnormal. This has not happened in any of my test examples, including wild images. test_images/unaligned_img/test.jpg is a random wild image I downloaded from www.pexels.com and it works fine. Here are some issues you need to be aware of:

  1. Wild images need to be aligned to the form expected by StyleGAN using python scripts/align_face.py (see the sketch after this list).
  2. Front faces or faces at a slight angle are permitted.
  3. Wild images should look crisp; they don't necessarily need to be 1024 resolution, a little lower is okay. The alignment script will automatically resize to 1024.
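
For context, point 1 corresponds to the standard FFHQ-style alignment used by StyleGAN pipelines. Below is a condensed sketch of that procedure using dlib landmarks; the repository's scripts/align_face.py is the authoritative implementation, and the predictor filename plus the omission of FFHQ's shrink/crop/pad refinements are assumptions here.

```python
import numpy as np
import PIL.Image
import dlib

def align_face(img_path, predictor_path='shape_predictor_68_face_landmarks.dat',
               output_size=1024):
    """Condensed FFHQ-style alignment sketch (not the repo's exact script)."""
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(predictor_path)  # download separately from dlib.net
    img = dlib.load_rgb_image(img_path)
    dets = detector(img, 1)
    if not dets:
        raise ValueError('no face detected')
    lm = np.array([[p.x, p.y] for p in predictor(img, dets[0]).parts()])

    # 68-point dlib convention: eye regions and outer mouth corners
    eye_left = lm[36:42].mean(axis=0)
    eye_right = lm[42:48].mean(axis=0)
    eye_avg = (eye_left + eye_right) * 0.5
    eye_to_eye = eye_right - eye_left
    mouth_avg = (lm[48] + lm[54]) * 0.5
    eye_to_mouth = mouth_avg - eye_avg

    # oriented crop rectangle, as in the FFHQ dataset tool
    x = eye_to_eye - np.flipud(eye_to_mouth) * [-1, 1]
    x /= np.hypot(*x)
    x *= max(np.hypot(*eye_to_eye) * 2.0, np.hypot(*eye_to_mouth) * 1.8)
    y = np.flipud(x) * [-1, 1]
    c = eye_avg + eye_to_mouth * 0.1
    quad = np.stack([c - x - y, c - x + y, c + x + y, c + x - y])

    # map the oriented quad onto a square 1024x1024 output
    pil_img = PIL.Image.open(img_path).convert('RGB')
    return pil_img.transform((output_size, output_size), PIL.Image.Transform.QUAD,
                             (quad + 0.5).flatten(), PIL.Image.Resampling.BILINEAR)
```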

There may be other cases that I haven't mentioned. Can you send me your test image? I will help you reproduce it after January 2, 2024 (right now I'm on a New Year vacation). To protect the privacy of your images, you can send them to my email bestwty@mail.ustc.edu.cn.

Also, as far as speed is concerned, the main time overhead of our approach is in the inversion phase. You can speed it up by reducing W_steps in utils/options.py, at the cost of some degree of reconstruction quality, of course.
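
To make that trade-off concrete, here is a minimal sketch of optimization-based W-space inversion. The generator and LPIPS handles are placeholders rather than HairCLIPv2's actual modules; only the role of W_steps is taken from the comment above. Halving W_steps roughly halves inversion time but increases reconstruction error.

```python
import torch
import torch.nn.functional as F

def invert_to_w(generator, target, w_init, lpips_loss, W_steps=200, lr=0.01):
    """Minimal GAN-inversion sketch: optimize a latent w so generator(w) matches target.

    `generator` and `lpips_loss` are placeholder callables, not HairCLIPv2's modules.
    Fewer W_steps -> proportionally faster, at the cost of reconstruction fidelity.
    """
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(W_steps):
        img = generator(w)                                         # synthesize from current latent
        loss = lpips_loss(img, target) + F.mse_loss(img, target)   # perceptual + pixel terms
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```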

Compared to hair editing based on Stable Diffusion inpainting, our approach is a multi-modal hair editing system with additional support for hair transfer, sketch, mask, RGB, and other interaction modes. For text-based editing, I consider our approach comparable to Stable Diffusion in terms of effectiveness. If the StyleGAN were trained on a dataset with more diverse hairstyles, I think our method would perform even better.

123456klk1 commented 6 months ago

The process runs for 200 rounds; how long does that take? Can a 3090 run it?

wty-ustc commented 6 months ago

About 1 minute. You can reduce the number to 100 for most cases. As shown below, 60 rounds are enough for the bowl cut hairstyle. Also, a 3090 is suitable for our project. image