jiwoogit / StyleID

[CVPR 2024 Highlight] Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
MIT License

Diffusers implementation outputs all blurred images #15

Closed dibbla closed 2 months ago

dibbla commented 2 months ago

Hi @jiwoogit & @hse1032,

Thanks for open-sourcing this great work! I am working with the diffusers library, using SD2.1 from Hugging Face as the base model.

However, when using the default settings, the result does not look good. [image attached]

Are there any problems related to SD2.1 or the diffusers implementation? Any suggestions would be appreciated 😄

hse1032 commented 2 months ago

Hi @dibbla,

Can you let us know the detailed configuration you ran?

With the default configuration settings, we have verified that the current code works well.

Here is an example I ran, with the configuration settings:

Configurations

Namespace(T=1.5, cnt_fn='lenna.png', ddim_steps=20, gamma=0.75, layers=[7, 8, 9, 10, 11], save_dir='results', sd_version=2.1, sty_fn='the_starry_night.png', without_attn_injection=False, without_init_adain=False)

Images (cnt: lenna.png / sty: the_starry_night.png)

[reverse_stylized image attached]

[stylized_image attached]

dibbla commented 2 months ago

Hi @hse1032. Thanks for the reply ⭐

Here is my configuration:

cfg is Namespace(T=1.5, gamma=0.75, without_init_adain=False, without_attn_injection=False, layers=[7, 8, 9, 10, 11], ddim_steps=20, sd_version=2.1, cnt_fn='data_vis/cnt/lenna.png', sty_fn='data_vis/sty/the_starry_night.png', save_dir='results')

I am using this code to run:

python run_styleid_diffusers.py --cnt_fn <some path>data_vis/cnt/lenna.png --sty_fn <some path>/data_vis/sty/the_starry_night.png --gamma 0.75 --T 1.5 --ddim_steps 20

I modified some lines to use my local model weights, but the weights were downloaded directly from Hugging Face, so I guess it should be fine: https://github.com/jiwoogit/StyleID/blob/d88ad3dcc5ebc185e5901454b288852ef5af5fa8/diffusers_implementation/stable_diffusion.py#L11-L12

It was changed to:

    if sd_version == '2.1':
        model_key = "/root/ws/style/stable-diffusion-2-1"

hse1032 commented 2 months ago

Can you try using the model in the link below? https://huggingface.co/stabilityai/stable-diffusion-2-1-base
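In the diffusers implementation, that corresponds to pointing `model_key` at the base checkpoint. A minimal sketch mirroring the linked `stable_diffusion.py` branch (the surrounding variable names are assumptions, not the exact repo code):

```python
# Sketch of the model_key selection for the 2.1 branch. The "-base"
# checkpoint is the 512-resolution model the authors validated with.
sd_version = '2.1'

if sd_version == '2.1':
    model_key = "stabilityai/stable-diffusion-2-1-base"  # 512-res base model

print(model_key)
```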

liylo commented 2 months ago

I encountered a similar problem to @dibbla's. I changed the model key to "stabilityai/stable-diffusion-2-1" and got blurred images. However, when I used the model key from the original code, I got the excellent result that @hse1032 showed. I wonder what causes the difference. Thanks!

dibbla commented 2 months ago

> Can you try using the model in the link below? https://huggingface.co/stabilityai/stable-diffusion-2-1-base

Yes, I am using the model pulled from exactly this URL.

dibbla commented 2 months ago

> I encountered a similar problem to @dibbla's. I changed the model key to "stabilityai/stable-diffusion-2-1" and got blurred images. However, when I used the model key from the original code, I got the excellent result that @hse1032 showed. I wonder what causes the difference. Thanks!

Do you mean using Stable Diffusion v1 instead of v2.1?

hse1032 commented 2 months ago

> > Can you try using the model in the link below? https://huggingface.co/stabilityai/stable-diffusion-2-1-base
>
> Yes, I am using the model pulled from exactly this URL.

SD v2.1-base (the link we attached) is the model for 512-resolution images, and we found that the current code works well with it. SD v2.1 is the model for 768-resolution images, and in that case there seem to be some problems in the current inversion code.

I think we need some debugging to make the SD v2.1 model work.

liylo commented 2 months ago

Thank you! I'm trying it now.

dibbla commented 2 months ago

> > > Can you try using the model in the link below? https://huggingface.co/stabilityai/stable-diffusion-2-1-base
> >
> > Yes, I am using the model pulled from exactly this URL.
>
> SD v2.1-base (the link we attached) is the model for 512-resolution images, and we found that the current code works well with it. SD v2.1 is the model for 768-resolution images, and in that case there seem to be some problems in the current inversion code.
>
> I think we need some debugging to make the SD v2.1 model work.

Oh, I see the difference! Thanks for pointing this out. I'm pulling the models now and will give it a try.

Btw, do you have any plans to support SDXL? I guess that would be more impactful than SD v2.1.

liylo commented 2 months ago

> > > > Can you try using the model in the link below? https://huggingface.co/stabilityai/stable-diffusion-2-1-base
> > >
> > > Yes, I am using the model pulled from exactly this URL.
> >
> > SD v2.1-base (the link we attached) is the model for 512-resolution images, and we found that the current code works well with it. SD v2.1 is the model for 768-resolution images, and in that case there seem to be some problems in the current inversion code. I think we need some debugging to make the SD v2.1 model work.
>
> Oh, I see the difference! Thanks for pointing this out. I'm pulling the models now and will give it a try.
>
> Btw, do you have any plans to support SDXL? I guess that would be more impactful than SD v2.1.

I suppose SDXL requires much more memory, which makes it difficult to save all the features. But maybe we could save only the latents at each timestep, then make the reverse call again to get the style and content features at the same time, as implemented in https://github.com/google/style-aligned/.
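A toy sketch of that two-pass idea, trading compute for memory. `denoise_step` here is a hypothetical stand-in for the real UNet call (everything in this snippet is illustrative, not code from either repo):

```python
# Pass 1 caches only the cheap latents; pass 2 recomputes the
# memory-heavy per-step features on demand from those latents.

def denoise_step(latent, t):
    # hypothetical UNet stand-in: returns (next_latent, feature)
    return latent * 0.9 + t, latent + t

def two_pass(latent0, timesteps):
    # Pass 1: run the trajectory, storing only the latents.
    latents = [latent0]
    for t in timesteps:
        nxt, _ = denoise_step(latents[-1], t)
        latents.append(nxt)
    # Pass 2: re-run each step individually to recover its feature,
    # so only one step's features are ever alive at a time.
    features = [denoise_step(z, t)[1] for z, t in zip(latents, timesteps)]
    return latents, features

latents, features = two_pass(1.0, [0.1, 0.2, 0.3])
```

The same pattern would let a style pass and a content pass share one feature buffer per timestep instead of keeping every timestep's features resident.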

liylo commented 2 months ago

I think I have made SD 2.1 work. There does exist some problem with the inversion process, but I couldn't debug it, so I used https://github.com/shaibagon/diffusers_ddim_inversion/blob/main/ddim_inversion.py as an alternative. My code needs to modify the diffusers source code, which isn't elegant and needs a better implementation. Many thanks to the authors for this amazing work.
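For context, DDIM inversion just runs the deterministic DDIM update in the opposite direction. In practice it is only approximately exact because the noise prediction depends on the current latent, but with a frozen epsilon the round trip is exact, which a scalar toy (made-up alpha schedule, constant stand-in for the UNet) can show:

```python
import math

# Toy cumulative-alpha schedule, t = 0 (clean) .. 3 (noisy).
alphas = [0.99, 0.95, 0.90, 0.80]

def eps(x, t):
    return 0.3  # frozen stand-in for the UNet noise prediction

def ddim_step(x, t_from, t_to):
    # Deterministic DDIM update: predict x0 at t_from, then move to t_to.
    a_f, a_t = alphas[t_from], alphas[t_to]
    x0_pred = (x - math.sqrt(1 - a_f) * eps(x, t_from)) / math.sqrt(a_f)
    return math.sqrt(a_t) * x0_pred + math.sqrt(1 - a_t) * eps(x, t_from)

x0 = 0.7
x = x0
for t in range(3):           # inversion: t = 0 -> 3, deterministic "noising"
    x = ddim_step(x, t, t + 1)
for t in range(3, 0, -1):    # sampling back: t = 3 -> 0
    x = ddim_step(x, t, t - 1)
# x is now equal to x0 up to floating-point error
```

The blurred outputs with the 768-resolution checkpoint are consistent with this round trip drifting when the inversion code's assumptions about the scheduler don't match the model.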

dibbla commented 2 months ago

Hi guys, after changing the base model from v2.1 to v2.1-base, I have achieved the same result as @hse1032 presented. Many thanks for the help ❤️! @hse1032 @liylo

I will try to test/investigate the DDIM inversion problem with the v2.1 model. Closing this issue, as the major concern has been resolved.