huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

can not use custom_pipeline hd_painter with IP_adapter #8609

Open linthmwan opened 3 weeks ago

linthmwan commented 3 weeks ago

Describe the bug

When I use IP_adapter and hd_painter at the same time, the pipeline raises RuntimeError: mat1 and mat2 shapes cannot be multiplied (514x1280 and 1024x3072). I expected this to work, since neither IP_adapter nor hd_painter modifies the output shape of the attention layers. From the log, it looks as if the IP_adapter and hd_painter computations get mixed together. Maybe one should run first and then the other?

Reproduction

import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image, make_image_grid

# Load an SD 1.5 checkpoint with the HD-Painter community pipeline
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "./stable-diffusion/absolutereality_v181",
    use_safetensors=True,
    torch_dtype=torch.float16,
    safety_checker=None,
    custom_pipeline="hd_painter",
)

pipe.to("cuda")
# Attach the IP-Adapter weights on top of the custom pipeline
pipe.load_ip_adapter("./", subfolder="ip_adapter", weight_name="ip-adapter_sd15.bin")
prompt = "plastic bag"
init_image = load_image("./input/bag.jpg")
mask_image = load_image("./mask/bag.jpg")
ip_image = load_image("./input/plastic_bag.jpg")

# This call raises the RuntimeError shown in the logs
image = pipe(
    prompt,
    init_image,
    mask_image,
    ip_adapter_image=ip_image,
    use_rasg=True,
    use_painta=True,
    num_inference_steps=30,
    generator=torch.manual_seed(12345),
).images[0]

out = make_image_grid([init_image, mask_image, image], rows=1, cols=3)
out.save("hd.jpg")

Logs

Traceback (most recent call last):
  File "/home/loyd_wan/snap/snapd-desktop-integration/83/Desktop/IP-Adapter/hd_painter.py", line 41, in <module>
    image = pipe (prompt, init_image, mask_image, 
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/diffusers_modules/git/hd_painter.py", line 759, in __call__
    noise_pred = self.unet(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1168, in forward
    encoder_hidden_states = self.process_encoder_hidden_states(
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1039, in process_encoder_hidden_states
    image_embeds = self.encoder_hid_proj(image_embeds)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 1032, in forward
    image_embed = image_projection_layer(image_embed)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 457, in forward
    image_embeds = self.image_embeds(image_embeds)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (514x1280 and 1024x3072)
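
One possible reading of the shapes in this error, sketched in plain Python without torch. The constants below are assumptions and are not stated anywhere in the traceback: 257 tokens per image and a width of 1280 for per-token hidden states, a width of 1024 for pooled image embeddings, and a batch of 2 for classifier-free guidance. `matmul_shape` is a hypothetical helper that only mimics the shape check.

```python
def matmul_shape(a, b):
    """Return the result shape of a 2-D matrix product, or raise on a mismatch."""
    (m, k1), (k2, n) = a, b
    if k1 != k2:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied ({m}x{k1} and {k2}x{n})"
        )
    return (m, n)


# Assumed values (not taken from the traceback itself).
TOKENS_PER_IMAGE = 257  # assumed: 256 patch tokens + 1 class token
HIDDEN_SIZE = 1280      # assumed: width of per-token hidden states
POOLED_SIZE = 1024      # assumed: width of pooled image embeddings
CFG_BATCH = 2           # assumed: negative + positive image embeddings

# Transposed Linear weight, as printed in the error message.
weight_t = (POOLED_SIZE, 3072)

# Pooled embeddings: one 1024-wide row per batch entry, which is compatible.
pooled_ok = matmul_shape((CFG_BATCH, POOLED_SIZE), weight_t)

# Hidden states flattened to (batch * tokens, hidden): incompatible.
try:
    matmul_shape((CFG_BATCH * TOKENS_PER_IMAGE, HIDDEN_SIZE), weight_t)
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)

print(pooled_ok)
print(mismatch)
```

Under these assumptions, the 514x1280 operand would be what a layer expecting pooled 1024-wide embeddings sees when it is handed per-token hidden states instead.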

System Info

Diffusers==v0.28.0 python==3.10

Who can help?

No response

sayakpaul commented 2 weeks ago

Cc: @asomoza

sayakpaul commented 2 weeks ago

Also where does "stable-diffusion/absolutereality_v181" come from? We need to have access to this for reproduction.

asomoza commented 2 weeks ago

I won't have the time to test this soon.

cc: @fabiorigano for IP Adapter and @haikmanukyan for HD Painter if they have some insights about it.

linthmwan commented 2 weeks ago

Regarding where "stable-diffusion/absolutereality_v181" comes from: other SD 1.5 base models such as runwayml/stable-diffusion-v1-5 also reproduce this error.

fabiorigano commented 1 week ago

Hi everyone, and thanks @asomoza for adding me here. I ran some tests: the HD Painter code is not up to date with more recent developments. In https://github.com/huggingface/diffusers/blob/35f45ecd71a5c917406408a02bc982c3795d5a35/examples/community/hd_painter.py#L572 the isinstance check always returns False, because image projection layers are now encapsulated in MultiIPAdapterImageProjection. This is where the code breaks. For reference, it should be: https://github.com/huggingface/diffusers/blob/35f45ecd71a5c917406408a02bc982c3795d5a35/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L942C1-L949C14
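
A minimal sketch of why an isinstance check against the wrapper can never match, using stand-in classes rather than the real diffusers ones. The class bodies, the `image_projection_layers` attribute name, and both helper functions are assumptions made for illustration; the linked pipeline code is the authoritative reference.

```python
class ImageProjection:
    """Stand-in for a projection layer that takes pooled image embeddings."""


class MultiIPAdapterImageProjection:
    """Stand-in wrapper holding one projection layer per loaded IP-Adapter."""

    def __init__(self, layers):
        # Attribute name is an assumption for this sketch.
        self.image_projection_layers = list(layers)


def needs_hidden_states_on_wrapper(encoder_hid_proj):
    # Hypothetical helper: checks the wrapper itself. The wrapper is never an
    # ImageProjection, so the isinstance test is always False and hidden
    # states are always requested.
    return not isinstance(encoder_hid_proj, ImageProjection)


def needs_hidden_states_per_layer(encoder_hid_proj):
    # Hypothetical helper: checks each wrapped layer instead of the wrapper.
    return [
        not isinstance(layer, ImageProjection)
        for layer in encoder_hid_proj.image_projection_layers
    ]


proj = MultiIPAdapterImageProjection([ImageProjection()])
on_wrapper = needs_hidden_states_on_wrapper(proj)
per_layer = needs_hidden_states_per_layer(proj)
print(on_wrapper, per_layer)
```

In this sketch the wrapper-level check asks for hidden states even though the only wrapped layer expects pooled embeddings, while the per-layer check does not.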