huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Cannot use custom_pipeline hd_painter with IP-Adapter #8609

Open linthmwan opened 5 months ago

linthmwan commented 5 months ago

Describe the bug

When I use the IP-Adapter and the hd_painter custom pipeline at the same time, I get RuntimeError: mat1 and mat2 shapes cannot be multiplied (514x1280 and 1024x3072). This should work, since neither IP-Adapter nor hd_painter modifies the output shape of the attention layers. From the log, it looks like the computation mixes up the IP-Adapter and hd_painter paths; maybe one should run first and then the other?

Reproduction

import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image, make_image_grid

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "./stable-diffusion/absolutereality_v181",
    use_safetensors=True,
    torch_dtype=torch.float16,
    safety_checker=None,
    custom_pipeline="hd_painter",
)
pipe.to("cuda")
pipe.load_ip_adapter("./", subfolder="ip_adapter", weight_name="ip-adapter_sd15.bin")

prompt = "plastic bag"
init_image = load_image("./input/bag.jpg")
mask_image = load_image("./mask/bag.jpg")
ip_image = load_image("./input/plastic_bag.jpg")

image = pipe(
    prompt, init_image, mask_image,
    ip_adapter_image=ip_image,
    use_rasg=True, use_painta=True,
    num_inference_steps=30,
    generator=torch.manual_seed(12345),
).images[0]

out = make_image_grid([init_image, mask_image, image], rows=1, cols=3)
out.save("hd.jpg")

Logs

Traceback (most recent call last):
  File "/home/loyd_wan/snap/snapd-desktop-integration/83/Desktop/IP-Adapter/hd_painter.py", line 41, in <module>
    image = pipe (prompt, init_image, mask_image, 
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/diffusers_modules/git/hd_painter.py", line 759, in __call__
    noise_pred = self.unet(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1168, in forward
    encoder_hidden_states = self.process_encoder_hidden_states(
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1039, in process_encoder_hidden_states
    image_embeds = self.encoder_hid_proj(image_embeds)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 1032, in forward
    image_embed = image_projection_layer(image_embed)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 457, in forward
    image_embeds = self.image_embeds(image_embeds)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (514x1280 and 1024x3072)

System Info

diffusers==0.28.0, Python 3.10

Who can help?

No response

sayakpaul commented 5 months ago

Cc: @asomoza

sayakpaul commented 5 months ago

Also where does "stable-diffusion/absolutereality_v181" come from? We need to have access to this for reproduction.

asomoza commented 5 months ago

I won't have the time to test this soon.

cc: @fabiorigano for IP Adapter and @haikmanukyan for HD Painter if they have some insights about it.

linthmwan commented 5 months ago

reality_v181" come from

Use other SD 1.5 base model like runwayml/stable-diffusion-v1-5 also reproduce this error.
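
For reference, a minimal variant of the loading step with the Hub checkpoint (assumed here: the IP-Adapter weights and the rest of the reproduction stay exactly as in the snippet above):

# Hypothetical variant of the reproduction, only swapping the local checkpoint
# for the Hub model; everything else is unchanged.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    use_safetensors=True,
    torch_dtype=torch.float16,
    safety_checker=None,
    custom_pipeline="hd_painter",
)
pipe.to("cuda")
pipe.load_ip_adapter("./", subfolder="ip_adapter", weight_name="ip-adapter_sd15.bin")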

fabiorigano commented 5 months ago

Hi everyone, thanks @asomoza for adding me here. I ran some tests: the HD Painter code is not up to date with the more recent developments. In https://github.com/huggingface/diffusers/blob/35f45ecd71a5c917406408a02bc982c3795d5a35/examples/community/hd_painter.py#L572 the isinstance check will always return False, because image projection layers are now encapsulated in MultiIPAdapterImageProjection. This is where the code breaks. For reference, it should follow https://github.com/huggingface/diffusers/blob/35f45ecd71a5c917406408a02bc982c3795d5a35/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L942C1-L949C14
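
For illustration, a minimal sketch of what that check could look like once encoder_hid_proj is a MultiIPAdapterImageProjection, loosely following the referenced pipeline_stable_diffusion.py lines (the surrounding variables such as ip_adapter_image, device and num_images_per_prompt are assumed from the standard pipeline; this is not the actual patch):

# Sketch only, inside the pipeline's IP-Adapter image-embedding step.
from diffusers.models.embeddings import ImageProjection

# encoder_hid_proj now wraps one projection layer per loaded IP-Adapter,
# so the isinstance check must target each wrapped layer, not the container.
for single_ip_adapter_image, image_proj_layer in zip(
    [ip_adapter_image], self.unet.encoder_hid_proj.image_projection_layers
):
    # A plain ImageProjection (e.g. ip-adapter_sd15.bin) expects pooled CLIP
    # image embeds; only the "plus" resampler variants expect hidden states.
    output_hidden_state = not isinstance(image_proj_layer, ImageProjection)
    single_image_embeds, single_negative_image_embeds = self.encode_image(
        single_ip_adapter_image, device, num_images_per_prompt, output_hidden_state
    )

With the old container-level check, output_hidden_state wrongly ends up True for the plain SD 1.5 IP-Adapter, so 1280-dim CLIP hidden states are fed to a projection expecting 1024-dim pooled embeds, which would match the reported (514x1280 and 1024x3072) mismatch.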

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.