huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[SD3] pipe.enable_xformers_memory_efficient_attention #8535

Open CanvaChen opened 3 weeks ago

CanvaChen commented 3 weeks ago

Describe the bug

RuntimeError: The size of tensor a (154) must match the size of tensor b (2304) at non-singleton dimension 1

Reproduction

from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers")
pipe.enable_xformers_memory_efficient_attention()

Logs

No response

System Info

diffusers==0.29.0
python=3.10
pytorch=2.3

Who can help?

@yiyixuxu @DN6 @sayakpaul

sayakpaul commented 3 weeks ago

I don’t think we allow xformers attention in the SD3 blocks. Would you be interested in opening a PR? We will be happy to guide you.

CanvaChen commented 3 weeks ago

It might be a bit challenging for me, as my understanding of xformers is currently at the application level.

sayakpaul commented 2 weeks ago

Oh that is okay. Here are a couple of reference pointers for you:

Does this seem like a feature you would be interested in contributing? Not only would we greatly appreciate it, but we would also help you by providing guidance (#8276).

Let us know.

CanvaChen commented 2 weeks ago

Thank you for your guidance. I’m interested in attempting this and plan to work on it during my free time after work.

sayakpaul commented 2 weeks ago

That would be great! As mentioned, we will be more than happy to guide you throughout.

CanvaChen commented 2 weeks ago

  1. I have implemented the XFormersJointAttnProcessor, but after calling pipe.enable_xformers_memory_efficient_attention(), self.attn.processor is always set to XFormersAttnProcessor. Where should I configure which processor gets selected? (See the sketch after this list.)
  2. Since the first issue has not been resolved, I did not call pipe.enable_xformers_memory_efficient_attention() and instead temporarily changed processor = JointAttnProcessor2_0() to processor = XFormersJointAttnProcessor() directly in the JointTransformerBlock for testing. I found that the attention_mask passed to the processor is None. Why is that?
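
For reference, here is a minimal sketch of what such an XFormersJointAttnProcessor could look like, modeled on JointAttnProcessor2_0 from diffusers 0.29. The attn attribute names (add_q_proj, to_add_out, context_pre_only, and so on) are assumed to match that processor; this is a sketch under those assumptions, not the implementation that eventually landed.

import torch
import xformers.ops


class XFormersJointAttnProcessor:
    """Sketch: joint (image + text) attention via xformers memory-efficient attention."""

    def __init__(self, attention_op=None):
        self.attention_op = attention_op

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        residual = hidden_states
        batch_size = hidden_states.shape[0]

        # Project the image stream.
        query = attn.to_q(hidden_states)
        key = attn.to_k(hidden_states)
        value = attn.to_v(hidden_states)

        # Project the text stream and concatenate along the sequence axis,
        # as JointAttnProcessor2_0 does.
        query = torch.cat([query, attn.add_q_proj(encoder_hidden_states)], dim=1)
        key = torch.cat([key, attn.add_k_proj(encoder_hidden_states)], dim=1)
        value = torch.cat([value, attn.add_v_proj(encoder_hidden_states)], dim=1)

        head_dim = key.shape[-1] // attn.heads
        # xformers expects (batch, seq_len, heads, head_dim); no transpose needed.
        query = query.view(batch_size, -1, attn.heads, head_dim)
        key = key.view(batch_size, -1, attn.heads, head_dim)
        value = value.view(batch_size, -1, attn.heads, head_dim)

        # attention_mask is None for SD3 (see the discussion below), so it is
        # passed straight through as the bias.
        hidden_states = xformers.ops.memory_efficient_attention(
            query, key, value, attn_bias=attention_mask, op=self.attention_op
        )
        hidden_states = hidden_states.reshape(batch_size, -1, attn.heads * head_dim)
        hidden_states = hidden_states.to(query.dtype)

        # Split the joint sequence back into image and text tokens.
        hidden_states, encoder_hidden_states = (
            hidden_states[:, : residual.shape[1]],
            hidden_states[:, residual.shape[1] :],
        )

        # Output projections, mirroring JointAttnProcessor2_0.
        hidden_states = attn.to_out[0](hidden_states)  # linear
        hidden_states = attn.to_out[1](hidden_states)  # dropout
        if not attn.context_pre_only:
            encoder_hidden_states = attn.to_add_out(encoder_hidden_states)
        return hidden_states, encoder_hidden_states

The only structural change from JointAttnProcessor2_0 is the head layout: xformers.ops.memory_efficient_attention takes (batch, seq_len, heads, head_dim) directly, so the transpose used with torch.nn.functional.scaled_dot_product_attention is dropped.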
sayakpaul commented 2 weeks ago

Thanks for your updates!

Do you want to open a PR with your implementation and tag myself and @yiyixuxu there?

It is perfectly okay to have it in an incomplete state.

CanvaChen commented 2 weeks ago

I’ll submit the PR after I’ve finished testing to make sure there are no issues. Currently, the attention_mask parameter is None, and I’m not sure if this is a problem. Could you help address my concerns in the two points above?

sayakpaul commented 2 weeks ago

I think a None mask param value is fine.

Here is an example of how you can set the right processor: https://github.com/huggingface/diffusers/blob/a899e42fc78fbd080452ce88d00dbf704d115280/src/diffusers/models/attention_processor.py#L381
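
For anyone following the thread: that method lives on the Attention module, and enable_xformers_memory_efficient_attention() ends up calling it on every attention block, which is what question 1 above is about. A rough sketch of the branch a PR would need to add there, assuming the XFormersJointAttnProcessor sketched earlier (this is illustrative, not the actual diffusers source; the xformers availability checks are omitted):

# Sketch of the relevant dispatch inside diffusers' Attention class.
def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers, attention_op=None):
    if use_memory_efficient_attention_xformers:
        if isinstance(self.processor, JointAttnProcessor2_0):
            # Route SD3's joint-attention blocks to the xformers variant
            # instead of falling through to the generic processor.
            processor = XFormersJointAttnProcessor(attention_op=attention_op)
        else:
            processor = XFormersAttnProcessor(attention_op=attention_op)
    else:
        if isinstance(self.processor, XFormersJointAttnProcessor):
            processor = JointAttnProcessor2_0()
        else:
            processor = AttnProcessor()
    self.set_processor(processor)

Without such a branch, every block falls through to the generic XFormersAttnProcessor, which would explain why self.attn.processor never ends up as XFormersJointAttnProcessor.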

CanvaChen commented 2 weeks ago

I have already opened a PR. Since the attention_mask is currently None, I am not handling it for now.