huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[SD3] pipe.enable_xformers_memory_efficient_attention #8535

Open CanvaChen opened 3 weeks ago

CanvaChen commented 3 weeks ago

Describe the bug

RuntimeError: The size of tensor a (154) must match the size of tensor b (2304) at non-singleton dimension 1

Reproduction

from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers")
pipe.enable_xformers_memory_efficient_attention()

Logs

No response

System Info

diffusers==0.29.0
python=3.10
pytorch=2.3

Who can help?

@yiyixuxu @DN6 @sayakpaul

sayakpaul commented 3 weeks ago

I don’t think we allow xformers attention in the SD3 blocks. Would you be interested in opening a PR? We will be happy to guide you.

CanvaChen commented 3 weeks ago

It might be a bit challenging for me, as my understanding of xformers is currently at the application level.

sayakpaul commented 2 weeks ago

Oh that is okay. Here are a couple of reference pointers for you:

Does this seem like a feature you would be interested in contributing? Not only would we greatly appreciate it, but we would also help you by providing guidance (#8276).

Let us know.

CanvaChen commented 2 weeks ago

Thank you for your guidance. I’m interested in attempting this and plan to work on it during my free time after work.

sayakpaul commented 2 weeks ago

That would be great! As mentioned, we will be more than happy to guide you throughout.

CanvaChen commented 2 weeks ago

  1. I have implemented the XFormersJointAttnProcessor, but after calling pipe.enable_xformers_memory_efficient_attention(), self.attn.processor is always set to XFormersAttnProcessor. Where should I configure which processor gets selected? (See the sketch after this list.)
  2. Since the first issue has not been resolved, I did not call pipe.enable_xformers_memory_efficient_attention() and instead temporarily changed processor = JointAttnProcessor2_0() to processor = XFormersJointAttnProcessor() directly in the JointTransformerBlock for testing. I found that the attention_mask passed to the processor is None. Why is that?
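
For reference, here is a minimal sketch of what such an XFormersJointAttnProcessor could look like, modeled on JointAttnProcessor2_0 from diffusers 0.29. The attn attribute names (add_q_proj, to_add_out, context_pre_only, and so on) are assumed to match that processor; this is a sketch under those assumptions, not the implementation that eventually landed.

import torch
import xformers.ops


class XFormersJointAttnProcessor:
    """Sketch: joint (image + text) attention via xformers memory-efficient attention."""

    def __init__(self, attention_op=None):
        self.attention_op = attention_op

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        residual = hidden_states
        batch_size = hidden_states.shape[0]

        # Project the image stream.
        query = attn.to_q(hidden_states)
        key = attn.to_k(hidden_states)
        value = attn.to_v(hidden_states)

        # Project the text stream and concatenate along the sequence axis,
        # as JointAttnProcessor2_0 does.
        query = torch.cat([query, attn.add_q_proj(encoder_hidden_states)], dim=1)
        key = torch.cat([key, attn.add_k_proj(encoder_hidden_states)], dim=1)
        value = torch.cat([value, attn.add_v_proj(encoder_hidden_states)], dim=1)

        head_dim = key.shape[-1] // attn.heads
        # xformers expects (batch, seq_len, heads, head_dim); no transpose needed.
        query = query.view(batch_size, -1, attn.heads, head_dim)
        key = key.view(batch_size, -1, attn.heads, head_dim)
        value = value.view(batch_size, -1, attn.heads, head_dim)

        # attention_mask is None for SD3 (see the discussion below), so it is
        # passed straight through as the bias.
        hidden_states = xformers.ops.memory_efficient_attention(
            query, key, value, attn_bias=attention_mask, op=self.attention_op
        )
        hidden_states = hidden_states.reshape(batch_size, -1, attn.heads * head_dim)
        hidden_states = hidden_states.to(query.dtype)

        # Split the joint sequence back into image and text tokens.
        hidden_states, encoder_hidden_states = (
            hidden_states[:, : residual.shape[1]],
            hidden_states[:, residual.shape[1] :],
        )

        # Output projections, mirroring JointAttnProcessor2_0.
        hidden_states = attn.to_out[0](hidden_states)  # linear
        hidden_states = attn.to_out[1](hidden_states)  # dropout
        if not attn.context_pre_only:
            encoder_hidden_states = attn.to_add_out(encoder_hidden_states)
        return hidden_states, encoder_hidden_states

The only structural change from JointAttnProcessor2_0 is the head layout: xformers.ops.memory_efficient_attention takes (batch, seq_len, heads, head_dim) directly, so the transpose used with torch.nn.functional.scaled_dot_product_attention is dropped.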
sayakpaul commented 2 weeks ago

Thanks for your updates!

Do you want to open a PR with your implementation and tag myself and @yiyixuxu there?

It is perfectly okay to have it in an incomplete state.

CanvaChen commented 2 weeks ago

I’ll submit the PR after I’ve finished testing to make sure there are no issues. Currently, the attention_mask parameter is None, and I’m not sure if this is a problem. Could you help address my concerns in the two points above?

sayakpaul commented 2 weeks ago

I think a None mask param value is fine.

Here is an example of how you can set the right processor: https://github.com/huggingface/diffusers/blob/a899e42fc78fbd080452ce88d00dbf704d115280/src/diffusers/models/attention_processor.py#L381
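
For anyone following the thread: that method lives on the Attention module, and enable_xformers_memory_efficient_attention() ends up calling it on every attention block, which is what question 1 above is about. A rough sketch of the branch a PR would need to add there, assuming the XFormersJointAttnProcessor sketched earlier (this is illustrative, not the actual diffusers source; the xformers availability checks are omitted):

# Sketch of the relevant dispatch inside diffusers' Attention class.
def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers, attention_op=None):
    if use_memory_efficient_attention_xformers:
        if isinstance(self.processor, JointAttnProcessor2_0):
            # Route SD3's joint-attention blocks to the xformers variant
            # instead of falling through to the generic processor.
            processor = XFormersJointAttnProcessor(attention_op=attention_op)
        else:
            processor = XFormersAttnProcessor(attention_op=attention_op)
    else:
        if isinstance(self.processor, XFormersJointAttnProcessor):
            processor = JointAttnProcessor2_0()
        else:
            processor = AttnProcessor()
    self.set_processor(processor)

Without such a branch, every block falls through to the generic XFormersAttnProcessor, which would explain why self.attn.processor never ends up as XFormersJointAttnProcessor.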

CanvaChen commented 2 weeks ago

I have already opened a PR. Since the attention_mask is currently None, I am not handling it for now.