huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

xformers-enable_xformers_memory_efficient_attention #9946

Closed. algorithmconquer closed this issue 3 days ago

algorithmconquer commented 4 days ago

Describe the bug

The error is:

python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 180, in forward
    attn_output, context_attn_output = self.attn(
ValueError: not enough values to unpack (expected 2, got 1)...

diffusers==0.32.0.dev0 torch==2.5.1 xformers==0.0.28.post3 transformers==4.46.2

Reproduction

import time

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_xformers_memory_efficient_attention()  # this call leads to the ValueError above
pipe.to("cuda")

prompt = "an example prompt"  # placeholder; the original prompt was not shown
t_start = time.time()
image = pipe(prompt, num_inference_steps=50, width=1024, height=1024).images[0]

Logs

No response

System Info

Ubuntu 20.04, Python 3.10.15

Who can help?

No response

edkamesh commented 4 days ago

I had the same issue. It's a version compatibility problem. Try downgrading the diffusers library and check again.

algorithmconquer commented 4 days ago

@edkamesh Which diffusers version should I downgrade to?

sayakpaul commented 4 days ago

Well, I don't think we have an xformers attention processor for the Flux transformer here: https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py
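For anyone who wants to verify this locally, here is a minimal sketch (assuming diffusers 0.32.x and the model from the reproduction above) that inspects which attention processors the Flux transformer is actually using:

# Sketch: list the attention processors on the Flux transformer.
# For Flux these are the SDPA-based FluxAttnProcessor2_0, with no xformers variant.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)

# attn_processors maps each attention module name to its processor instance
for name, proc in pipe.transformer.attn_processors.items():
    print(name, type(proc).__name__)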

edkamesh commented 4 days ago

Well, can we try loading it using DiffusionPipeline? I used it to load my quantized Flux model and it worked well.

sayakpaul commented 4 days ago

Can you show an example code snippet?

algorithmconquer commented 4 days ago

@sayakpaul The code is:

pipe = FluxPipeline.from_pretrained(modelId, torch_dtype=torch.bfloat16)
pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")
image = pipe(prompt, num_inference_steps=50, width=1024, height=1024).images[0]

sayakpaul commented 4 days ago

I am not sure how you can call that method on Flux, as FluxPipeline doesn't implement it in the first place: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux/pipeline_flux.py

algorithmconquer commented 4 days ago

@sayakpaul The demo code is shown in the attached image. When I add pipe.enable_xformers_memory_efficient_attention(), the code cannot run.

sayakpaul commented 4 days ago

Yeah, that is because the enable_xformers_memory_efficient_attention() method isn't implemented for the FluxPipeline, which is exactly what I said in https://github.com/huggingface/diffusers/issues/9946#issuecomment-2482349337

algorithmconquer commented 4 days ago

@sayakpaul Thank you for your response. Could you open a PR to implement it?

sayakpaul commented 4 days ago

Sorry, I don't have the bandwidth to do that currently. However, I would like to point out that if you're using PyTorch 2.0 or later, we automatically use SDPA, which should provide an equivalent speedup.
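For illustration, a minimal sketch of relying on the built-in SDPA path instead of xformers (assumes torch>=2.3 for torch.nn.attention.sdpa_kernel; the context manager is optional, since PyTorch picks a backend automatically):

# Sketch: drop the xformers call and rely on PyTorch SDPA.
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "an example prompt"  # placeholder
# Optionally pin SDPA to the memory-efficient kernel instead of letting
# PyTorch choose the backend.
with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
    image = pipe(prompt, num_inference_steps=50, width=1024, height=1024).images[0]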

algorithmconquer commented 4 days ago

@sayakpaul Thank you!

a-r-r-o-w commented 4 days ago

Just want to note that with Diffusers 1.0.0 we might consider removing the current xformers support, based on a past discussion with @DN6. Using custom attention modules is already easy enough with model.set_attn_processor(CustomAttnProcessor()), so users can already use any attention backend they want out of the box.
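For illustration, a minimal sketch of swapping processors on the Flux transformer (assuming current diffusers, where FluxAttnProcessor2_0 is the default SDPA-based processor; a custom backend would be a class with the same __call__ signature):

# Sketch: explicitly (re)setting the attention processor on the Flux transformer.
import torch
from diffusers import FluxPipeline
from diffusers.models.attention_processor import FluxAttnProcessor2_0

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
# Passing a single processor instance applies it to every attention layer.
pipe.transformer.set_attn_processor(FluxAttnProcessor2_0())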

sayakpaul commented 3 days ago

Umm, removing it might be a bit problematic, as xformers is better in some cases when we're training. For example, in my experience (at least for SDXL), training with xformers turned on has a better memory footprint than SDPA.
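For context, enabling it on the model being trained is a single call; a minimal sketch, assuming an SDXL UNet and that xformers is installed:

# Sketch: enable xformers attention on the SDXL UNet for training.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.float16
)
unet.enable_xformers_memory_efficient_attention()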

sayakpaul commented 3 days ago

Closing because the question was answered.

yiyixuxu commented 3 days ago

I think when the user calls enable_xformers_memory_efficient_attention on a pipeline that does not support xformers, we should throw a warning instead of switching it to the default xformers attention processor.

Basically, we need to check the attention processor type here too:

https://github.com/huggingface/diffusers/blob/acf479bded1ae9fc77b2dc7f316be099abf379ce/src/diffusers/models/attention_processor.py#L388

I will make a PR.
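Roughly, the idea would be something like this (a hypothetical sketch, not the actual patch; the allow-list and early return are assumptions):

# Hypothetical sketch of the check inside Attention.set_use_memory_efficient_attention_xformers:
# only swap to an xformers processor when the current processor has an xformers
# counterpart, otherwise warn and keep the existing processor.
from diffusers.models.attention_processor import AttnProcessor, AttnProcessor2_0
from diffusers.utils import logging

logger = logging.get_logger(__name__)

# hypothetical allow-list of processors that have an xformers counterpart
PROCESSORS_WITH_XFORMERS_VARIANT = (AttnProcessor, AttnProcessor2_0)

def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers, attention_op=None):
    if use_memory_efficient_attention_xformers and not isinstance(self.processor, PROCESSORS_WITH_XFORMERS_VARIANT):
        logger.warning(
            f"{type(self.processor).__name__} has no xformers counterpart; "
            "enable_xformers_memory_efficient_attention() will be ignored."
        )
        return
    # ... existing implementation continues unchanged ...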

sayakpaul commented 3 days ago

That would be great, thanks!

algorithmconquer commented 3 days ago

@yiyixuxu Thank you for your response. That would be great!