I had the same issue. It's a version compatibility problem. Try downgrading the diffusers library and check again.
@edkamesh Which diffusers version do I need to downgrade to?
Well, I don't think we have an xformers attention processor for the Flux transformer here: https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py
Well, can we try loading it using a diffusers pipeline? I used that to load my quantized Flux model, and it worked well.
Can you show an example code snippet?
@sayakpaul The code is:

pipe = FluxPipeline.from_pretrained(modelId, torch_dtype=torch.bfloat16)
pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")
image = pipe(prompt, num_inference_steps=50, width=1024, height=1024).images[0]
I am not sure how you can call that function on Flux as it doesn't implement that function in the first place: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux/pipeline_flux.py
@sayakpaul The demo code is shown in the picture. When I add "pipe.enable_xformers_memory_efficient_attention()", the code cannot run.
Yeah, that is because the enable_xformers_memory_efficient_attention() method isn't implemented for the FluxPipeline, which is exactly what I said in https://github.com/huggingface/diffusers/issues/9946#issuecomment-2482349337.
@sayakpaul Thank you for your response. Could you open a PR to implement it?
Sorry, I don't have the bandwidth for that at the moment. However, I would like to point out that if you're using PyTorch 2.0 or a later version, we automatically use SDPA, which should provide a comparable speedup.
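For reference, here is a minimal sketch of just relying on the default SDPA path instead of xformers (the model id and prompt below are placeholders):

```python
import torch
from diffusers import FluxPipeline

# With PyTorch >= 2.0, diffusers dispatches attention to
# torch.nn.functional.scaled_dot_product_attention (SDPA) by default,
# so no enable_xformers_memory_efficient_attention() call is needed.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # placeholder model id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    "a cat holding a sign that says hello world",  # placeholder prompt
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]
image.save("flux_sdpa.png")
```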
@sayakpaul Thank you!
Just want to note that with Diffusers 1.0.0, we might consider removing the current xformers support, based on a past discussion with @DN6. Using custom attention modules is already easy enough with model.set_attn_processor(CustomAttnProcessor()), so users can already use any attention backend they want out of the box.
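For illustration, a minimal sketch of that pattern, assuming a hypothetical CustomAttnProcessor that simply wraps the default SDPA processor (the checkpoint id is a placeholder):

```python
import torch
from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import AttnProcessor2_0

class CustomAttnProcessor(AttnProcessor2_0):
    """Hypothetical processor: delegates to the default SDPA processor.
    A different attention backend would be plugged in here instead."""

    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        # Custom logic (a different kernel, logging, masking, ...) would go here.
        return super().__call__(attn, hidden_states, encoder_hidden_states,
                                attention_mask, **kwargs)

# Placeholder checkpoint; any diffusers model exposing set_attn_processor works the same way.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet", torch_dtype=torch.float16
)
unet.set_attn_processor(CustomAttnProcessor())
```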
Umm, removing it might be a bit problematic, as xformers is better in some cases when we're training. For example, in my experience (at least with SDXL), training with xformers turned on has a better memory footprint than SDPA.
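(For context, this is how I typically switch the UNet over to xformers in that kind of training setup; a minimal sketch assuming xformers is installed, with a placeholder checkpoint id:)

```python
from diffusers import UNet2DConditionModel

# Placeholder SDXL checkpoint.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
unet.enable_xformers_memory_efficient_attention()  # swap SDPA for xformers attention
unet.train()  # then proceed with the usual training loop
```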
Closing because the question was answered.
I think when the user calls enable_xformers_memory_efficient_attention on a pipeline that does not support xformers, we should throw a warning instead of switching it to the default xformers attention processor. Basically, we need to check the attention processor type here too. Will make a PR.
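(Roughly something along these lines; this is only a sketch of the proposed guard, not the actual diffusers internals, and the helper name and compatibility set below are illustrative:)

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative allow-list; the real check would live inside diffusers itself.
XFORMERS_COMPATIBLE_PROCESSORS = {"AttnProcessor", "AttnProcessor2_0"}

def safe_enable_xformers(model):
    """Warn and keep the current processors if any of them has no
    xformers counterpart, instead of silently swapping them out."""
    unsupported = {
        type(proc).__name__
        for proc in model.attn_processors.values()
        if type(proc).__name__ not in XFORMERS_COMPATIBLE_PROCESSORS
    }
    if unsupported:
        logger.warning(
            "xformers memory-efficient attention is not supported for %s; "
            "keeping the current attention processors.",
            sorted(unsupported),
        )
        return
    model.enable_xformers_memory_efficient_attention()
```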
That would be great, thanks!
@yiyixuxu Thank you for your response. That would be great!
Describe the bug
The error is:

...python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 180, in forward
    attn_output, context_attn_output = self.attn(
ValueError: not enough values to unpack (expected 2, got 1)...
diffusers==0.32.0.dev0
torch==2.5.1
xformers==0.0.28.post3
transformers==4.46.2
Reproduction
Logs
No response
System Info
Ubuntu 20.04, Python 3.10.15
Who can help?
No response