huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

FLUX's inference speed is so slow #9095

Closed · chenbinghui1 closed this 1 week ago

chenbinghui1 commented 3 months ago

I run FLUX.1-dev on a V100-32G GPU card. The inference code looks like this:

```python
pipe = FluxPipeline.from_pretrained(
    "checkpoints/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    output_type="np",
    num_inference_steps=50,
    max_sequence_length=512,
    generator=generator,
).images
```

The inference takes nearly 7 minutes (see attached screenshot).

Is this normal? Has anyone else encountered this?

latentCall145 commented 3 months ago

V100 Tensor Cores don't support bfloat16. Try casting to torch.float16 instead. (Note: I just made a PR to fix FP16 inference; you may need to install my diffusers fork if it isn't merged yet.)

chenbinghui1 commented 3 months ago

@latentCall145 Thanks for your PR. The inference time is reduced to nearly 90s, and the image quality looks right. BTW, with the same prompt and seed, the output image differs from the bfloat16 result.

sayakpaul commented 3 months ago

Precision would change the result, I would assume.

bghira commented 3 months ago

for some models it changes it much more than others. sometimes for the better!

latentCall145 commented 3 months ago

> Precision would change the result, I would assume.

It's a dynamic range issue, not a precision issue. It's discussed in my PR, but I'm restating it here in case they haven't seen it.
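To illustrate the distinction, here is a quick numpy sketch of the range difference (numpy has no bfloat16 dtype, so float32 stands in for its exponent range, which bfloat16 shares):

```python
import numpy as np

# float16 has a 5-bit exponent: the largest finite value is 65504.
print(np.finfo(np.float16).max)   # 65504.0

# bfloat16 keeps float32's 8-bit exponent, so its dynamic range matches
# float32 (~3.4e38), even though it carries fewer mantissa bits than fp16.
print(np.finfo(np.float32).max)

# An activation of 70000 is representable in bfloat16/float32
# but overflows to infinity in float16:
x = np.float32(70000.0)
print(np.float16(x))   # inf
print(x)               # 70000.0
```

This is why fp16 inference can produce different (or broken) outputs even at the same seed: large intermediate activations that bfloat16 represents fine overflow in float16 unless they are rescaled or clipped.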

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

a-r-r-o-w commented 1 week ago

I believe this has been addressed, yes? If not, please feel free to re-open