huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Very Slow first inference with diffusers 0.27.X #7785

Open nesscube opened 4 months ago

nesscube commented 4 months ago

Describe the bug

Hello diffusers team! I have been facing an annoying issue since I upgraded diffusers to 0.27.X. The first call (and only the first) of pipeline(...) now takes a long time (about a minute) before inference starts. Moreover, the call to compel(prompts) takes 30 seconds, versus being instant in 0.26.X.

This slowdown seems to happen only:

Unfortunately, I need all of these for my project.

Thanks a lot for your help!

Reproduction

import torch
from compel import Compel
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# model_path, model_config, model_path_vae, prompts, negative_prompts,
# width, height, num_inference_steps and seed are defined elsewhere.
pipeline = StableDiffusionXLPipeline.from_single_file(
    model_path,
    torch_dtype=torch.float16,
    local_files_only=True,
    use_safetensors=True,
    add_watermarker=False,
    original_config_file=model_config,
    vae=AutoencoderKL.from_pretrained(model_path_vae, torch_dtype=torch.float16),
)
pipeline.enable_model_cpu_offload()

# The construction of the `compel` object was not included in the report.
prompt_embeds, pooled_prompt_embeds = compel(prompts)
negative_prompt_embeds, negative_pooled_prompt_embeds = compel(negative_prompts)

result = pipeline(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
    width=width,
    height=height,
    num_inference_steps=num_inference_steps,
    guidance_scale=6,
    num_images_per_prompt=1,
    generator=torch.Generator(device='cuda').manual_seed(seed),
)

Logs

No response

System Info

Who can help?

@yiyixuxu @sayakpaul @DN6

sayakpaul commented 4 months ago

Could you provide a reproducible snippet without Compel that demonstrates the inference slow down?
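To separate a one-time first-call cost from steady-state latency, each call can be timed individually. The following is only a minimal sketch: `time_calls` and `first_call_overhead` are hypothetical helper names, not part of diffusers, and the callable passed in stands in for whatever pipeline invocation is being measured.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 3) -> List[float]:
    """Call fn repeatedly and return the wall-clock duration of each call in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def first_call_overhead(durations: List[float]) -> float:
    """Return how much longer the first call took than the fastest later call."""
    if len(durations) < 2:
        raise ValueError("need at least two timed calls")
    return durations[0] - min(durations[1:])
```

With a loaded pipeline, something like time_calls(lambda: pipeline(prompt=prompt)) would give one duration per call, and a large value from first_call_overhead would point to a cost paid only on the first call.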

sayakpaul commented 4 months ago

Also, FWIW, we run benchmarking tests regularly and do automated reporting: https://huggingface.co/datasets/diffusers/benchmarks/tree/main. As we can see, there are no unusual latency changes in the most commonly used pipelines.

lerignoux commented 4 months ago

Hello

@nesscube You were running this in WSL or on Windows desktop, right? I managed to reproduce it on my side, but it seems to be linked to model loading.

Reproduction:

# In Docker Desktop
docker run -it -v <windows_folder_path_with_model>/:/models/ python:3.10-slim bash
cd /models
pip install diffusers==0.27.2 torch transformers accelerate

Then run:


from datetime import datetime
import torch
from diffusers import StableDiffusionXLPipeline

model_path = "albedobond/albedobase-xl-v2.1.safetensors"
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
print(f"Loading pipeline: {datetime.utcnow()}")
pipeline = StableDiffusionXLPipeline.from_single_file(model_path, torch_dtype=torch.float16, local_files_only=True)
print(f"Pipeline Loaded: {datetime.utcnow()}")
pipeline.enable_model_cpu_offload()

print(f"Generating: {datetime.utcnow()}")
image = pipeline(prompt=prompt).images[0]
print(f"Generated: {datetime.utcnow()}")

When mounting the model from a Windows folder, I notice that the from_single_file method is much faster: it returns nearly immediately. But then generation takes ages. I guess the model is just not in RAM, so it runs from disk.
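One way to test that guess would be to read the checkpoint file once before loading it, so that the operating system has already pulled it from the slow mount. This is only a sketch under the assumption that the weights are read lazily from disk during the first inference; `warm_file_cache` is a hypothetical helper, not a diffusers function.

```python
import time
from pathlib import Path
from typing import Tuple


def warm_file_cache(path: str, chunk_size: int = 16 * 1024 * 1024) -> Tuple[int, float]:
    """Read a file sequentially and return (bytes_read, seconds_elapsed)."""
    total = 0
    start = time.perf_counter()
    with Path(path).open("rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

If calling warm_file_cache(model_path) takes roughly as long as the slow first generation, and the generation that follows is then fast, the delay is likely disk reads from the mounted folder rather than computation.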

@sayakpaul Do you know if there was any change to the model loading process in 0.27?

yiyixuxu commented 4 months ago

We had this PR https://github.com/huggingface/diffusers/pull/6994 - is this related?

lerignoux commented 4 months ago

> We had this PR #6994 - is this related?

Yes, nice one. A bisect confirmed it: the issue was introduced by this commit.

I tried to have a look today, but I will need more time to dig into the actual issue. Do you know if anyone familiar with it could help?
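For readers unfamiliar with bisecting: it is a binary search over an ordered list of commits or releases for the first one that shows the regression. The sketch below illustrates the idea only. It is not the procedure used in this thread, and `first_bad` and `is_bad` are hypothetical names.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first candidate for which is_bad is true.

    Assumes candidates are ordered oldest to newest, the first one is good,
    the last one is bad, and everything after the first bad one is also bad.
    """
    if len(candidates) < 2:
        raise ValueError("need at least one good and one bad candidate")
    lo, hi = 0, len(candidates) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return candidates[hi]
```

Here is_bad would install or check out the given candidate and report whether the first pipeline call is slow. On a git checkout, git bisect run automates the same search.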

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.