huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Pixart-sigma is incompatible with mixed precision inference #8612

Closed Luciennnnnnn closed 1 week ago

Luciennnnnnn commented 3 months ago

Describe the bug

I found that PixArt-Sigma is incompatible with mixed-precision inference: loading the models in float32 and in float16 both lead to a similar problem. I suspect this may be related to #8604.

Reproduction

from diffusers import PixArtSigmaPipeline
import torch

from transformers import T5EncoderModel

text_encoder = T5EncoderModel.from_pretrained("PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", subfolder="text_encoder", torch_dtype=torch.float16)

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    text_encoder=text_encoder,
    torch_dtype=torch.float16
)

pipe = pipe.to("cuda")

# A list of prompts; iterating over a bare string would yield single characters
prompts = ["a space elevator, cinematic scifi art"]

for idx, prompt in enumerate(prompts):
    with torch.autocast("cuda", enabled=True): # autocast enabled: corrupted image
    # with torch.autocast("cuda", enabled=False): # autocast disabled: no problem
        image = pipe(prompt=prompt, num_inference_steps=50, generator=torch.manual_seed(1)).images[0]
    image.save(f"x_{idx}.png")
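As a smaller illustration of what the autocast switch in the reproduction does, here is a minimal CPU-only sketch. It is a stand-in under stated assumptions, not the PixArt-Sigma pipeline: a toy `torch.nn.Linear` replaces the real modules, bfloat16 replaces float16 (CPU autocast's default lower-precision dtype), and the helper `linear_output_dtype` is hypothetical. It shows that for a module loaded in float32, autocast changes the compute and output dtype of a linear layer, while `enabled=False` leaves it in float32.

```python
# Hypothetical toy setup: one Linear layer on CPU, bfloat16 standing in for float16 on CUDA.
import torch


def linear_output_dtype(weight_dtype: torch.dtype, use_autocast: bool) -> torch.dtype:
    """Run a toy Linear layer (hypothetical helper) and report its output dtype."""
    torch.manual_seed(0)
    layer = torch.nn.Linear(8, 8).to(weight_dtype)
    x = torch.randn(2, 8, dtype=weight_dtype)
    # enabled=False turns the context into a no-op, like the commented-out line in the reproduction
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16, enabled=use_autocast):
        with torch.no_grad():
            y = layer(x)
    return y.dtype


for use_autocast in (False, True):
    print(f"float32 weights, autocast={use_autocast}: {linear_output_dtype(torch.float32, use_autocast)}")
```

With autocast disabled the output stays `torch.float32`; with it enabled, the linear layer runs in and returns `torch.bfloat16`.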

Logs

No response

System Info

- 🤗 Diffusers version: 0.29.0
- Platform: Linux-5.15.0-60-generic-x86_64-with-glibc2.35
- Running on a notebook?: No
- Running on Google Colab?: No
- Python version: 3.10.11
- PyTorch version (GPU?): 2.1.2+cu118 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.23.3
- Transformers version: 4.41.2
- Accelerate version: 0.23.0
- PEFT version: 0.7.0
- Bitsandbytes version: not installed
- Safetensors version: 0.4.2
- xFormers version: 0.0.23.post1+cu118
- Accelerator: 8x NVIDIA A100-SXM4-80GB, 81920 MiB VRAM each
- Using GPU in script?:
- Using distributed or parallel set-up in script?:

Who can help?

@asomoza @yiyixuxu @sayakpaul

sayakpaul commented 3 months ago

Why do you have to put an autocast context here?

Luciennnnnnn commented 3 months ago

Why do you have to put an autocast context here?

I saw some earlier inference examples that use it, so I kept using it: when the models are loaded in float16 it should at least not degrade performance (and I hoped it might improve performance in some cases).

sayakpaul commented 3 months ago

Could you share an example? We don't want to put an autocast context when we explicitly do the type-casting.
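The point about explicit type-casting can be illustrated with a minimal CPU-only sketch. It assumes a hypothetical toy setup (bfloat16 standing in for float16, a single `torch.nn.Linear` standing in for the pipeline): once the weights and inputs are already cast, wrapping the call in autocast leaves a linear layer's output dtype unchanged. It does not show why the PixArt-Sigma images were corrupted; autocast also runs some ops in float32, which a single linear layer does not exercise.

```python
# Hypothetical toy setup: explicit cast first, then compare with and without autocast.
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(8, 8).to(torch.bfloat16)  # explicit cast, analogous to torch_dtype=... at load time
x = torch.randn(2, 8, dtype=torch.bfloat16)

with torch.no_grad():
    y_plain = layer(x)  # no autocast: runs in the dtype the module was cast to
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y_autocast = layer(x)  # autocast wants bfloat16 for linear, which is already the case

print(y_plain.dtype, y_autocast.dtype)
print("identical:", torch.equal(y_plain, y_autocast))
```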

Also, does this issue happen with PixArt Alpha as well?

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

asomoza commented 1 week ago

Closing this because of the lack of updates; feel free to ask to reopen it if there's an update.