huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

NAN values produced by SDXL VAE encoder #9844

Open YihanHu-2022 opened 2 hours ago

YihanHu-2022 commented 2 hours ago

Describe the bug

I'd like to use the SDXL VAE to encode an image, but the output contains only NaN values. I have cast both the input and the VAE to full precision (torch.float32), but the problem persists.

Reproduction

import torch
from diffusers import StableDiffusionXLPipeline
from diffusers import DPMSolverMultistepScheduler
import numpy as np
from PIL import Image
from torch import autocast, inference_mode

from torchvision import transforms as tr
p2t = tr.ToTensor()

device = torch.device('cuda') if torch.cuda.is_available() else torch.device(
    'cpu')
NUM_DDIM_STEPS = 50
SKIP = 0.0
ETA=1
TOTAL_STEP = int(NUM_DDIM_STEPS * (1 + SKIP))
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
ldm_stable = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to(device)
ldm_stable.scheduler = DPMSolverMultistepScheduler.from_config(model_id, subfolder = "scheduler", algorithm_type="sde-dpmsolver++", solver_order=2)
ldm_stable.scheduler.config.timestep_spacing = "leading"
ldm_stable.scheduler.set_timesteps(TOTAL_STEP)

image_gt = Image.open('path/to/image.png').convert('RGB')
image_gt = image_gt.resize((1024, 1024))
image_gt = p2t(image_gt) * 2 - 1
image_gt = image_gt.unsqueeze(0).to(device, dtype = torch.float32)

ldm_stable.vae.to(dtype=torch.float32)
with autocast("cuda"), inference_mode():
    w0 = ldm_stable.vae.encode(image_gt).latent_dist.sample()
    print(w0)
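
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the `autocast("cuda")` context above may run the VAE's convolutions in reduced precision even though the weights and the input are float32, and float16 overflows to inf (and then NaN) above 65504. The minimal sketch below uses a stand-in `Conv2d` rather than the SDXL VAE. `conv_output_dtype` is a hypothetical helper, and on CPU it falls back to bfloat16 so that it runs without a GPU.

```python
import torch


def conv_output_dtype(use_autocast: bool) -> torch.dtype:
    """Return the output dtype of a float32 conv, with or without autocast."""
    device_type = "cuda" if torch.cuda.is_available() else "cpu"
    # Assumption: float16 on GPU (as in the report), bfloat16 on CPU.
    low_precision = torch.float16 if device_type == "cuda" else torch.bfloat16

    # Stand-in for one VAE layer: weights and input are both float32.
    conv = torch.nn.Conv2d(3, 4, kernel_size=3, padding=1).to(device_type, torch.float32)
    x = torch.randn(1, 3, 8, 8, dtype=torch.float32, device=device_type)

    with torch.inference_mode(), torch.autocast(
        device_type, dtype=low_precision, enabled=use_autocast
    ):
        return conv(x).dtype


# Without autocast the conv stays in float32; with autocast it does not.
dtype_plain = conv_output_dtype(use_autocast=False)
dtype_autocast = conv_output_dtype(use_autocast=True)
print(dtype_plain, dtype_autocast)

# float16 cannot represent values above 65504, so 70000 becomes inf,
# and inf - inf is NaN.
half_overflow = torch.tensor(70000.0).to(torch.float16)
print(half_overflow, half_overflow - half_overflow)
```

If this is what happens in the reproduction, calling `ldm_stable.vae.encode(image_gt)` under `inference_mode()` alone, without `autocast`, would keep the encoder in float32. That has not been verified here.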

Logs

Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:09<00:00,  1.36s/it]
/root/miniforge3/lib/python3.10/site-packages/diffusers/configuration_utils.py:245: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
  deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
tensor([[[[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]]]], device='cuda:0')

System Info

Diffusers: 0.30.0, PyTorch: 1.12, transformers: 4.45.2, no xFormers

Running on an RTX 3090 Ti, CUDA version: 11.7

Python version 3.10.14

Who can help?

@yiyixuxu @sayakpaul @DN6

sayakpaul commented 2 hours ago

Could this be because of the scheduler you're using? Does this happen when you use the default scheduler?
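
To narrow down where the NaNs first appear, whichever scheduler is configured, one option is to hook every submodule and record the first one whose output is not finite. The sketch below is illustrative only: `find_first_nonfinite` is a hypothetical helper, not part of diffusers, and it is demonstrated on a toy model rather than the SDXL VAE.

```python
import torch
from torch import nn


def find_first_nonfinite(model: nn.Module, *inputs):
    """Return the name of the first submodule whose output has NaN/Inf, else None."""
    offenders = []

    def make_hook(name):
        def hook(module, args, output):
            # Only plain tensor outputs are checked in this sketch.
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                offenders.append(name)
        return hook

    # Skip the root module (empty name) so the innermost offender is reported.
    handles = [
        module.register_forward_hook(make_hook(name))
        for name, module in model.named_modules()
        if name
    ]
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for handle in handles:
            handle.remove()
    return offenders[0] if offenders else None


# Toy demonstration: a Linear layer with infinite weights overflows.
broken = nn.Linear(2, 2)
broken.weight.data.fill_(float("inf"))
toy_model = nn.Sequential(nn.Identity(), broken, nn.ReLU())
print(find_first_nonfinite(toy_model, torch.ones(1, 2)))

healthy_model = nn.Sequential(nn.Identity(), nn.Linear(2, 2), nn.ReLU())
print(find_first_nonfinite(healthy_model, torch.ones(1, 2)))
```

Applied to the report, the same helper could presumably be called as `find_first_nonfinite(ldm_stable.vae.encoder, image_gt)` inside the same `autocast` context. That call is an assumption about how the pipeline is structured and has not been run here.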