huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

DPM SDE fails when resolution changes after the first run #5178

Closed Disty0 closed 10 months ago

Disty0 commented 10 months ago

Describe the bug

DPM SDE fails to generate when the resolution changes after the first run: the scheduler's cached BrownianTreeNoiseSampler keeps the latent shape from the first generation, so the next run's step raises a tensor size mismatch.

Resetting the entire scheduler or resetting the BrownianTreeNoiseSampler after every generation works as a workaround, but resetting after every generation kills reproducibility.
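
For illustration, a minimal sketch of that per-generation reset (hedged: reset_dpm_sde is a hypothetical helper, not a diffusers API; it assumes the pipeline already uses DPMSolverSDEScheduler):

from diffusers import DPMSolverSDEScheduler

def reset_dpm_sde(pipe):
    # Recreating the scheduler from its own config drops the cached
    # BrownianTreeNoiseSampler, so the next generation builds a fresh one
    # with the new latent shape.
    pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config)

Calling reset_dpm_sde(pipe) before every generation avoids the crash, but it also changes the noise sequence, which is the reproducibility problem mentioned above.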

As a second workaround, changing this line in diffusers/schedulers/scheduling_dpmsolver_sde.py:480:

prev_sample = prev_sample + self.noise_sampler(sigma_fn(t), sigma_fn(t_next)) * s_noise * sigma_up

to:

try:
    prev_sample = prev_sample + self.noise_sampler(sigma_fn(t), sigma_fn(t_next)) * s_noise * sigma_up
except RuntimeError:
    # Shape mismatch: the cached sampler was built for a different resolution,
    # so rebuild it for the current sample and retry.
    min_sigma, max_sigma = self.sigmas[self.sigmas > 0].min(), self.sigmas.max()
    self.noise_sampler = BrownianTreeNoiseSampler(sample, min_sigma, max_sigma, self.noise_sampler_seed)
    prev_sample = prev_sample + self.noise_sampler(sigma_fn(t), sigma_fn(t_next)) * s_noise * sigma_up

This works as a workaround too, and it doesn't have the reproducibility issue unless the resolution changes.
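
A user-side variant of the same idea, for anyone hitting this before a fix lands (hedged sketch: generate_at is a hypothetical wrapper; it only resets the scheduler when the resolution actually changes, so repeated runs at a fixed resolution stay reproducible):

from diffusers import DPMSolverSDEScheduler

_last_size = None

def generate_at(pipe, prompt, width, height, **kwargs):
    global _last_size
    if _last_size is not None and _last_size != (width, height):
        # Resolution changed: drop the cached BrownianTreeNoiseSampler.
        pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config)
    _last_size = (width, height)
    return pipe(prompt=prompt, width=width, height=height, **kwargs).images[0]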

Reproduction

Generate at 1024x1536 first, then generate again at 1536x1536 using DPM SDE and SDXL.

Note: only the first part of the resolution matters; changing from 1024x1536 to 1024x1024 works fine.
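
A minimal reproduction sketch of these steps, assuming the public SDXL base checkpoint and a CUDA device (the full scripts later in this thread do the same thing at 512/1024):

from diffusers import StableDiffusionXLPipeline, DPMSolverSDEScheduler
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config)

pipe(prompt="test", width=1024, height=1536, num_inference_steps=20)  # first run caches the noise sampler
pipe(prompt="test", width=1536, height=1536, num_inference_steps=20)  # raises the tensor size mismatch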

Logs

/home/disty/Apps/AI/automatic/venv/lib/python3.11/site-packages/diffusers/schedulers/scheduling_dpmsolver_sde.py:480 in step

  479 │   │   │   ).expm1() * pred_original_sample
❱ 480 │   │   │   prev_sample = prev_sample + self.noise_sampler(sigma_fn(t), sigma_fn(t_next)
  481
RuntimeError: The size of tensor a (192) must match the size of tensor b (128) at non-singleton dimension 3

System Info

Who can help?

No response

patrickvonplaten commented 10 months ago

Hey @Disty0,

Can you maybe add a reproducible code snippet here?

Disty0 commented 10 months ago

Forgot to add: I am using SDNext. SDNext currently has the scheduler-reset workaround implemented, so to reproduce, remove these in modules/processing_diffusers.py:

- or (p.sampler_name == 'DPM SDE') (1 occurrence)
- or (p.latent_sampler == 'DPM SDE') (2 occurrences)

Finding and replacing them with nothing should be enough to reproduce.

https://github.com/vladmandic/automatic/blob/master/modules/processing_diffusers.py

patrickvonplaten commented 10 months ago

Hey @Disty0,

It would be very nice if you could try to reproduce the error using just diffusers code. E.g. the following works just fine for me:

from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline, DPMSolverSDEScheduler
import hf_image_uploader as hiu
import torch

path = "stabilityai/stable-diffusion-xl-base-1.0"
vae_path = "madebyollin/sdxl-vae-fp16-fix"

# vae = AutoencoderKL.from_pretrained(vae_path, torch_dtype=torch.float16)
# pipe = StableDiffusionXLPipeline.from_pretrained(path, torch_dtype=torch.float16, vae=vae, variant="fp16", use_safetensors=True, local_files_only=True, add_watermarker=False)
pipe = StableDiffusionXLPipeline.from_pretrained(path, torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")
pipe.to("cuda")

prompt = "An astronaut riding a green horse on Mars"
steps = 20

for i in range(2):
    width = 512 * (i + 1)
    height = 512 * (i + 1)
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]
    hiu.upload(image, "patrickvonplaten/images")

pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")

for i in range(2):
    width = 512 * (i + 1)
    height = 512 * (i + 1)
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]
    hiu.upload(image, "patrickvonplaten/images")

Disty0 commented 10 months ago

Reproduced it using just diffusers code too:

Logs:

disty:~ $ source /opt/intel/oneapi/setvars.sh 

:: initializing oneAPI environment ...
   bash: BASH_VERSION = 5.1.16(1)-release
   args: Using "$@" for setvars.sh arguments: 
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::

disty:~ $ source Apps/AI/automatic/venv/bin/activate
(venv) disty:~ $ vim sa.py 
(venv) disty:~ $ python sa.py 
2023-09-26 17:03:30.272323: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-26 17:03:30.298777: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-26 17:03:30.299155: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-26 17:03:30.858888: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-26 17:03:31.688655: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2023-09-26 17:03:31.725695: W itex/core/ops/op_init.cc:58] Op: _QuantizedMaxPool3D is already registered in Tensorflow
2023-09-26 17:03:31.737222: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2023-09-26 17:03:31.737397: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
/home/disty/Apps/AI/automatic/venv/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:06<00:00,  3.09it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:12<00:00,  1.57it/s]
/home/disty/Apps/AI/automatic/venv/lib/python3.11/site-packages/diffusers/configuration_utils.py:134: FutureWarning: Accessing config attribute `use_karras_sigmas` directly via 'DPMSolverSDEScheduler' object attribute is deprecated. Please access 'use_karras_sigmas' over 'DPMSolverSDEScheduler's config object instead, e.g. 'scheduler.config.use_karras_sigmas'.
  deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:10<00:00,  1.83it/s]
  0%|                                                                                                                                                                                                                  | 0/20 [00:00<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/disty/sa.py:71 in <module>                                                                 │
│                                                                                                  │
│   68 for i in range(2):                                                                          │
│   69 │   width = 512 * (i + 1)                                                                   │
│   70 │   height = 512 * (i + 1)                                                                  │
│ ❱ 71 │   image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).i    │
│   72 │   image.save(f"img/image1-{i}.jpg")                                                       │
│   73                                                                                             │
│                                                                                                  │
│ /home/disty/Apps/AI/automatic/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py:115   │
│ in decorate_context                                                                              │
│                                                                                                  │
│   112 │   @functools.wraps(func)                                                                 │
│   113 │   def decorate_context(*args, **kwargs):                                                 │
│   114 │   │   with ctx_factory():                                                                │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                   │
│   116 │                                                                                          │
│   117 │   return decorate_context                                                                │
│   118                                                                                            │
│                                                                                                  │
│ /home/disty/Apps/AI/automatic/venv/lib/python3.11/site-packages/diffusers/pipelines/stable_diffu │
│ sion_xl/pipeline_stable_diffusion_xl.py:851 in __call__                                          │
│                                                                                                  │
│   848 │   │   │   │   │   noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance   │
│   849 │   │   │   │                                                                              │
│   850 │   │   │   │   # compute the previous noisy sample x_t -> x_t-1                           │
│ ❱ 851 │   │   │   │   latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwarg   │
│   852 │   │   │   │                                                                              │
│   853 │   │   │   │   # call the callback, if provided                                           │
│   854 │   │   │   │   if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) %    │
│                                                                                                  │
│ /home/disty/Apps/AI/automatic/venv/lib/python3.11/site-packages/diffusers/schedulers/scheduling_ │
│ dpmsolver_sde.py:480 in step                                                                     │
│                                                                                                  │
│   477 │   │   │   prev_sample = (sigma_fn(ancestral_t) / sigma_fn(t)) * sample - (               │
│   478 │   │   │   │   t - ancestral_t                                                            │
│   479 │   │   │   ).expm1() * pred_original_sample                                               │
│ ❱ 480 │   │   │   prev_sample = prev_sample + self.noise_sampler(sigma_fn(t), sigma_fn(t_next)   │
│   481 │   │   │                                                                                  │
│   482 │   │   │   if self.state_in_first_order:                                                  │
│   483 │   │   │   │   # store for 2nd order step                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 

Code:

from diffusers import StableDiffusionXLPipeline, DPMSolverSDEScheduler, DPMSolverMultistepScheduler, StableDiffusionPipeline
import torch
import intel_extension_for_pytorch as ipex

original_from_numpy = torch.from_numpy
def from_numpy(ndarray):
    if ndarray.dtype == float:
        return original_from_numpy(ndarray.astype('float32'))
    else:
        return original_from_numpy(ndarray)
torch.from_numpy = from_numpy

original_generator = torch.Generator
def generator(device=None):
    if device is not None and device != torch.device("cpu") and device != "cpu":
        return torch.xpu.Generator(device)
    else:
        return original_generator(device)
torch.Generator = generator

original_autocast = torch.autocast
def ipex_autocast(*args, **kwargs):
    if len(args) > 0 and args[0] == "cuda" or args[0] == "xpu":
        return original_autocast("xpu", *args[1:], **kwargs)
    else:
        return original_autocast(*args, **kwargs)
torch.autocast = ipex_autocast

pipe = StableDiffusionXLPipeline.from_single_file("/home/disty/Apps/AI/automatic/models/Stable-diffusion/SDXL/infiniswissxl_v16.safetensors", torch_dtype=torch.bfloat16, use_safetensors=True)
#pipe = StableDiffusionPipeline.from_single_file("/home/disty/Apps/AI/automatic/models/Stable-diffusion/SD1.5/sotemix_v10.safetensors", torch_dtype=torch.bfloat16, use_safetensors=True)
pipe.to("xpu")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")

prompt = "An astronaut riding a green horse on Mars"
steps = 20

for i in range(2):
    width = 512 * (i + 1)
    height = 512 * (i + 1)
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]
    image.save(f"img/image0-{i}.jpg")

pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")

for i in range(2):
    width = 512 * (i + 1)
    height = 512 * (i + 1)
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]
    image.save(f"img/image1-{i}.jpg")

patrickvonplaten commented 10 months ago

Wait, ok, so the error only happens when using Intel / IPEX?

Disty0 commented 10 months ago

Happens with CPU too:

- `diffusers` version: 0.21.2
- Platform: Linux-6.5.4-arch2-1-x86_64-with-glibc2.38
- Python version: 3.11.5
- PyTorch version (GPU?): 2.1.0.dev20230726+cpu (False)
- Huggingface_hub version: 0.17.1
- Transformers version: 4.31.0
- Accelerate version: 0.20.3
- xFormers version: not installed
- Using GPU in script?: CPU Only
- Using distributed or parallel set-up in script?: No

Logs:

(venv) disty:~ $ python sa.py 
2023-09-27 16:13:49.199786: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-27 16:13:49.226442: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-27 16:13:49.226762: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-27 16:13:49.784371: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.36s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:08<00:00, 34.23s/it]
/home/disty/Apps/AI/openvino/automatic/venv/lib/python3.11/site-packages/diffusers/configuration_utils.py:134: FutureWarning: Accessing config attribute `use_karras_sigmas` directly via 'DPMSolverSDEScheduler' object attribute is deprecated. Please access 'use_karras_sigmas' over 'DPMSolverSDEScheduler's config object instead, e.g. 'scheduler.config.use_karras_sigmas'.
  deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00,  7.46s/it]
  0%|                                                                                                                                                                                                                   | 0/2 [00:32<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/disty/sa.py:48 in <module>                                                                 │
│                                                                                                  │
│   45 for i in range(2):                                                                          │
│   46 │   width = 512 * (i + 1)                                                                   │
│   47 │   height = 512 * (i + 1)                                                                  │
│ ❱ 48 │   image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).i    │
│   49 │   image.save(f"img/image1-{i}.jpg")                                                       │
│   50                                                                                             │
│                                                                                                  │
│ /home/disty/Apps/AI/openvino/automatic/venv/lib/python3.11/site-packages/torch/utils/_contextlib │
│ .py:115 in decorate_context                                                                      │
│                                                                                                  │
│   112 │   @functools.wraps(func)                                                                 │
│   113 │   def decorate_context(*args, **kwargs):                                                 │
│   114 │   │   with ctx_factory():                                                                │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                   │
│   116 │                                                                                          │
│   117 │   return decorate_context                                                                │
│   118                                                                                            │
│                                                                                                  │
│ /home/disty/Apps/AI/openvino/automatic/venv/lib/python3.11/site-packages/diffusers/pipelines/sta │
│ ble_diffusion_xl/pipeline_stable_diffusion_xl.py:851 in __call__                                 │
│                                                                                                  │
│   848 │   │   │   │   │   noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance   │
│   849 │   │   │   │                                                                              │
│   850 │   │   │   │   # compute the previous noisy sample x_t -> x_t-1                           │
│ ❱ 851 │   │   │   │   latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwarg   │
│   852 │   │   │   │                                                                              │
│   853 │   │   │   │   # call the callback, if provided                                           │
│   854 │   │   │   │   if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) %    │
│                                                                                                  │
│ /home/disty/Apps/AI/openvino/automatic/venv/lib/python3.11/site-packages/diffusers/schedulers/sc │
│ heduling_dpmsolver_sde.py:480 in step                                                            │
│                                                                                                  │
│   477 │   │   │   prev_sample = (sigma_fn(ancestral_t) / sigma_fn(t)) * sample - (               │
│   478 │   │   │   │   t - ancestral_t                                                            │
│   479 │   │   │   ).expm1() * pred_original_sample                                               │
│ ❱ 480 │   │   │   prev_sample = prev_sample + self.noise_sampler(sigma_fn(t), sigma_fn(t_next)   │
│   481 │   │   │                                                                                  │
│   482 │   │   │   if self.state_in_first_order:                                                  │
│   483 │   │   │   │   # store for 2nd order step                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 3

Code:

from diffusers import StableDiffusionXLPipeline, DPMSolverSDEScheduler, DPMSolverMultistepScheduler, StableDiffusionPipeline
import torch
"""
import intel_extension_for_pytorch as ipex

original_from_numpy = torch.from_numpy
def from_numpy(ndarray):
    if ndarray.dtype == float:
        return original_from_numpy(ndarray.astype('float32'))
    else:
        return original_from_numpy(ndarray)
torch.from_numpy = from_numpy

original_generator = torch.Generator
def generator(device=None):
    if device is not None and device != torch.device("cpu") and device != "cpu":
        return torch.xpu.Generator(device)
    else:
        return original_generator(device)
torch.Generator = generator

original_manual_seed = torch.manual_seed
def manual_seed(*args, **kwargs):
    torch.xpu.manual_seed_all(*args, **kwargs)
    return original_manual_seed(*args, **kwargs)
torch.manual_seed = original_manual_seed
"""
pipe = StableDiffusionXLPipeline.from_single_file("/home/disty/Apps/AI/automatic/models/Stable-diffusion/SDXL/infiniswissxl_v16.safetensors", torch_dtype=torch.float32, use_safetensors=True)
#pipe = StableDiffusionPipeline.from_single_file("/home/disty/Apps/AI/automatic/models/Stable-diffusion/SD1.5/sotemix_v10.safetensors", torch_dtype=torch.bfloat16, use_safetensors=True)
pipe.to("cpu")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")

prompt = "An astronaut riding a green horse on Mars"
steps = 2

for i in range(2):
    width = 512 * (i + 1)
    height = 512 * (i + 1)
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]
    image.save(f"img/image0-{i}.jpg")

pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")

for i in range(2):
    width = 512 * (i + 1)
    height = 512 * (i + 1)
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]
    image.save(f"img/image1-{i}.jpg")

OlegRuban-ai commented 10 months ago

I have the same problem too.

patrickvonplaten commented 10 months ago

The same architecture just with smaller weights works fine for me on CPU as well:

from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline, DPMSolverSDEScheduler

path = "hf-internal-testing/tiny-stable-diffusion-xl-pipe"

pipe = StableDiffusionXLPipeline.from_pretrained(path)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")

prompt = "An astronaut riding a green horse on Mars"
steps = 20

for i in range(2):
    width = 32 * (i + 1)
    height = 32 * (i + 1)
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]

pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")

for i in range(2):
    width = 32 * (i + 1)
    height = 32 * (i + 1)
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]

Disty0 commented 10 months ago

Created a new environment with standard PyTorch on both Python 3.10 and 3.11, but I still have this issue on CPU:

(venv) disty:~ $ python sa.py 
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.14it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.03it/s]
/home/disty/venv/lib/python3.10/site-packages/diffusers/configuration_utils.py:134: FutureWarning: Accessing config attribute `use_karras_sigmas` directly via 'DPMSolverSDEScheduler' object attribute is deprecated. Please access 'use_karras_sigmas' over 'DPMSolverSDEScheduler's config object instead, e.g. 'scheduler.config.use_karras_sigmas'.
  deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.43s/it]
  0%|                                                                                                                                                                                                                   | 0/2 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/disty/sa.py", line 48, in <module>
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]
  File "/home/disty/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/disty/venv/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 851, in __call__
    latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
  File "/home/disty/venv/lib/python3.10/site-packages/diffusers/schedulers/scheduling_dpmsolver_sde.py", line 480, in step
    prev_sample = prev_sample + self.noise_sampler(sigma_fn(t), sigma_fn(t_next)) * s_noise * sigma_up
RuntimeError: The size of tensor a (8) must match the size of tensor b (4) at non-singleton dimension 3

- `diffusers` version: 0.21.3
- Platform: Linux-6.5.5-arch1-1-x86_64-with-glibc2.38
- Python version: 3.10.13
- PyTorch version (GPU?): 2.0.1+cu117 (False)
- Huggingface_hub version: 0.17.3
- Transformers version: 4.33.3
- Accelerate version: not installed
- xFormers version: not installed
- Using GPU in script?: CPU Only
- Using distributed or parallel set-up in script?: No

- `diffusers` version: 0.21.3
- Platform: Linux-6.5.5-arch1-1-x86_64-with-glibc2.38
- Python version: 3.11.5
- PyTorch version (GPU?): 2.0.1+cu117 (False)
- Huggingface_hub version: 0.17.3
- Transformers version: 4.33.3
- Accelerate version: not installed
- xFormers version: not installed
- Using GPU in script?: CPU Only
- Using distributed or parallel set-up in script?: No

Ran the code you sent without any changes:

(venv) disty:~ $ python tiny.py 
text_encoder_2/model.safetensors not found
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 20.35it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 77.66it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 36.38it/s]
/home/disty/venv/lib/python3.10/site-packages/diffusers/configuration_utils.py:134: FutureWarning: Accessing config attribute `use_karras_sigmas` directly via 'DPMSolverSDEScheduler' object attribute is deprecated. Please access 'use_karras_sigmas' over 'DPMSolverSDEScheduler's config object instead, e.g. 'scheduler.config.use_karras_sigmas'.
  deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 32.90it/s]
  0%|                                                                                                                                                                                                                  | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/disty/tiny.py", line 21, in <module>
    image = pipe(prompt=prompt, width=width, height=height, num_inference_steps=steps).images[0]
  File "/home/disty/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/disty/venv/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 851, in __call__
    latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
  File "/home/disty/venv/lib/python3.10/site-packages/diffusers/schedulers/scheduling_dpmsolver_sde.py", line 480, in step
    prev_sample = prev_sample + self.noise_sampler(sigma_fn(t), sigma_fn(t_next)) * s_noise * sigma_up
RuntimeError: The size of tensor a (32) must match the size of tensor b (16) at non-singleton dimension 3

- `diffusers` version: 0.21.3
- Platform: Linux-6.5.5-arch1-1-x86_64-with-glibc2.38
- Python version: 3.10.13
- PyTorch version (GPU?): 2.0.1+cu117 (False)
- Huggingface_hub version: 0.17.3
- Transformers version: 4.33.3
- Accelerate version: 0.23.0
- xFormers version: not installed
- Using GPU in script?: CPU Only
- Using distributed or parallel set-up in script?: No

Disty0 commented 10 months ago

Also, I don't have any GPU other than an Intel Arc A770 in my system.

patrickvonplaten commented 10 months ago

Is there any chance you could try to reproduce the problem in a Google Colab, maybe? Sorry, I can't manage to reproduce the bug here.

Disty0 commented 10 months ago

https://colab.research.google.com/drive/1IQu2KKkfJCqG88B6emaCy0Z9r8SJeKUi?usp=sharing


patrickvonplaten commented 10 months ago

Thanks a lot for sharing the notebook. It seems like it's fixed on main, no? I'm not getting an error when installing diffusers from main: https://colab.research.google.com/drive/143mjWsyLgg8zpsGE4ovn3cXYXVmLVv72?usp=sharing

Disty0 commented 10 months ago

Yep, it doesn't seem to be happening on main.