Stable Diffusion img2img: DPMSolverSinglestepScheduler does not work with certain strength values

cadaeix commented 1 year ago

Describe the bug

When using StableDiffusionImg2ImgPipeline with DPMSolverSinglestepScheduler, certain img2img strength values produce an error caused by previous model outputs not being available.

Settings:

Inference steps: 15
Strength: 0.15, 0.3, 0.4, 0.45, 0.55, 0.7, 0.8, 0.85, 0.95

Excerpt from the stack trace:

File diffusers\schedulers\scheduling_dpmsolver_singlestep.py:553, in DPMSolverSinglestepScheduler.step(self, model_output, timestep, sample, return_dict)
    550     self.sample = sample
    552 timestep_list = [self.timesteps[step_index - i] for i in range(order - 1, 0, -1)] + [timestep]
--> 553 prev_sample = self.singlestep_dpm_solver_update(
    554     self.model_outputs, timestep_list, prev_timestep, self.sample, order
    555 )
    557 if not return_dict:
    558     return (prev_sample,)

File diffusers\schedulers\scheduling_dpmsolver_singlestep.py:496, in DPMSolverSinglestepScheduler.singlestep_dpm_solver_update(self, model_output_list, timestep_list, prev_timestep, sample, order)
    494     return self.dpm_solver_first_order_update(model_output_list[-1], timestep_list[-1], prev_timestep, sample)
    495 elif order == 2:
--> 496     return self.singlestep_dpm_solver_second_order_update(
    497         model_output_list, timestep_list, prev_timestep, sample
    498     )
    499 elif order == 3:
    500     return self.singlestep_dpm_solver_third_order_update(
    501         model_output_list, timestep_list, prev_timestep, sample
    502     )

File diffusers\schedulers\scheduling_dpmsolver_singlestep.py:367, in DPMSolverSinglestepScheduler.singlestep_dpm_solver_second_order_update(self, model_output_list, timestep_list, prev_timestep, sample)
    365 h, h_0 = lambda_t - lambda_s1, lambda_s0 - lambda_s1
    366 r0 = h_0 / h
--> 367 D0, D1 = m1, (1.0 / r0) * (m0 - m1)
    368 if self.config.algorithm_type == "dpmsolver++":
    369     # See https://arxiv.org/abs/2211.01095 for detailed derivations
    370     if self.config.solver_type == "midpoint":

TypeError: unsupported operand type(s) for -: 'Tensor' and 'NoneType'

Reproduction

import requests
import numpy as np
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline, DPMSolverSinglestepScheduler

device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
).to(device)

pipe.scheduler = DPMSolverSinglestepScheduler()

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

for x in np.arange(0.0, 1.0, 0.05):
    strength = float("{:.2f}".format(x))
    try:
        images = pipe(
            prompt="a fantasy landscape",
            negative_prompt=None,
            image=init_image,
            strength=strength,
            num_inference_steps=15,
            guidance_scale=10,
            num_images_per_prompt=1
        ).images
        images[0].save(f"fantasy_landscape_{strength}.png")
        print(
            f"fantasy_landscape_{strength}.png saved at img2img strength {strength}")
    except Exception as e:
        print(
            f"fantasy_landscape_{strength}.png at img2img strength {strength} failed with error:\n {e} ")

Logs

Note: First two strength values tested, 0.0 and 0.05, fail because of a different issue (#1867)

Fetching 15 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 4980.96it/s]
0it [00:00, ?it/s]
fantasy_landscape_0.0.png at img2img strength 0.0 failed with error:
 list index out of range
0it [00:00, ?it/s]
fantasy_landscape_0.05.png at img2img strength 0.05 failed with error:
 list index out of range
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.39it/s]
fantasy_landscape_0.1.png saved at img2img strength 0.1
  0%|                                                                                                                                                                                                        | 0/2 [00:00<?, ?it/s]
fantasy_landscape_0.15.png at img2img strength 0.15 failed with error:
 unsupported operand type(s) for -: 'Tensor' and 'NoneType'
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  3.64it/s]
fantasy_landscape_0.2.png saved at img2img strength 0.2
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  3.79it/s]
fantasy_landscape_0.25.png saved at img2img strength 0.25
  0%|                                                                                                                                                                                                        | 0/4 [00:00<?, ?it/s]
fantasy_landscape_0.3.png at img2img strength 0.3 failed with error:
 unsupported operand type(s) for -: 'Tensor' and 'NoneType'
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  3.96it/s]
fantasy_landscape_0.35.png saved at img2img strength 0.35
  0%|                                                                                                                                                                                                        | 0/6 [00:00<?, ?it/s]
fantasy_landscape_0.4.png at img2img strength 0.4 failed with error:
 unsupported operand type(s) for -: 'Tensor' and 'NoneType'
  0%|                                                                                                                                                                                                        | 0/6 [00:00<?, ?it/s]
fantasy_landscape_0.45.png at img2img strength 0.45 failed with error:
 unsupported operand type(s) for -: 'Tensor' and 'NoneType'
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  3.96it/s]
fantasy_landscape_0.5.png saved at img2img strength 0.5
  0%|                                                                                                                                                                                                        | 0/8 [00:00<?, ?it/s]
fantasy_landscape_0.55.png at img2img strength 0.55 failed with error:
 unsupported operand type(s) for -: 'Tensor' and 'NoneType'
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:02<00:00,  4.11it/s]
fantasy_landscape_0.6.png saved at img2img strength 0.6
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:02<00:00,  3.91it/s]
fantasy_landscape_0.65.png saved at img2img strength 0.65
  0%|                                                                                                                                                                                                       | 0/10 [00:00<?, ?it/s]
fantasy_landscape_0.7.png at img2img strength 0.7 failed with error:
 unsupported operand type(s) for -: 'Tensor' and 'NoneType'
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:02<00:00,  4.12it/s]
fantasy_landscape_0.75.png saved at img2img strength 0.75
  0%|                                                                                                                                                                                                       | 0/12 [00:00<?, ?it/s]
fantasy_landscape_0.8.png at img2img strength 0.8 failed with error:
 unsupported operand type(s) for -: 'Tensor' and 'NoneType'
  0%|                                                                                                                                                                                                       | 0/12 [00:00<?, ?it/s]
fantasy_landscape_0.85.png at img2img strength 0.85 failed with error:
 unsupported operand type(s) for -: 'Tensor' and 'NoneType'
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:03<00:00,  4.14it/s]
fantasy_landscape_0.9.png saved at img2img strength 0.9
  0%|                                                                                                                                                                                                       | 0/14 [00:00<?, ?it/s]
fantasy_landscape_0.95.png at img2img strength 0.95 failed with error:
 unsupported operand type(s) for -: 'Tensor' and 'NoneType'

System Info

diffusers version: 0.11.1
Platform: Windows-10-10.0.19045-SP0
Python version: 3.10.8
PyTorch version (GPU?): 1.13.1+cu116 (True)
Huggingface_hub version: 0.11.1
Transformers version: 4.25.1
Using GPU in script?: Yes, RTX3090
Using distributed or parallel set-up in script?: No

patrickvonplaten commented 1 year ago

Thanks for the issue @cadaeix,

I can reproduce it. Here is a simple repro based on your example:

import requests
import numpy as np
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline, DPMSolverSinglestepScheduler

device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
).to(device)

pipe.scheduler = DPMSolverSinglestepScheduler.from_config(pipe.scheduler.config)

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

strength = 0.15
images = pipe(
    prompt="a fantasy landscape",
    negative_prompt=None,
    image=init_image,
    strength=strength,
    num_inference_steps=15,
    guidance_scale=10,
    num_images_per_prompt=1
).images
images[0].save(f"fantasy_landscape_{strength}.png")

This definitely looks like a bug and we should try to fix it. Also cc @LuChengTHU in case you have any idea what might be going on here.

@cadaeix just a quick tip, it's not recommended to do:

pipe.scheduler = DPMSolverSinglestepScheduler()

as this will load the wrong config. Instead, you should do:

pipe.scheduler = DPMSolverSinglestepScheduler.from_config(pipe.scheduler.config)

patrickvonplaten commented 1 year ago

I'll try to allocate time to solve this soon, unless @LuChengTHU beats me to it :-)

patrickvonplaten commented 1 year ago

Haven't found time yet to look into it. Will try to do so soon.

patrickvonplaten commented 1 year ago

Still on my TODO - @williamberman could this be related to: https://github.com/huggingface/diffusers/pull/2969 ?

StAlKeR7779 commented 1 year ago

@patrickvonplaten I looked at the code and found two issues. 1) Wrong variable name in set_timesteps:

https://github.com/huggingface/diffusers/blob/v0.16.0/src/diffusers/schedulers/scheduling_dpmsolver_singlestep.py#L238 currently:

self.orders = self.get_order_list(num_inference_steps)

correct:

self.order_list = self.get_order_list(num_inference_steps)

2) When the pipeline selects timesteps for img2img, it skips some of them: https://github.com/huggingface/diffusers/blob/v0.16.0/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L522 As a result, the first timestep that gets called is not the scheduler's first timestep, so the order for that timestep can be any value. To work properly, however, the orders need to be called in sequence, from 1 up to self.config.solver_order.
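
The failing strengths in the logs above are consistent with this explanation. The following is a minimal sketch, not the diffusers implementation: it assumes that the pipeline keeps only the last int(num_inference_steps * strength) timesteps and that the scheduler's order list alternates 1, 2, 1, 2, ... from the first timestep (as expected for a solver order of 2). The helper names first_order and failing_strengths are hypothetical.

```python
def first_order(num_inference_steps, strength):
    """Return the solver order of the first step that img2img actually runs.

    Assumptions (simplified, not copied from diffusers):
    - the pipeline skips the first t_start scheduler timesteps;
    - the order list alternates 1, 2, 1, 2, ... starting at index 0.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    if t_start >= num_inference_steps:
        return None  # no denoising steps are run at all
    return 1 if t_start % 2 == 0 else 2


def failing_strengths(num_inference_steps, strengths):
    """Strengths whose first executed step would need a previous model output."""
    failing = []
    for strength in strengths:
        order = first_order(num_inference_steps, strength)
        # An order-2 step reads the previous model output, which is still None
        # when that step is the very first one to run.
        if order is not None and order > 1:
            failing.append(strength)
    return failing


strengths = [i / 20 for i in range(20)]  # 0.0, 0.05, ..., 0.95
print(failing_strengths(15, strengths))
# [0.15, 0.3, 0.4, 0.45, 0.55, 0.7, 0.8, 0.85, 0.95]
```

Under these assumptions, the sketch flags exactly the strengths that raise the TypeError in the logs. Strengths 0.0 and 0.05 are not flagged because no denoising step runs for them at all.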

I'm not sure how this should be fixed properly, so I'll just leave two ideas here: 1) limit the order by the most recently executed order:

order = min(order, self.max_prev_order + 1)
self.max_prev_order = max(self.max_prev_order, order)

or 2) calculate the order dynamically, for example:

def get_next_order(self, step_index):
    self.current_order += 1
    if self.config.lower_order_final and step_index == len(self.timesteps) - 1:
        # make the orders look like [..., 3, 1, 2, 1]
        if self.config.solver_order == 3 and self.current_order == 3:
            self.current_order = 1

    if self.current_order > self.config.solver_order:
        self.current_order = 1

    return self.current_order
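
Both options can be compared on a toy example. This is only a sketch under stated assumptions: clamp_orders and DynamicOrder are hypothetical stand-ins rather than scheduler code, max_prev_order and current_order are assumed to start at 0 for each run, and the list [2, 1, 2, 1] stands for an order list whose first entry was skipped by img2img.

```python
from types import SimpleNamespace


def clamp_orders(scheduled_orders):
    """Option 1: never use an order more than one above the highest order run so far."""
    max_prev_order = 0  # assumption: reset to 0 before each denoising run
    used = []
    for order in scheduled_orders:
        order = min(order, max_prev_order + 1)
        max_prev_order = max(max_prev_order, order)
        used.append(order)
    return used


class DynamicOrder:
    """Option 2: derive the order from a running counter instead of a fixed list."""

    def __init__(self, num_steps, solver_order=2, lower_order_final=True):
        # Stand-in for the scheduler config; only the two fields used below.
        self.config = SimpleNamespace(
            solver_order=solver_order, lower_order_final=lower_order_final
        )
        self.timesteps = list(range(num_steps))  # placeholder timesteps
        self.current_order = 0  # assumption: reset to 0 before each run

    def get_next_order(self, step_index):
        self.current_order += 1
        if self.config.lower_order_final and step_index == len(self.timesteps) - 1:
            # make the orders look like [..., 3, 1, 2, 1]
            if self.config.solver_order == 3 and self.current_order == 3:
                self.current_order = 1

        if self.current_order > self.config.solver_order:
            self.current_order = 1

        return self.current_order


print(clamp_orders([2, 1, 2, 1]))  # [1, 1, 2, 1]

dynamic = DynamicOrder(num_steps=6, solver_order=3)
print([dynamic.get_next_order(i) for i in range(6)])  # [1, 2, 3, 1, 2, 1]
```

In both cases the first executed step uses order 1, so no step reads a model output that has not been computed yet. Option 1 keeps the precomputed list and only lowers the orders that come too early, while option 2 ignores the list and counts from the first step that actually runs.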

Update: after some thought, I think the second option is the correct one, so if there are no other suggestions here I'll create a PR tomorrow.

patrickvonplaten commented 1 year ago

Thanks a lot @StAlKeR7779, good catch!

Fixing the bug here: https://github.com/huggingface/diffusers/pull/3413

patrickvonplaten commented 1 year ago

Posting pictures of #3413 here:

[Images: one img2img output per strength value: 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]