No non-default schedulers appear to work with DeepFloyd IF

AmericanPresidentJimmyCarter commented 1 year ago

Describe the bug

Attempting to use any non-default scheduler with DeepFloyd IF crashes out like this:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ <stdin>:1 in <module>                                                                            │
│                                                                                                  │
│ site-packages/torch/utils/_contextlib.py:115 in decorate_context                                 │
│                                                                                                  │
│   112 │   @functools.wraps(func)                                                                 │
│   113 │   def decorate_context(*args, **kwargs):                                                 │
│   114 │   │   with ctx_factory():                                                                │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                   │
│   116 │                                                                                          │
│   117 │   return decorate_context                                                                │
│   118                                                                                            │
│                                                                                                  │
│ site-packages/diffusers/pipelines/deepfloyd_if/pipeline_if.py:807 in __call__                    │
│                                                                                                  │
│   804 │   │   │   │   │   noise_pred = torch.cat([noise_pred, predicted_variance], dim=1)        │
│   805 │   │   │   │                                                                              │
│   806 │   │   │   │   # compute the previous noisy sample x_t -> x_t-1                           │
│ ❱ 807 │   │   │   │   intermediate_images = self.scheduler.step(                                 │
│   808 │   │   │   │   │   noise_pred, t, intermediate_images, **extra_step_kwargs                │
│   809 │   │   │   │   ).prev_sample                                                              │
│   810                                                                                            │
│                                                                                                  │
│ site-packages/diffusers/schedulers/scheduling_dpmsolver_multistep.py:549 in step                 │
│                                                                                                  │
│   546 │   │   │   (step_index == len(self.timesteps) - 2) and self.config.lower_order_final an   │
│   547 │   │   )                                                                                  │
│   548 │   │                                                                                      │
│ ❱ 549 │   │   model_output = self.convert_model_output(model_output, timestep, sample)           │
│   550 │   │   for i in range(self.config.solver_order - 1):                                      │
│   551 │   │   │   self.model_outputs[i] = self.model_outputs[i + 1]                              │
│   552 │   │   self.model_outputs[-1] = model_output                                              │
│                                                                                                  │
│ site-packages/diffusers/schedulers/scheduling_dpmsolver_multistep.py:327 in convert_model_output │
│                                                                                                  │
│   324 │   │   if self.config.algorithm_type == "dpmsolver++":                                    │
│   325 │   │   │   if self.config.prediction_type == "epsilon":                                   │
│   326 │   │   │   │   alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]          │
│ ❱ 327 │   │   │   │   x0_pred = (sample - sigma_t * model_output) / alpha_t                      │
│   328 │   │   │   elif self.config.prediction_type == "sample":                                  │
│   329 │   │   │   │   x0_pred = model_output                                                     │
│   330 │   │   │   elif self.config.prediction_type == "v_prediction":                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: The size of tensor a (3) must match the size of tensor b (6) at non-singleton dimension 1

Reproduction

>>> import torch
>>> from diffusers import DiffusionPipeline
>>> stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0")
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00,  6.40s/it]
>>> stage_1.enable_model_cpu_offload()
>>> prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
>>> prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
>>> generator = torch.manual_seed(0)
>>> image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:19<00:00,  5.05it/s]
>>> from diffusers import DPMSolverMultistepScheduler
>>> stage_1.register_modules(scheduler=DPMSolverMultistepScheduler())
>>> image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
...

Logs

No response

System Info

py 3.10 diffusers v0.16.1

trygvebw commented 1 year ago

Adding something like the following near the start of the step function of each individual scheduler (for the DDIM and DPMSolverSinglestep schedulers, just after prev_timestep is defined) seems to fix this issue:

if model_output.shape[1] == sample.shape[1] * 2:
    model_output, _ = torch.split(model_output, sample.shape[1], dim=1)
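
For context, the traceback above shows the pipeline concatenating noise_pred and predicted_variance along dim 1 right before calling scheduler.step, which is why the scheduler sees 6 channels against a 3-channel sample. Below is a minimal sketch of the same check as a standalone helper. The name drop_learned_variance is hypothetical, and plain slicing is used in place of torch.split so the helper works on anything that exposes .shape and supports slicing. It simply discards the variance half, which may not be what every scheduler needs.

```python
def drop_learned_variance(model_output, sample):
    """Return only the noise-prediction channels of model_output.

    Hypothetical helper: works on any tensor-like object that exposes
    .shape and supports basic slicing (for example a torch.Tensor).
    """
    channels = sample.shape[1]
    # A learned-variance model emits 2 * channels along dim 1:
    # the first half is the noise prediction, the second half the variance.
    if model_output.shape[1] == 2 * channels:
        return model_output[:, :channels]
    # Already the same channel count as the sample: nothing to strip.
    return model_output
```

With the shapes from the error message, a (1, 6, H, W) model output and a (1, 3, H, W) sample come back as a (1, 3, H, W) tensor.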

AmericanPresidentJimmyCarter commented 1 year ago

Still does not work. Output is very distorted.

[image attachment]

patrickvonplaten commented 1 year ago

Yes, we should try to get those working :heart_eyes: I think we have some first PRs here: https://github.com/huggingface/diffusers/pull/3314

LuChengTHU commented 1 year ago

Hi @AmericanPresidentJimmyCarter, could you please give me your prompt and random seed so I can reproduce the distorted image?

AmericanPresidentJimmyCarter commented 1 year ago

That image was produced prior to your merge request.

And, actually, the issue seems to have been that the default parameters for the scheduler were the problem. I am unsure why they are set the way they are, but even for SD, DPMSolverMultistepScheduler() outputs complete garbage.

This configuration, by contrast, seems to produce normal images for SD.

from diffusers import DPMSolverMultistepScheduler

scheduler = DPMSolverMultistepScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
    trained_betas=None,
    thresholding=False,
    algorithm_type="dpmsolver++",
    solver_type="midpoint",
    lower_order_final=True,
)

I am unsure why I needed to Google this to be able to use this scheduler at all -- it seems like maybe there should be a class method like preset_for where you can select the right kwargs for your respective model? And, where do I find the kwargs for IF?

patrickvonplaten commented 1 year ago

@AmericanPresidentJimmyCarter you can easily load the correct scheduler config as shown here: https://github.com/huggingface/diffusers/pull/3314#issuecomment-1533036833

Just doing:

DPMSolverMultistepScheduler.from_pretrained("DeepFloyd/IF-I-XL-v1.0")

gives you the correct config. Why do you want to instead use the default settings?
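
A related pattern that avoids hard-coding kwargs is to build the new scheduler from the config of the scheduler the pipeline has already loaded, using the from_config classmethod that diffusers schedulers provide. Below is a minimal sketch. swap_scheduler is a hypothetical helper name, and stage_1 refers to the pipeline from the reproduction. Whether the swapped-in scheduler handles IF's learned variance correctly likely depends on the installed diffusers version.

```python
def swap_scheduler(pipe, scheduler_cls):
    """Replace pipe.scheduler with scheduler_cls, reusing the current config.

    Hypothetical helper: `pipe` is any object with a `.scheduler` attribute
    whose `.config` holds the model's scheduler settings, and `scheduler_cls`
    is any class exposing a `from_config` classmethod.
    """
    # Carry over the betas, timesteps, prediction type, etc. shipped with the
    # model instead of falling back to the scheduler class defaults.
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe


# Usage with the pipeline from the reproduction
# (requires diffusers and the model weights):
#   from diffusers import DPMSolverMultistepScheduler
#   swap_scheduler(stage_1, DPMSolverMultistepScheduler)
```

This keeps the model-specific settings in one place, so the same call should work for SD and for IF without looking up kwargs by hand.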

AmericanPresidentJimmyCarter commented 1 year ago

Ah, okay -- I did not realise that this function already existed, thanks.
