huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Implement: Align Your Steps: Optimizing Sampling Schedules in Diffusion Models #7760

Open joe-aivatarz opened 5 months ago

joe-aivatarz commented 5 months ago

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models is a general and principled approach to optimizing the sampling schedules of diffusion models (DMs) for high-quality outputs. The work is presented by NVIDIA in this paper: https://arxiv.org/abs/2404.14507

The project page is here: https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/

The page proposes a very small change that has a big impact on inference quality.

DN6 commented 5 months ago

Hmm, the results do look good. But on the project page they mention:

We leverage methods from stochastic calculus and find optimal schedules specific to different solvers, trained DMs and datasets.

So each scheduler would have its own set of aligned steps. We can support this by allowing the timesteps to be set directly for all schedulers.

@yiyixuxu wdyt? The Colab notebook example the authors have provided uses diffusers and modifies the DPM scheduler:

import numpy as np
import torch

from diffusers import DPMSolverMultistepScheduler as DefaultDPMSolver

# Add support for setting custom timesteps
class DPMSolverMultistepScheduler(DefaultDPMSolver):
    def set_timesteps(
        self, num_inference_steps=None, device=None,
        timesteps=None
    ):
        if timesteps is None:
            super().set_timesteps(num_inference_steps, device)
            return

        all_sigmas = np.array(((1 - self.alphas_cumprod) / self.alphas_cumprod) ** 0.5)
        self.sigmas = torch.from_numpy(all_sigmas[timesteps])
        self.timesteps = torch.tensor(timesteps[:-1]).to(device=device, dtype=torch.int64) # Ignore the last 0

        self.num_inference_steps = len(timesteps)

        self.model_outputs = [
            None,
        ] * self.config.solver_order
        self.lower_order_nums = 0

        # add an index counter for schedulers that allow duplicated timesteps
        self._step_index = None
        self._begin_index = None
        self.sigmas = self.sigmas.to("cpu")  # to avoid too much CPU/GPU communication

Colab: https://colab.research.google.com/drive/1cIwbbO4HRP1aUQ8WcbQBaT8p3868k7BC?usp=sharing#scrollTo=X8WJXsHUV96k
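The subclass above maps timestep indices to noise levels through sigma = sqrt((1 - alphas_cumprod) / alphas_cumprod). As a rough, framework-free sketch of that mapping, the snippet below rebuilds the table of sigmas and looks up the Stable Diffusion 1.5 timestep indices quoted in this thread. It assumes a "scaled linear" beta schedule with beta_start=0.00085, beta_end=0.012 and 1000 training timesteps; those values are an assumption and are not stated in the thread. The helper name `sd_sigmas` is hypothetical.

```python
import math

def sd_sigmas(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    """Return sigma_t = sqrt((1 - abar_t) / abar_t) for every training timestep.

    Assumption: betas are linearly spaced in sqrt-space ("scaled linear").
    """
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    sigmas = []
    alpha_cumprod = 1.0
    for i in range(num_train_timesteps):
        beta = (lo + (hi - lo) * i / (num_train_timesteps - 1)) ** 2
        alpha_cumprod *= 1.0 - beta  # running product of (1 - beta)
        sigmas.append(math.sqrt((1.0 - alpha_cumprod) / alpha_cumprod))
    return sigmas

# Timestep indices for Stable Diffusion 1.5 as quoted in this thread.
ays_timesteps = [999, 850, 736, 645, 545, 455, 343, 233, 124, 24, 0]

all_sigmas = sd_sigmas()
ays_sigmas = [all_sigmas[t] for t in ays_timesteps]
print([round(s, 3) for s in ays_sigmas])
```

If the assumed beta schedule matches the model, the first and last values should land close to the largest and smallest noise levels listed for Stable Diffusion 1.5.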

chuck-ma commented 5 months ago

I'm wondering how to modify EDMDPMSolverMultistepScheduler to get a similar result. As I understand it, EDMDPMSolverMultistepScheduler targets continuous diffusion models like EDM, where the noise level values can be given directly to the model as its sigma inputs. I wrote a script, but it doesn't work. How can I fix it?

import numpy as np
import torch

from diffusers import DiffusionPipeline, EDMDPMSolverMultistepScheduler as DefaultDPMSolver

class EDMDPMSolverMultistepScheduler(DefaultDPMSolver):
    def set_timesteps(self, num_inference_steps=None, device=None):

        # self.num_inference_steps = num_inference_steps

        # ramp = np.linspace(0, 1, self.num_inference_steps)
        # sigmas = self._compute_sigmas(ramp)
        # Stable Video Diffusion noise levels from the Align Your Steps table
        sigmas = np.array([700.00, 54.5, 15.886, 7.977, 4.248, 1.789, 0.981, 0.403, 0.173, 0.034, 0.002])
        # self.num_inference_steps = len(sigmas)

        sigmas = torch.from_numpy(sigmas).to(dtype=torch.float32, device=device)

        self.timesteps = self.precondition_noise(sigmas)

        # NOTE: with the next line commented out, self.sigmas is never reassigned in this
        # method, so the print below shows whatever value it held before the call.
        # self.sigmas = torch.cat([sigmas, torch.tensor([sigma_last], dtype=torch.float32, device=device)])
        print("sigmas=", self.sigmas)

        self.model_outputs = [
            None,
        ] * self.config.solver_order
        self.lower_order_nums = 0

        # add an index counter for schedulers that allow duplicated timesteps
        self._step_index = None
        self._begin_index = None
        self.sigmas = self.sigmas.to("cpu")  # to avoid too much CPU/GPU communication
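One thing that stands out in the script above is that `self.sigmas` is never assigned from the custom noise levels. The snippet below is a minimal, framework-free sketch of the two sequences such an override would probably need to produce; it is not a confirmed fix. It assumes the EDM noise preconditioning c_noise = 0.25 * ln(sigma) and a final sigma of 0 appended to the schedule. Both are assumptions that should be checked against the installed diffusers version, and the helper name `build_edm_schedule` is hypothetical.

```python
import math

# Stable Video Diffusion noise levels quoted in this thread.
SVD_SIGMAS = [700.00, 54.5, 15.886, 7.977, 4.248, 1.789, 0.981, 0.403, 0.173, 0.034, 0.002]

def build_edm_schedule(sigmas, final_sigma=0.0):
    """Hypothetical helper: derive model timesteps and the padded sigma list.

    Assumptions: timesteps follow the EDM preconditioning 0.25 * ln(sigma),
    and the solver expects one extra trailing sigma (here final_sigma).
    """
    if any(s <= 0 for s in sigmas):
        raise ValueError("all noise levels must be positive to take the log")
    timesteps = [0.25 * math.log(s) for s in sigmas]
    sigmas_with_final = list(sigmas) + [final_sigma]
    return timesteps, sigmas_with_final

timesteps, sigmas_with_final = build_edm_schedule(SVD_SIGMAS)
print(len(timesteps), len(sigmas_with_final))
```

In a scheduler subclass, these two sequences would correspond to `self.timesteps` and `self.sigmas`, with `self.num_inference_steps` set to the number of timesteps.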

christopher-beckham commented 5 months ago

Having a set_timesteps for all relevant schedulers would also make it much easier to implement things like this: https://github.com/huggingface/diffusers/issues/7651

yiyixuxu commented 5 months ago

@DN6

I'm a little bit confused, it says each scheduler would have its own optimized schedule but this is what they provide. Can these timesteps be used for all schedulers for these models?

cc @asomoza, maybe you have better ideas here, since this is popular in the community

| Model | Schedule (noise levels) | Schedule (timestep indices) |
| --- | --- | --- |
| Stable Diffusion 1.5 | [14.615, 6.475, 3.861, 2.697, 1.886, 1.396, 0.963, 0.652, 0.399, 0.152, 0.029] | [999, 850, 736, 645, 545, 455, 343, 233, 124, 24, 0] |
| SDXL | [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.029] | [999, 845, 730, 587, 443, 310, 193, 116, 53, 13, 0] |
| DeepFloyd-IF / Stage-1 | [160.41, 8.081, 3.315, 1.885, 1.207, 0.785, 0.553, 0.293, 0.186, 0.030, 0.006] | [995, 920, 811, 686, 555, 418, 315, 174, 109, 12, 0] |
| Stable Video Diffusion | [700.00, 54.5, 15.886, 7.977, 4.248, 1.789, 0.981, 0.403, 0.173, 0.034, 0.002] | NA |

To support this more natively, I think we can extend the timesteps argument of the set_timesteps method to more schedulers, and also add it to the SVD pipeline they mentioned.

I would first figure out which schedulers/pipelines these optimized steps can be applied to, and only support those selected schedulers for now.
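As a rough illustration of supporting custom timesteps only for selected schedulers, a pipeline could inspect the `set_timesteps` signature before passing a schedule through. The sketch below is self-contained: `ToyScheduler`, `LegacyToyScheduler` and `apply_schedule` are hypothetical stand-ins, not diffusers classes, and the only technique shown is the signature check.

```python
import inspect

class ToyScheduler:
    """Hypothetical scheduler whose set_timesteps accepts custom timesteps."""
    def set_timesteps(self, num_inference_steps=None, device=None, timesteps=None):
        if timesteps is not None:
            self.timesteps = list(timesteps)
        else:
            # Evenly spaced indices over an assumed 1000 training timesteps.
            step = 1000 // num_inference_steps
            self.timesteps = [999 - i * step for i in range(num_inference_steps)]
        self.num_inference_steps = len(self.timesteps)

class LegacyToyScheduler:
    """Hypothetical scheduler without a timesteps argument."""
    def set_timesteps(self, num_inference_steps, device=None):
        step = 1000 // num_inference_steps
        self.timesteps = [999 - i * step for i in range(num_inference_steps)]

def supports_custom_timesteps(scheduler):
    # True when set_timesteps declares a parameter named "timesteps".
    return "timesteps" in inspect.signature(scheduler.set_timesteps).parameters

def apply_schedule(scheduler, num_inference_steps=None, timesteps=None):
    """Set either a custom schedule or a default one, failing loudly if unsupported."""
    if timesteps is not None:
        if not supports_custom_timesteps(scheduler):
            raise ValueError(
                f"{type(scheduler).__name__}.set_timesteps does not accept custom timesteps"
            )
        scheduler.set_timesteps(timesteps=timesteps)
    else:
        scheduler.set_timesteps(num_inference_steps)
    return scheduler.timesteps

sdxl_ays = [999, 845, 730, 587, 443, 310, 193, 116, 53, 13, 0]
print(apply_schedule(ToyScheduler(), timesteps=sdxl_ays))
```

Raising an error for unsupported schedulers, rather than silently falling back to the default schedule, keeps it obvious which schedulers have actually been enabled.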

asomoza commented 5 months ago

Initially I thought it only worked for DPM schedulers, but I've been testing it in ComfyUI, where they enabled it for all of them. So far it works with all of them, but I think that in all the SDE variants it's a lot worse (not usable).

Also, in SDXL it's less noticeable, but you still get the performance gain.

prompt = "anthropomorphic capybara wearing a suit and working with a computer"
ays = 10 steps
normal = 25 steps

| Sampler | ays | normal |
| --- | --- | --- |
| Euler | (image: comfyui_ays_00001_) | (image: comfyui_normal_00001_) |
| Heun | (image: comfyui_ays_00002_) | (image: comfyui_normal_00002_) |
| dpm_2 | (image: comfyui_ays_00001_) | (image: comfyui_normal_00001_) |
| dpmpp_2m | (image: comfyui_ays_00003_) | (image: comfyui_normal_00003_) |
| dpmpp_2m_sde_gpu | (image: comfyui_ays_00001_) | (image: comfyui_normal_00001_) |

haofanwang commented 5 months ago

I'm curious whether it works with fewer steps (2 or 4 steps, as in distillation methods). How can we derive the optimized timesteps from scratch?

DN6 commented 5 months ago

I'm a little bit confused, it says each scheduler would have its own optimized schedule but this is what they provide. Can these timesteps be used for all schedulers for these models?

Hmm, yeah, that is a bit confusing. I interpreted it as meaning that a unique schedule exists for each solver. From the contributions section of the paper, I guess they mean that the optimized schedule is applicable to multiple solvers for a given model type:

(iv) We provide the optimized schedules for several commonly used models in the appendix to allow for easy plug-and-play use by the research community

yiyixuxu commented 5 months ago

@haofanwang they described their method in the paper - not sure how easy it is to reproduce & how much compute it would cost

DeFek1 commented 5 months ago

Is there a way to run inference with more than 10 steps?

wonkyoc commented 5 months ago

@DeFek1 you can use linear interpolation between those timestep indices

DeFek1 commented 5 months ago

@wonkyoc how? Is there any code I can try?

wonkyoc commented 5 months ago

The original repo includes the code, or you could use NumPy or another linear algebra library.

https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/howto.html
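As a rough sketch of stretching a 10-step schedule to more steps, the snippet below resamples the Stable Diffusion 1.5 noise levels from this thread using only the standard library. It interpolates linearly in log-sigma space, which is an assumption made here to keep noise levels positive and smoothly spaced; plain linear interpolation of the timestep indices, as suggested above, would follow the same pattern. The helper names `piecewise_linear` and `loglinear_resample` are hypothetical.

```python
import math

# Stable Diffusion 1.5 noise levels quoted in this thread (11 levels = 10 steps).
SD15_SIGMAS = [14.615, 6.475, 3.861, 2.697, 1.886, 1.396, 0.963, 0.652, 0.399, 0.152, 0.029]

def piecewise_linear(x, xs, ys):
    """Linearly interpolate y at x, for xs sorted in increasing order."""
    if x <= xs[0]:
        return ys[0]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])
    return ys[-1]

def loglinear_resample(sigmas, num_levels):
    """Resample a decreasing sigma schedule to num_levels points in log space."""
    if len(sigmas) < 2 or num_levels < 2:
        raise ValueError("need at least two noise levels on both sides")
    xs = [i / (len(sigmas) - 1) for i in range(len(sigmas))]
    log_sigmas = [math.log(s) for s in sigmas]
    new_xs = [i / (num_levels - 1) for i in range(num_levels)]
    return [math.exp(piecewise_linear(x, xs, log_sigmas)) for x in new_xs]

# 21 noise levels correspond to 20 sampling steps.
sigmas_20_steps = loglinear_resample(SD15_SIGMAS, 21)
print([round(s, 3) for s in sigmas_20_steps])
```

The resampled list keeps the original first and last noise levels, so only the spacing in between changes.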
