pomelyu opened 1 month ago
and thus
When $\sigma_t=1$, DDIM becomes DDPM
Forward process
Then we have the reverse process
$\mathbf{w}_t$ and $\bar{\mathbf{w}}_t$ are the standard Wiener process (Brownian motion) and the reverse-time Wiener process, respectively. We can obtain denoised samples at time step $t$ by solving the reverse SDE at that time. $\nabla_x \log q_t(\mathbf{x}_t)$ is the score function, and it is the only unknown term in the reverse SDE. As a result, we can use a neural network to approximate it.
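For reference, the forward and reverse SDEs described above take the following standard form (a sketch in the notation of Song et al.; $f$ is the drift and $g$ the diffusion coefficient):

```latex
% Forward SDE: data is gradually perturbed with noise
\mathrm{d}\mathbf{x} = f(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}_t

% Reverse-time SDE: run from t = T back to t = 0;
% the score \nabla_x \log q_t(x_t) is the only unknown term
\mathrm{d}\mathbf{x} = \left[ f(\mathbf{x}, t) - g(t)^2 \nabla_{\mathbf{x}} \log q_t(\mathbf{x}_t) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}_t
```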
As mentioned previously, $\nabla_x \log q_t(\mathbf{x}_t)$ can be approximated by a neural network
From Score-Based Generative Modeling through Stochastic Differential Equations, we have
Then we can obtain its solution
different "order" means using the different order in Taylor expansion to approximate integral
The author notes that a larger guidance scale amplifies the derivatives of the model and thus narrows the convergence range of ODE solvers.
dynamic thresholding methods: Photorealistic text-to-image diffusion models with deep language understanding
DPM-Solver++ has a great interpretation of DPM, SDE, and ODE for both the $x_\theta$ and $\epsilon_\theta$ models
then
and $x_{t-\delta}$
If we replace $\epsilon$ by $\epsilon_\theta$, we have DDIM.
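A minimal numpy sketch of one DDIM update (the function and argument names here are my own; with `sigma_t = 0` the step is deterministic, which is the DDIM case, while choosing the DDPM value of `sigma_t` recovers ancestral sampling):

```python
import numpy as np

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev, sigma_t=0.0, rng=None):
    """One DDIM update x_t -> x_{t-1}, given the predicted noise eps."""
    # Predict the clean sample x_0 from x_t and the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # "Direction pointing to x_t" term from the DDIM update.
    dir_xt = np.sqrt(1.0 - alpha_bar_prev - sigma_t**2) * eps
    noise = 0.0
    if sigma_t > 0:
        rng = rng or np.random.default_rng()
        noise = sigma_t * rng.standard_normal(x_t.shape)
    return np.sqrt(alpha_bar_prev) * x0_pred + dir_xt + noise
```

With the exact noise, the step maps $\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ to $\sqrt{\bar\alpha_{t-1}}\,x_0 + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon$, as expected.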
Linear Multi-Step Method
The author calls using a high-order numerical method to obtain a more precise $\epsilon$ PNDM (pseudo numerical methods for diffusion models); the linear multi-step variant is PLMS.
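A sketch of the linear multi-step combination of past noise predictions (standard Adams–Bashforth coefficients; the helper name is mine, and this falls back to lower orders while the history is short):

```python
def plms_eps(eps_history):
    """Combine past noise predictions with Adams-Bashforth coefficients.

    eps_history[-1] is the most recent prediction; older entries follow
    in reverse order. Uses 4th-order coefficients when enough history
    is available, otherwise a lower-order rule.
    """
    e = eps_history
    if len(e) >= 4:
        return (55 * e[-1] - 59 * e[-2] + 37 * e[-3] - 9 * e[-4]) / 24
    if len(e) == 3:
        return (23 * e[-1] - 16 * e[-2] + 5 * e[-3]) / 12
    if len(e) == 2:
        return (3 * e[-1] - e[-2]) / 2
    return e[-1]
```

Note that each set of coefficients sums to one, so a constant history is reproduced exactly.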
Theoretically, we can derive $x_t$ in the reverse process by solving the following equation
From DPM-Solver, we can approximate the solution by Taylor expansion and obtain DPM-Solver-1 (which is equivalent to DDIM)
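For reference, the DPM-Solver-1 update in the standard notation of the DPM-Solver paper, where $\lambda_t = \log(\alpha_t/\sigma_t)$ is the log-SNR and $h_i = \lambda_{t_i} - \lambda_{t_{i-1}}$:

```latex
\tilde{\mathbf{x}}_{t_i}
  = \frac{\alpha_{t_i}}{\alpha_{t_{i-1}}}\,\tilde{\mathbf{x}}_{t_{i-1}}
  - \sigma_{t_i}\left(e^{h_i} - 1\right)
    \epsilon_\theta\!\left(\tilde{\mathbf{x}}_{t_{i-1}}, t_{i-1}\right)
```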
We can add an error correction term called UniC
Comparing it with the expansion of the exponential integrator in (2), we have
It directly predicts $\tilde{x}_{t_i}$ from previous estimation
DDPM
1. Propose the definition of the forward process, i.e.
and thus
2. Design a neural network that has the property below to approximate the reverse (denoising) process
then we have
3. If we further assume $\Sigma_\theta(x_t, t) = \sigma_t^2 \mathbf{I}$, we can get
The formula indicates that we can design the network to predict the "noise" instead of the "denoised sample". Finally, we get the DDPM algorithm
Or, we can use the formula in 2. to estimate $x_0$ first and then derive $x_{t-1}$ from $q(x_{t-1}|x_0, x_t)$
4. In DDPM, the denoising steps are sampled uniformly from $0$ to $T$, with $T = 1000$ by default
5. Note that both the forward and reverse processes are Markov chains (the output at the next time step depends only on the current state), and the final results are non-deterministic
6. The loss function is designed to maximize the variational lower bound on the log-likelihood (via KL terms), which induces an MSE loss on the reconstruction, as in a VAE
7. In DDPM, there is a $t$-dependent weighting function that scales the loss value for every time step during training
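The DDPM pieces above can be sketched end to end (a minimal toy in numpy, not the original implementation; `eps_model` stands in for the noise-prediction network, and the loss shown is the unweighted "simple" MSE variant):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # \bar{alpha}_t

def training_loss(eps_model, x0, t):
    """Unweighted DDPM loss: MSE between the true and predicted noise."""
    eps = rng.standard_normal(x0.shape)
    # Forward process in closed form: x_t = sqrt(ab_t) x0 + sqrt(1-ab_t) eps
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

def ddpm_step(eps_model, x_t, t):
    """One ancestral sampling step x_t -> x_{t-1}: predict noise, denoise."""
    eps = eps_model(x_t, t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean                     # no noise is added at the last step
    sigma_t = np.sqrt(betas[t])         # one common choice of sigma_t
    return mean + sigma_t * rng.standard_normal(x_t.shape)
```

Running `ddpm_step` from `t = T-1` down to `t = 0` on pure Gaussian noise is the full sampling loop.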