In brief, BDIA is a time-reversible ODE solver that can be applied to improve the performance of both diffusion sampling and diffusion inversion. Suppose we would like to estimate the next diffusion state $\boldsymbol{z}_{i-1}$ by solving a probability ordinary differential equation (ODE)

$$
d\boldsymbol{z} = \boldsymbol{d}(\boldsymbol{z},t)dt
$$

based on the recent information $(\boldsymbol{z}_{i},t_i)$ and $(\boldsymbol{z}_{i+1},t_{i+1})$. The basic idea of BDIA is to compute $\boldsymbol{z}_{i-1}$ by performing both the forward integration approximation $\Delta(t_i\rightarrow t_{i-1}|\boldsymbol{z}_i)$ $\left(\approx \int_{t_i}^{t_{i-1}}\boldsymbol{d}(\boldsymbol{z},t)dt\right)$ and the backward integration approximation $\Delta(t_i\rightarrow t_{i+1}|\boldsymbol{z}_i)$ $\left(\approx -\int_{t_{i+1}}^{t_{i}}\boldsymbol{d}(\boldsymbol{z},t)dt\right)$, both conditioned on $\boldsymbol{z}_i$. With the above two integration approximations, $\boldsymbol{z}_{i-1}$ can be expressed as

$$
\boldsymbol{z}_{i-1} = \boldsymbol{z}_{i+1} \underbrace{- (1-\textcolor{blue}{\gamma}) (\boldsymbol{z}_{i+1}-\boldsymbol{z}_{i}) - \textcolor{blue}{\gamma}\Delta(t_i\rightarrow t_{i+1}|\boldsymbol{z}_i)}_{\approx \int_{t_{i+1}}^{t_{i}}\boldsymbol{d}(\boldsymbol{z},t)dt } + \underbrace{\Delta(t_i\rightarrow t_{i-1}|\boldsymbol{z}_i)}_{ \approx \int_{t_i}^{t_{i-1}}\boldsymbol{d}(\boldsymbol{z},t)dt }
$$

where $\gamma\in[0,1]$ averages the backward and forward integration approximations for the time-slot $[t_{i+1},t_i]$. One nice property of the above update expression is that it is invertible. That is, $\boldsymbol{z}_{i+1}$ can be expressed in terms of $\boldsymbol{z}_{i-1}$ and $\boldsymbol{z}_{i}$, which enables exact diffusion inversion.
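
To make the invertibility concrete, the update expression can be rearranged for $\boldsymbol{z}_{i+1}$. The following is a sketch of that algebra, assuming $\gamma>0$ (for $\gamma=0$ the term $\boldsymbol{z}_{i+1}$ drops out of the update, so it cannot be recovered):

$$
\boldsymbol{z}_{i+1} = \frac{\boldsymbol{z}_{i-1}}{\gamma} - \frac{1-\gamma}{\gamma}\boldsymbol{z}_{i} + \Delta(t_i\rightarrow t_{i+1}|\boldsymbol{z}_i) - \frac{1}{\gamma}\Delta(t_i\rightarrow t_{i-1}|\boldsymbol{z}_i)
$$

Both integration approximations are evaluated at $\boldsymbol{z}_i$ only, so every quantity on the right-hand side is available during inversion and no implicit equation has to be solved.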
The BDIA technique can be applied directly to DDIM. In this case, the forward integration approximation $\Delta(t_i\rightarrow t_{i-1}|\boldsymbol{z}_i)$ becomes the DDIM update, which is given by

$$
\Delta(t_i\rightarrow t_{i-1}|\boldsymbol{z}_i) = \alpha_{i-1} \left(\frac{\boldsymbol{z}_{i} \hspace{-0.3mm}-\hspace{-0.3mm} \sigma_{i}\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\boldsymbol{z}_{i}, i) }{\alpha_{i}}\right)+\hspace{0.5mm}\sigma_{i-1}\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\boldsymbol{z}_{i}, i) - \boldsymbol{z}_i
$$

Correspondingly, the backward integration approximation $\Delta(t_i\rightarrow t_{i+1}|\boldsymbol{z}_{i})$ is the DDIM update from $t_i$ towards $t_{i+1}$, which is given by

$$
\Delta(t_i\rightarrow t_{i+1}|\boldsymbol{z}_{i}) = \alpha_{i+1} \left(\frac{\boldsymbol{z}_{i} \hspace{-0.3mm}-\hspace{-0.3mm} \sigma_{i}\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\boldsymbol{z}_{i}, i) }{\alpha_{i}}\right)+\hspace{0.5mm}\sigma_{i+1}\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\boldsymbol{z}_{i}, i) - \boldsymbol{z}_i
$$

As a result, the final update expression of BDIA-DDIM is

$$
\boldsymbol{z}_{i-1} = \gamma \boldsymbol{z}_{i+1}-\gamma\Big[\alpha_{i+1} \Big(\frac{\boldsymbol{z}_{i} \hspace{-0.3mm}-\hspace{-0.3mm} \sigma_{i}\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\boldsymbol{z}_{i}, i) }{\alpha_{i}}\Big)+\hspace{0.5mm}\sigma_{i+1}\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\boldsymbol{z}_{i}, i) \Big] + \Big[\alpha_{i-1} \Big(\frac{\boldsymbol{z}_{i} \hspace{-0.3mm}-\hspace{-0.3mm} \sigma_{i}\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\boldsymbol{z}_{i}, i) }{\alpha_{i}}\Big)+\hspace{0.5mm}\sigma_{i-1}\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\boldsymbol{z}_{i}, i) \Big]
$$

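
The BDIA-DDIM update and its inverse can be illustrated with a minimal Python sketch. This is not the repository's implementation: the function names, the linear `alphas` schedule, the scalar state, and `toy_eps_model` (a stand-in for the trained noise predictor $\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}$) are all assumptions made for illustration. The sketch also assumes that the first sampling step is a plain DDIM step and that inversion starts from the stored pair $(\boldsymbol{z}_0, \boldsymbol{z}_1)$, since the two-step recursion needs two states to begin.

```python
import math

def ddim_map(z, eps, a_from, s_from, a_to, s_to):
    """DDIM map between two noise levels, reusing the noise estimate eps made at z."""
    x0_hat = (z - s_from * eps) / a_from
    return a_to * x0_hat + s_to * eps

def bdia_ddim_step(z_next, z_cur, i, alphas, sigmas, eps_model, gamma):
    """Sampling direction: compute z_{i-1} from (z_{i+1}, z_i)."""
    eps = eps_model(z_cur, i)
    back = ddim_map(z_cur, eps, alphas[i], sigmas[i], alphas[i + 1], sigmas[i + 1])
    fwd = ddim_map(z_cur, eps, alphas[i], sigmas[i], alphas[i - 1], sigmas[i - 1])
    return gamma * z_next - gamma * back + fwd

def bdia_ddim_inverse_step(z_prev, z_cur, i, alphas, sigmas, eps_model, gamma):
    """Inversion direction: recover z_{i+1} from (z_{i-1}, z_i); needs gamma > 0."""
    eps = eps_model(z_cur, i)
    back = ddim_map(z_cur, eps, alphas[i], sigmas[i], alphas[i + 1], sigmas[i + 1])
    fwd = ddim_map(z_cur, eps, alphas[i], sigmas[i], alphas[i - 1], sigmas[i - 1])
    return (z_prev - fwd) / gamma + back

# Hypothetical toy setup: index 0 is the clean end, index N the noisy end.
N = 10
gamma = 0.5
alphas = [0.999 - (0.999 - 0.1) * k / N for k in range(N + 1)]
sigmas = [math.sqrt(1.0 - a * a) for a in alphas]

def toy_eps_model(z, i):
    # Stand-in for a trained noise predictor; any deterministic function works here.
    return sigmas[i] * z + 0.05 * math.sin(z)

# Sampling: one plain DDIM step (assumed), then BDIA-DDIM steps down to z_0.
z = [None] * (N + 1)
z[N] = 1.3
z[N - 1] = ddim_map(z[N], toy_eps_model(z[N], N),
                    alphas[N], sigmas[N], alphas[N - 1], sigmas[N - 1])
for i in range(N - 1, 0, -1):
    z[i - 1] = bdia_ddim_step(z[i + 1], z[i], i, alphas, sigmas, toy_eps_model, gamma)

# Inversion: start from the stored pair (z_0, z_1) and walk back up to z_N.
rec = [None] * (N + 1)
rec[0], rec[1] = z[0], z[1]
for i in range(1, N):
    rec[i + 1] = bdia_ddim_inverse_step(rec[i - 1], rec[i], i,
                                        alphas, sigmas, toy_eps_model, gamma)

max_err = max(abs(r - s) for r, s in zip(rec, z))
print("max round-trip error:", max_err)
```

The round-trip error in this sketch comes from floating-point rounding only, because the inverse step re-evaluates the noise estimate at the same state $\boldsymbol{z}_i$ that the sampling step used. The two step functions use plain arithmetic, so they should also work elementwise on array or tensor states once `toy_eps_model` is replaced by an actual model call.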
One can also apply the BDIA technique to the EDM and DPM-Solver++ sampling procedures. Experiments on BDIA-EDM show that it consistently outperforms EDM in terms of FID score over four pre-trained models. Details can be found in the paper.
Recently, the BDIA technique has also been successfully applied to design reversible transformers.
Source code for round-trip image editing was added on January 8th, 2024. The code depends heavily on the original source code for EDICT.
Source code for text-to-image generation with Stable Diffusion V2 was added in October 2023.
Note: For 10 timesteps, it is preferable to set $\gamma=0.5$, while for 40 timesteps it is recommended to set $\gamma=1.0$.
If you find our work useful in your research, please cite:
@MISC{guoqiang2024bdia,
title={Exact Diffusion Inversion via Bi-directional Integration Approximation},
author={G. Zhang and J. P. Lewis and W. B. Kleijn},
howpublished={ECCV},
year={2024}
}