Zhazhan closed this issue 1 month ago
The derivation follows the denoising process in DDPM, which uses $p(x_{t-1} | x_t, x_0)$ to estimate $p(x_{t-1} | x_t)$. As a result, $\sigma_t$ depends only on the noise timestep and is the same in the numerator and denominator of Equation (5). The identical constant terms in $\log p$ of the numerator and denominator therefore cancel and are omitted in Equation (7).
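To make the cancellation explicit, here is a short sketch for two generic isotropic Gaussians that share the same $\sigma_t$. The means $\mu_1$, $\mu_2$ and the dimension $d$ are placeholder symbols introduced only for illustration; they stand in for the two predicted means and are not the paper's notation.

$$
\begin{aligned}
\log \mathcal{N}(x_{t-1}; \mu_i, \sigma_t^2 I) &= -\frac{\|x_{t-1} - \mu_i\|^2}{2\sigma_t^2} - \frac{d}{2}\log(2\pi\sigma_t^2), \quad i \in \{1, 2\}, \\
\log \frac{\mathcal{N}(x_{t-1}; \mu_1, \sigma_t^2 I)}{\mathcal{N}(x_{t-1}; \mu_2, \sigma_t^2 I)} &= -\frac{1}{2\sigma_t^2}\left(\|x_{t-1} - \mu_1\|^2 - \|x_{t-1} - \mu_2\|^2\right).
\end{aligned}
$$

The term $\frac{d}{2}\log(2\pi\sigma_t^2)$ does not depend on the mean, so it drops out of the log-ratio and only the difference of squared distances remains.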
Thanks to @You-Cun for the response; here is some additional information. For a continuous distribution such as a Gaussian, the probability of any single point is $0$. Hence, we should view $p(x_{t-1} | x_t, c)$ as the probability of a neighborhood around a specific $x_{t-1}$. As the size of this neighborhood tends to 0, the probability density on it can be treated as approximately constant, namely $\exp\left(-\frac{\|x_{t-1} - x_{\theta}(x_t, c, t)\|^{2}}{2\sigma_t^{2}}\right)$ up to a normalization factor. This trick was also used in Diffusion Models Beat GANs on Image Synthesis. This gives Eq. 6, and following @You-Cun's response, we can treat the variance $\sigma_t^2$ as a constant, which yields Eq. 7.
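The neighborhood argument can be checked numerically. The sketch below is a 1-D toy case with made-up values (`x`, `mu_a`, `mu_b`, `sigma`, `eps` are all illustrative assumptions, not quantities from the paper): the probability of a small interval divided by its width approaches the density, and the log-ratio of two such probabilities with a shared `sigma` reduces to a difference of squared distances.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def gaussian_cdf(x, mu, sigma):
    """CDF of a 1-D Gaussian N(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def neighborhood_probability(x, mu, sigma, eps):
    """Probability mass of the interval [x - eps, x + eps]."""
    return gaussian_cdf(x + eps, mu, sigma) - gaussian_cdf(x - eps, mu, sigma)


# Illustrative values only (assumptions for this toy check).
x, mu_a, mu_b, sigma, eps = 0.3, 0.1, 0.8, 0.5, 1e-4

p_a = neighborhood_probability(x, mu_a, sigma, eps)
p_b = neighborhood_probability(x, mu_b, sigma, eps)

# Mass of a tiny interval divided by its width approximates the density.
density_estimate = p_a / (2 * eps)

# With a shared sigma, the normalization constants and the interval width
# cancel in the ratio, leaving only the squared-distance terms.
log_ratio = math.log(p_a / p_b)
expected = -((x - mu_a) ** 2 - (x - mu_b) ** 2) / (2 * sigma ** 2)

print(density_estimate, gaussian_pdf(x, mu_a, sigma))
print(log_ratio, expected)
```

The two printed pairs should agree closely, which is the 1-D analogue of treating $p(x_{t-1} | x_t, c)$ as a density and dropping the shared constants.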
Thanks to You-Cun and qpc1611094 for your responses. I seem to have misunderstood $p(x_{t-1} | x_t, c)$. It should be a PDF, not a CDF.
I would like to extend my gratitude to the authors of the paper and the maintainers of this project for your exceptional work.
I have been reading the paper titled "FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation" and came across a point of concern regarding the derivation of Equation (7), which addresses $L_{sude}$. Specifically, while Equation (6) indicates that $p(x_{t-1} | x_t, c)$ follows a normal distribution, this does not necessarily imply that it is proportional to $e^{-\frac{\|x_{t-1} - x_{\theta}(x_t, c, t)\|^2}{2\sigma_t^2}}$.
The expression $e^{-\frac{\|x_{t-1} - x_{\theta}(x_t, c, t)\|^2}{2\sigma_t^2}}$ represents the probability density function, but this does not mean that $\log[p(x_{t-1} | x_t, c)]$ is directly proportional to $\|x_{t-1} - x_{\theta}(x_t, c, t)\|^2$, which in turn seems to make the derivation of Equation (7) untenable. Instead, we only have $\log \nabla p(x_{t-1} | x_t, c)$ proportional to $\|x_{t-1} - x_{\theta}(x_t, c, t)\|^2$.
If there is any point where I may have misunderstood, I would appreciate any clarification.