What does $z_t$ refer to in formulas (1) and (3)?

magic-research / magic-animate

[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

BSD 3-Clause "New" or "Revised" License

The paper draws primarily on the ControlNet framework to incorporate the reference image and the motion pose sequence into the training and inference of a diffusion model. However, the meaning of the variable $z_t$ in formulas (1) and (3) is unclear to me. For the training process, the only related quantities visible in Figure 2 are the random noises $z_0^{1:K}=\{z_0^1, z_0^2, \cdots, z_0^K\}$.

At $t=0$, $z_t$ in formulas (1) and (3) would be one of the elements of $z_0^{1:K}=\{z_0^1, z_0^2, \cdots, z_0^K\}$, but the paper does not explain this clearly either.

what does $z_t$ refer to in formulas (1) and (3) #145