Closed LinghaoChan closed 1 year ago
Hi, our training target is the added noise. https://github.com/mingyuan-zhang/MotionDiffuse/blob/d47744809253c9f0164a5a88eac265051e404715/text2motion/models/gaussian_diffusion.py#L1040
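To make "the training target is the added noise" concrete, here is a minimal sketch (not the repo's exact code) of an epsilon-prediction loss in the style of guided-diffusion's `ModelMeanType.EPSILON` branch. The function name and schedule arguments are illustrative assumptions; `model` is any callable taking `(x_t, t)`:

```python
import torch

# Hypothetical sketch of an epsilon-prediction diffusion loss.
# The schedule tensors are assumed to be 1-D of length T.
def epsilon_mse_loss(model, x_start, t,
                     sqrt_alphas_cumprod, sqrt_one_minus_alphas_cumprod):
    noise = torch.randn_like(x_start)
    # Forward process q(x_t | x_0):
    #   x_t = sqrt(acp_t) * x_0 + sqrt(1 - acp_t) * eps
    x_t = (sqrt_alphas_cumprod[t].view(-1, 1) * x_start
           + sqrt_one_minus_alphas_cumprod[t].view(-1, 1) * noise)
    eps_pred = model(x_t, t)
    # The regression target is the noise `eps` itself,
    # not x_{t-1} (PREVIOUS_X) and not x_0 (START_X).
    return torch.mean((noise - eps_pred) ** 2)
```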
Sorry, I mistook the loss type for `ModelMeanType.PREVIOUS_X`.
BTW, I have another question. The operations that move data to the GPU (`.to(device)`) are implemented in `def _extract_into_tensor(arr, timesteps, broadcast_shape)`. Will this implementation reduce GPU utilization? Why not do this in the `__init__()` function?
Besides, is equation 5 a typo? Is it $\frac{1}{\sqrt{\alpha_t}}(x_t - \cdots)$, not $\frac{1}{\sqrt{x_t}}(x_t - \cdots)$?
> Sorry, I mistook the loss type for `ModelMeanType.PREVIOUS_X`. BTW, I have another question. The operations that move data to the GPU (`.to(device)`) are implemented in `def _extract_into_tensor(arr, timesteps, broadcast_shape)`. Will this implementation reduce GPU utilization? Why not do this in the `__init__()` function?
Hi, this code is borrowed from https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/gaussian_diffusion.py . I'm not sure about some of the implementation details. In my opinion, the batch size is not fixed, which is why they don't create the tensors in `__init__`.
> Besides, is equation 5 a typo? Is it $\frac{1}{\sqrt{\alpha_t}}(x_t - \cdots)$, not $\frac{1}{\sqrt{x_t}}(x_t - \cdots)$?
Yes, it's a typo. We'll revise it in a new version. Thanks for pointing it out!
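For reference, assuming the paper follows standard DDPM notation (with $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$), the corrected sampling step should read:

```latex
x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t
  - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,
    \epsilon_\theta(x_t, t)\right) + \sigma_t z,
\qquad z \sim \mathcal{N}(0, I)
```

i.e. the prefactor is $\frac{1}{\sqrt{\alpha_t}}$, a function of the noise schedule, not of the sample $x_t$.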
Good discussion. Thanks!
When I read the code in GaussianDiffusion, I think there is some difference between the paper and the code.
Here, the `model_output` is the predicted noise $\epsilon_{\theta}(x_t, t, \text{text})$ and the `target` is $\tilde{\mu}_{t}(x_t, x_0)$. Is that right? If so, $\epsilon_{\theta}(x_t, t, \text{text}) - \tilde{\mu}_{t}(x_t, x_0)$ is not the same as the statement in Equation 4.