Closed LinghaoChan closed 1 year ago
Hi, our training target is the added noise. https://github.com/mingyuan-zhang/MotionDiffuse/blob/d47744809253c9f0164a5a88eac265051e404715/text2motion/models/gaussian_diffusion.py#L1040
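To make "the training target is the added noise" concrete, here is a minimal sketch (not the repo's exact code) of an epsilon-prediction loss in the style of guided-diffusion's `ModelMeanType.EPSILON` branch. The function name and schedule arguments are illustrative assumptions; `model` is any callable taking `(x_t, t)`:

```python
import torch

# Hypothetical sketch of an epsilon-prediction diffusion loss.
# The schedule tensors are assumed to be 1-D of length T.
def epsilon_mse_loss(model, x_start, t,
                     sqrt_alphas_cumprod, sqrt_one_minus_alphas_cumprod):
    noise = torch.randn_like(x_start)
    # Forward process q(x_t | x_0):
    #   x_t = sqrt(acp_t) * x_0 + sqrt(1 - acp_t) * eps
    x_t = (sqrt_alphas_cumprod[t].view(-1, 1) * x_start
           + sqrt_one_minus_alphas_cumprod[t].view(-1, 1) * noise)
    eps_pred = model(x_t, t)
    # The regression target is the noise `eps` itself,
    # not x_{t-1} (PREVIOUS_X) and not x_0 (START_X).
    return torch.mean((noise - eps_pred) ** 2)
```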
Sorry, I mistook the loss type for `ModelMeanType.PREVIOUS_X`.
BTW, I have another question. The operations that move data to the GPU (`.to(device)`) are implemented in `def _extract_into_tensor(arr, timesteps, broadcast_shape)`. Will this implementation reduce GPU utilization? Why not do this in the `__init__()` function?
Besides, is equation 5 a typo? Is it $\frac{1}{\sqrt{\alpha_t}}(x_t - \cdots)$, not $\frac{1}{\sqrt{x_t}}(x_t - \cdots)$?
> Sorry, I mistook the loss type for `ModelMeanType.PREVIOUS_X`. BTW, I have another question. The operations that move data to the GPU (`.to(device)`) are implemented in `def _extract_into_tensor(arr, timesteps, broadcast_shape)`. Will this implementation reduce GPU utilization? Why not do this in the `__init__()` function?
Hi, this code is borrowed from https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/gaussian_diffusion.py . I'm not sure about some of the implementation details. In my opinion, the batch size is not fixed, which is why they don't create the tensors in `__init__`.
> Besides, is equation 5 a typo? Is it $\frac{1}{\sqrt{\alpha_t}}(x_t - \cdots)$, not $\frac{1}{\sqrt{x_t}}(x_t - \cdots)$?
Yes, it's a typo. We'll revise it in a new version. Thanks for pointing it out!
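For reference, assuming the paper follows standard DDPM notation (with $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$), the corrected sampling step should read:

```latex
x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t
  - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,
    \epsilon_\theta(x_t, t)\right) + \sigma_t z,
\qquad z \sim \mathcal{N}(0, I)
```

i.e. the prefactor is $\frac{1}{\sqrt{\alpha_t}}$, a function of the noise schedule, not of the sample $x_t$.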
Good discussion. Thanks!
When I read the code in GaussianDiffusion, I think there is some difference between the paper and the code.
Here, the `model_output` is the predicted noise $\epsilon_{\theta}(x_t, t, \text{text})$ and the `target` is $\tilde{\mu}_{t}(x_t, x_0)$. Is that right? If so, $\epsilon_{\theta}(x_t, t, \text{text}) - \tilde{\mu}_{t}(x_t, x_0)$ is not the same as the statement in Equation 4.