Closed Harryqu123 closed 9 months ago
Hi @Harryqu123, I'm sorry I didn't get back to you sooner.
Regarding your question about the dense guidance part: doing it the way we describe in 4.2 has two benefits. It makes the signal dense, and it computes the guidance signal with respect to x_t instead of x_0 (x_t is the mean vector we are currently denoising). If your guidance is already dense and already computed w.r.t. x_t (the noisy image/motion), you can skip this part and use only 4.1.
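To make the two benefits concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. It assumes that guidance is obtained by backpropagating a loss on the predicted clean sample through the denoiser, which the reply above does not spell out. The names `toy_denoiser`, `sparse_loss_grad_x0`, and `backprop_through_denoiser` are hypothetical, and a 3-frame moving average stands in for the real x_0 predictor.

```python
def toy_denoiser(x_t):
    """Hypothetical stand-in for the x_0 predictor: a 3-frame moving average."""
    n = len(x_t)
    x0_hat = []
    for i in range(n):
        window = [x_t[j] for j in (i - 1, i, i + 1) if 0 <= j < n]
        x0_hat.append(sum(window) / len(window))
    return x0_hat


def sparse_loss_grad_x0(x0_hat, keyframes):
    """Gradient of sum_k (x0_hat[k] - target_k)^2 w.r.t. x0_hat (nonzero only at keyframes)."""
    grad = [0.0] * len(x0_hat)
    for k, target in keyframes.items():
        grad[k] = 2.0 * (x0_hat[k] - target)
    return grad


def backprop_through_denoiser(grad_x0, n):
    """Jacobian-transpose product of the moving average: maps d loss / d x0_hat to d loss / d x_t."""
    grad_xt = [0.0] * n
    for i in range(n):
        idx = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        for j in idx:
            grad_xt[j] += grad_x0[i] / len(idx)
    return grad_xt


x_t = [0.0] * 7                 # toy 1-D "motion" with 7 frames
keyframes = {3: 1.0}            # sparse constraint: frame 3 should reach 1.0

x0_hat = toy_denoiser(x_t)
grad_x0 = sparse_loss_grad_x0(x0_hat, keyframes)             # touches one frame only
grad_xt = backprop_through_denoiser(grad_x0, len(x_t))       # spreads to neighbouring frames

print("frames with gradient w.r.t. x0_hat:", [i for i, g in enumerate(grad_x0) if g != 0.0])
print("frames with gradient w.r.t. x_t:   ", [i for i, g in enumerate(grad_xt) if g != 0.0])
```

In this toy setup the gradient with respect to the predicted clean sample touches only the constrained frame, while the gradient with respect to x_t also reaches its neighbours, because the stand-in denoiser mixes adjacent frames. A real denoiser would spread the signal according to whatever it has learned, not a fixed window.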
The emphasis projection happens at the representation level, before training. This means the model has to be retrained with the new representation.
I hope this helps. Best,
Many thanks for your reply. It is very helpful. I have closed the issue. I hope you have a nice week ahead.
Hi authors, first of all, thank you for your interesting work. I am opening this issue to check whether my understanding of your emphasis projection contribution is correct, and I would really appreciate it if you could take the time to answer my questions. First, since the dense guidance part in section 4.2 of your paper is for making the signal dense, does this mean that, as long as I already have a dense signal (e.g., a full trajectory on every frame), I can use section 4.1 only? Second, if I understand correctly, section 4.1 of your paper applies only during sampling (inference) and does not require retraining an existing motion diffusion model. Is this understanding correct? Thanks in advance for your help.