real-stanford / diffusion_policy

[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
https://diffusion-policy.cs.columbia.edu/
MIT License

[Question] Why a_{t-1}, a_{t-2}, ... also contribute to diffusion loss? #44

Open ManUtdMoon opened 9 months ago

ManUtdMoon commented 9 months ago

Dear Cheng @cheng-chi,

Thank you for your elegant and inspiring code! I have a small question about the loss computation for noise prediction.

I think that the actions before $a_t$ (i.e., $a_{t-1}, a_{t-2}, \dots$) in a sampled prediction horizon also contribute to the loss, because they are not masked out. Am I right about this?

If having the noise on $a_{t-1}, a_{t-2}, \dots$ contribute to the loss is a deliberate design choice, I wonder what the reason is. Is it for action consistency or for convenience? Taking actions only from time $t$ onward and performing diffusion on $(a_t, a_{t+1}, \dots)$ also seems intuitive. Why, when we are at time $t$, do we still predict actions from the past (I know they are not returned by predict_action)?
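To make the two alternatives in the question concrete, here is a minimal NumPy sketch contrasting an unmasked diffusion loss over the full prediction horizon (past action steps included) with a masked variant that drops the $n$ past steps. All names, shapes, and the choice of a plain MSE on noise are my own hypothetical stand-ins for illustration, not the repository's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

horizon, action_dim = 16, 2   # hypothetical sizes, not the repo's defaults
n_past = 2                    # steps before time t (a_{t-1}, a_{t-2})

# Hypothetical stand-ins for the network's noise prediction and the
# ground-truth noise added during the forward diffusion process.
pred_noise = rng.normal(size=(horizon, action_dim))
true_noise = rng.normal(size=(horizon, action_dim))

# Unmasked loss, as in the questioner's reading of the code: every step
# of the prediction horizon, including past actions, contributes.
loss_full = np.mean((pred_noise - true_noise) ** 2)

# Alternative the questioner proposes: mask out the past steps so only
# a_t, a_{t+1}, ... contribute to the diffusion loss.
mask = np.ones((horizon, 1))
mask[:n_past] = 0.0
loss_masked = np.sum(mask * (pred_noise - true_noise) ** 2) / (mask.sum() * action_dim)

print(loss_full, loss_masked)
```

The masked variant is numerically identical to simply computing the MSE over the horizon's last `horizon - n_past` steps; the question is whether keeping the extra past-step terms helps (e.g., as a consistency signal) or is only a matter of implementation convenience.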

Thank you for your time!

Regards, Dongjie