Dear Cheng @cheng-chi,
Thank you for your elegant and inspiring code! I have a small question about how the noise-prediction loss is computed.
I think that the actions before $a_t$ (i.e., $a_{t-1}, a_{t-2}, \dots$) in a sample's prediction horizon also contribute to the loss, because they are not masked out. Am I right about this?
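To make the question concrete, here is a minimal plain-Python sketch of the two options I have in mind. The function name `noise_prediction_loss`, the `mask_past` flag, and the horizon layout (index `n_obs_steps - 1` taken as the current step $t$) are my own illustrative assumptions, not code from the repository.

```python
from typing import Sequence


def noise_prediction_loss(
    pred_noise: Sequence[Sequence[float]],
    true_noise: Sequence[Sequence[float]],
    n_obs_steps: int,
    mask_past: bool = False,
) -> float:
    """Mean-squared error between predicted and true noise over an action horizon.

    Each outer element is one step of the horizon: a_{t-n_obs_steps+1}, ..., a_t, a_{t+1}, ...
    Assumption (hypothetical layout): index n_obs_steps - 1 is the current step t.
    """
    if len(pred_noise) != len(true_noise):
        raise ValueError("horizon lengths differ")
    current = n_obs_steps - 1
    total, count = 0.0, 0
    for step, (p_vec, e_vec) in enumerate(zip(pred_noise, true_noise)):
        if mask_past and step < current:
            continue  # drop a_{t-1}, a_{t-2}, ... from the loss when masking is enabled
        for p, e in zip(p_vec, e_vec):
            total += (p - e) ** 2
            count += 1
    if count == 0:
        raise ValueError("no steps left to average over")
    return total / count


# Horizon of 4 one-dimensional actions with n_obs_steps = 2: (a_{t-1}, a_t, a_{t+1}, a_{t+2}).
# Only the noise on the past action a_{t-1} is predicted wrongly.
pred = [[1.0], [0.0], [0.0], [0.0]]
true = [[0.0], [0.0], [0.0], [0.0]]
print(noise_prediction_loss(pred, true, n_obs_steps=2))                  # 0.25
print(noise_prediction_loss(pred, true, n_obs_steps=2, mask_past=True))  # 0.0
```

With the full horizon, the error on $a_{t-1}$ still contributes to the loss; with masking, only $(a_t, a_{t+1}, \dots)$ do.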
If letting the noise on $a_{t-1}, a_{t-2}, \dots$ contribute to the loss is a deliberate design choice, I wonder what the reason is. Is it for action consistency or for convenience? It seems equally intuitive to me to take only the actions from $t$ onward and run diffusion on $(a_t, a_{t+1}, \dots)$. Why, at time $t$, do we still predict past actions (I know they are not returned by `predict_action`)?

Thank you for your time!
Regards, Dongjie