Open robertmash2 opened 3 months ago
Hi! I'm not the author, but I can offer my understanding. I believe your first guess is correct: we model p(A|O,g), where g is the goal. For a goal-reaching task, g would be the coordinates of the goal position. Concretely, for their U-Net this would mean appending the goal tensor to the observations and passing the combined tensor through the FiLM conditioning. Please let me know if I'm wrong here :)
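To make the "append the goal and reuse the same FiLM conditioning" idea concrete, here is a minimal NumPy sketch of FiLM-style modulation where the conditioning vector is the concatenation of the observation and the goal. All dimensions, weight matrices, and the `film` helper are hypothetical illustrations, not the paper's actual architecture (which uses learned encoders inside a U-Net):

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, cond, w_scale, w_shift):
    """FiLM: per-channel affine modulation predicted from the conditioning vector.

    Hypothetical single-linear-layer version; the real network would use
    learned MLPs inside each U-Net block.
    """
    scale = cond @ w_scale          # (batch, channels)
    shift = cond @ w_shift          # (batch, channels)
    # Broadcast the per-channel scale/shift over the time axis.
    return features * (1.0 + scale[..., None]) + shift[..., None]

# Hypothetical dimensions, chosen only for illustration.
batch, obs_dim, goal_dim, channels, horizon = 2, 10, 3, 8, 16

obs  = rng.normal(size=(batch, obs_dim))
goal = rng.normal(size=(batch, goal_dim))   # e.g. goal coordinates

# Goal conditioning: concatenate the goal with the observation features
# and feed the combined vector to the same FiLM layers.
cond = np.concatenate([obs, goal], axis=-1)

w_scale = rng.normal(size=(obs_dim + goal_dim, channels)) * 0.1
w_shift = rng.normal(size=(obs_dim + goal_dim, channels)) * 0.1

x = rng.normal(size=(batch, channels, horizon))   # intermediate U-Net features
y = film(x, cond, w_scale, w_shift)
print(y.shape)  # (2, 8, 16)
```

The point is that nothing about the FiLM mechanism changes: the goal simply becomes part of the conditioning vector, so the model still samples p(A|O,g) rather than a joint p(A, O).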
Authors,
First, I greatly appreciate your insightful paper and well-written code!
My question is regarding goal conditioning. In Section 3.1 of your paper (Network Architecture Options), when discussing the CNN-based Diffusion Policy, you mention:
"However, goal conditioning is still possible with the same FiLM conditioning method used for observations."
I wonder if you could comment on this a bit? Since the sampled trajectory is the conditional probability p(A|O), would one simply encode the goal observation and concatenate it with the initializing observations Ot? Or do you mean something else, such as restructuring the model to sample the joint trajectory p(A, O)? I'm sorry, I think I'm missing something.
Thanks very much for your time,
Robert Mash