real-stanford / diffusion_policy

[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
https://diffusion-policy.cs.columbia.edu/
MIT License

Goal conditioning via FiLM? #58


robertmash2 commented 3 months ago

Authors,

First, I greatly appreciate your insightful paper and well-written code!

My question is regarding goal conditioning. In section 3.1 of your paper (Network Architecture Options), when discussing the CNN-based Diffusion Policy, you mention:

"However, goal conditioning is still possible with the same FiLM conditioning method used for observations."

Could you comment on this a bit? Since the sampled trajectory is the conditional distribution p(A|O), would one simply encode the goal observation and concatenate it with the initial observations O_t? Or do you mean something else, like restructuring the model to sample the joint trajectory p(A, O)? I'm sorry, I think I'm missing something.

Thanks very much for your time,

Robert Mash

sigmundhh commented 2 months ago

Hi! I'm not the author, but I can offer my understanding. I believe your first guess is correct: we model p(A|O,g), where g is the goal. In the case of a goal-reaching task, g would be the coordinates of the goal position. Concretely, for their U-Net this would mean appending the goal tensor to the observation features and passing the combined tensor to the FiLM conditioning, along the lines of the sketch below. Please let me know if I'm wrong here :)
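
To make that concrete, here is a minimal PyTorch sketch of what I mean. The `FiLMConditioning` class, all shapes, and all dimension names are hypothetical and chosen for illustration; this is not the repo's actual `ConditionalResidualBlock1D`, and in the real code the conditioning vector would typically also include the diffusion timestep embedding:

```python
import torch
import torch.nn as nn

class FiLMConditioning(nn.Module):
    """Minimal FiLM block: predicts per-channel scale and bias from a
    conditioning vector and applies them to a 1D feature map."""
    def __init__(self, cond_dim, num_channels):
        super().__init__()
        # One linear layer producing both scale (gamma) and bias (beta)
        self.to_scale_bias = nn.Linear(cond_dim, num_channels * 2)

    def forward(self, features, cond):
        # features: (B, C, T) intermediate action features inside the U-Net
        # cond:     (B, cond_dim) conditioning vector
        scale, bias = self.to_scale_bias(cond).chunk(2, dim=-1)
        return features * scale.unsqueeze(-1) + bias.unsqueeze(-1)

# Hypothetical dimensions: obs_feat from the observation encoder,
# goal_feat from encoding the goal the same way.
B, obs_dim, goal_dim, C, T = 8, 256, 256, 64, 16
obs_feat = torch.randn(B, obs_dim)
goal_feat = torch.randn(B, goal_dim)

# Goal conditioning as described above: concatenate the goal encoding
# onto the observation encoding and feed the result to FiLM, so the
# denoiser effectively models p(A | O, g).
cond = torch.cat([obs_feat, goal_feat], dim=-1)

film = FiLMConditioning(cond_dim=obs_dim + goal_dim, num_channels=C)
x = torch.randn(B, C, T)   # intermediate U-Net feature map
x = film(x, cond)          # features modulated by (observation, goal)
```

The key point is that nothing about the architecture needs to change to add the goal: it just widens the conditioning vector that FiLM already consumes for the observations.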