intuitive-robots / beso

[RSS 2023] Official code for "Goal Conditioned Imitation Learning using Score-based Diffusion Policies"
https://intuitive-robots.github.io/beso-website/
MIT License

Questions about the implementation of classifier-free guidance #1

Closed lihenglin closed 1 year ago

lihenglin commented 1 year ago

Hi,

Thanks for the nicely organized codebase! I have a question about your implementation of classifier-free guidance. In line 366 of score_gpts.py, it seems that when learning the unconditional policy, the goal is only partially masked out, because the Bernoulli distribution is applied elementwise over (bs, t, d) instead of per-sample over (bs,). However, during inference, when computing the unconditional score, the goal is a completely zero tensor according to line 302 of the same file. I'm wondering whether this is the actual implementation you used for the paper results, and if so, what the intuition is for why only partially masking the goal during training still lets the diffusion model learn the unconditional policy.
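For concreteness, the two masking variants being contrasted can be sketched roughly as below. This is a minimal illustration assuming a PyTorch goal tensor of shape (bs, t, d); the function names and the `p_uncond` parameter are hypothetical and not taken from the repo.

```python
import torch


def mask_goal_partial(goal: torch.Tensor, p_uncond: float = 0.1) -> torch.Tensor:
    # Elementwise Bernoulli mask over (bs, t, d): each goal feature is
    # dropped independently with probability p_uncond, so a sample's
    # goal is usually only partially zeroed out.
    keep = torch.bernoulli(torch.full_like(goal, 1.0 - p_uncond))
    return goal * keep


def mask_goal_full(goal: torch.Tensor, p_uncond: float = 0.1) -> torch.Tensor:
    # Per-sample Bernoulli mask over (bs,): a sample's entire goal is
    # either kept or fully zeroed, matching the all-zero goal tensor
    # used for the unconditional score at inference.
    keep = torch.bernoulli(
        torch.full((goal.shape[0],), 1.0 - p_uncond, device=goal.device)
    )
    return goal * keep.view(-1, *([1] * (goal.dim() - 1)))
```

With full masking, the masked training inputs match the all-zero goal seen at inference exactly; with elementwise masking, they generally do not, which is the crux of the question.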

mbreuss commented 1 year ago

Hi,

happy to hear that you like the code!

So I experimented with both variants, and in the end I got the best results with partial masking instead of full zero tokens, so I continued to use that variant. It is also what I used for the experiments in the paper. However, I only tried full zero-masking early in the project, when there were still some other issues in the policy, so it may be worth trying full-zero masking again.

There is some follow-up work that implements CFG by not giving the transformer policy a goal token at all, instead of masking it during training, and they also show good results in their experiments: https://arxiv.org/abs/2310.07896 I think this works well in the state-based setting, where much of the state vector does not carry important information about the desired goal state anyway.
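For reference, regardless of which training-time masking variant is used, the inference-time guidance being discussed combines the conditional and unconditional scores, with the unconditional branch fed an all-zero goal. A minimal sketch, with a hypothetical model signature and an illustrative guidance weight `w`:

```python
import torch


def cfg_score(model, x_t, sigma, goal, w: float = 2.0) -> torch.Tensor:
    # Classifier-free guidance at inference (hypothetical signature):
    # the unconditional branch receives an all-zero goal tensor, as
    # described above for line 302 of score_gpts.py.
    s_cond = model(x_t, sigma, goal)
    s_uncond = model(x_t, sigma, torch.zeros_like(goal))
    # Extrapolate from the unconditional score toward the conditional one.
    return s_uncond + w * (s_cond - s_uncond)
```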

Hope this helps!

lihenglin commented 1 year ago

Thanks for the reply and the pointer!