Hi,
Happy to hear that you like the code!
I experimented with both variants, and in the end I got the best results with partial masking rather than full zero tokens, so I kept that variant and also used it for the experiments in the paper. However, my experiments with full zero-masking were early in the project, when the policy still had some other issues, so it may be worth trying full zero-masking again.

There is also some follow-up work that implements CFG by not giving the transformer policy a goal token at all, instead of masking it during training, and they show good results on their experiments: https://arxiv.org/abs/2310.07896 I think this works well for the state-based setting, where much of the state vector does not carry important information about the desired goal state anyway.
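As a rough illustration of the two variants being compared, here is a minimal sketch assuming a PyTorch goal tensor of shape (bs, t, d); the function names and masking probability are hypothetical and not the repo's actual code:

```python
import torch

def mask_goal_partial(goal: torch.Tensor, p_uncond: float = 0.1) -> torch.Tensor:
    """Element-wise masking: each entry of the (bs, t, d) goal tensor is
    independently zeroed with probability p_uncond."""
    keep = torch.bernoulli(torch.full_like(goal, 1.0 - p_uncond))
    return goal * keep

def mask_goal_full(goal: torch.Tensor, p_uncond: float = 0.1) -> torch.Tensor:
    """Per-sample masking: with probability p_uncond the entire goal of a
    sequence is replaced by zeros, so that sample is trained fully
    unconditionally."""
    bs = goal.shape[0]
    keep = (torch.rand(bs, device=goal.device) > p_uncond).float()
    return goal * keep.view(bs, 1, 1)
```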
Hope this helps!
Thanks for the reply and the pointer!
Hi,
Thanks for the nicely organized codebase! I have a question about your implementation of classifier-free guidance. In line 366 of score_gpts.py, it seems that when learning the unconditional policy, the goal is only partially masked out, because the Bernoulli distribution is applied over (bs, t, d) rather than (bs,). During inference, however, the goal used to compute the unconditional probability is a completely zero tensor, according to line 302 of the same file. I'm wondering whether this is the implementation actually used for the paper results, and if so, what the intuition is for why partial masking during training still lets the diffusion model learn the unconditional policy.
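For context, the inference-time step in question looks roughly like the following standalone sketch, where the unconditional branch receives an all-zero goal; the model signature and guidance weight are placeholders, not the actual API of score_gpts.py:

```python
import torch

@torch.no_grad()
def guided_score(model, x_t, sigma, goal, guidance_scale: float = 2.0):
    """Classifier-free guidance at sampling time: the unconditional branch
    gets a fully zeroed goal, and the conditional and unconditional
    predictions are blended with a guidance weight."""
    score_cond = model(x_t, sigma, goal)
    score_uncond = model(x_t, sigma, torch.zeros_like(goal))
    return score_uncond + guidance_scale * (score_cond - score_uncond)
```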