real-stanford / diffusion_policy

[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
https://diffusion-policy.cs.columbia.edu/
MIT License

Important components #12

Closed: hbishop1 closed this issue 1 year ago

hbishop1 commented 1 year ago

Hi, I am currently looking at implementing a diffusion model for policy learning and was very impressed by your work! I was wondering which components of your approach you found to be particularly important for good results. Three things I was specifically curious about were:

cheng-chi commented 1 year ago

Hi @hbishop1:

  1. I empirically found that EMA (an exponential moving average of the model weights) accelerates training (evaluation performance increases faster) and improves performance (by less than 5%), but the policy should still "work" without it.
  2. I found causal attention masking to be critical for getting the transformer variant of Diffusion Policy to work. My suspicion is that without it, the model "cheats" by looking ahead at future end-effector poses, which are almost identical to the action at the current timestep.
  3. I think the model capacity needed depends on task complexity (more complex tasks require a larger CNN). Reducing the number of training diffusion steps also reduces the required CNN capacity, at the expense of lower action quality. A CNN with ~10M parameters should still work, with less than a 10% performance penalty on the benchmarks we have tested.
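As a rough illustration of point 1, here is a minimal sketch of a weight EMA in plain Python. It assumes the weights are stored as a dict mapping parameter names to lists of floats and uses a fixed decay; the names `WeightEMA`, `update`, and `shadow` are hypothetical, and the repository's own EMA implementation may use a different decay schedule (for example, one that warms up over training).

```python
# Minimal sketch of an exponential moving average (EMA) of model weights.
# Assumptions (not from the thread): weights are a dict of name -> list of
# floats, and the decay is a fixed constant rather than a schedule.
from typing import Dict, List

Weights = Dict[str, List[float]]


class WeightEMA:
    """Shadow copy of the weights, updated as ema = decay * ema + (1 - decay) * w."""

    def __init__(self, weights: Weights, decay: float = 0.999) -> None:
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        # Start the shadow weights as an independent copy of the initial weights.
        self.shadow: Weights = {name: list(values) for name, values in weights.items()}

    def update(self, weights: Weights) -> None:
        """Blend the current weights into the shadow copy; call after each optimizer step."""
        d = self.decay
        for name, values in weights.items():
            shadow = self.shadow[name]
            for i, w in enumerate(values):
                shadow[i] = d * shadow[i] + (1.0 - d) * w


if __name__ == "__main__":
    weights = {"layer.weight": [0.0, 0.0]}
    ema = WeightEMA(weights, decay=0.9)
    for _ in range(3):
        # Stand-in for an optimizer step that moves the raw weights to 1.0.
        weights["layer.weight"] = [1.0, 1.0]
        ema.update(weights)
    # The shadow weights lag behind the raw weights: 1 - 0.9**3, up to rounding.
    print(ema.shadow["layer.weight"])
```

Training keeps updating the raw weights, while evaluation would use the smoothed `ema.shadow` copy.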
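For point 2, the sketch below shows generic causal masking in single-head scaled dot-product attention, written in plain Python. It is not the repository's transformer, and `causal_mask` and `masked_attention` are hypothetical names. With the mask, position i can attend only to positions j ≤ i, so changing a later timestep cannot affect the outputs for earlier timesteps, which is the kind of look-ahead the mask is meant to block.

```python
# Minimal sketch of causal (look-ahead) masking in scaled dot-product attention.
# Assumptions: a single head, with queries/keys/values given as lists of float
# vectors; an illustration only, not the transformer used in diffusion_policy.
import math
from typing import List

Vec = List[float]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_attention(q: List[Vec], k: List[Vec], v: List[Vec], causal: bool = True) -> List[Vec]:
    if not q:
        return []
    n, d = len(q), len(q[0])
    mask = causal_mask(n) if causal else [[True] * n for _ in range(n)]
    out = []
    for i in range(n):
        # Masked positions get -inf, so softmax assigns them exactly zero weight.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(n)
        ]
        m = max(scores)  # finite, because position i can always attend to itself
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0], [2.0], [3.0]]
    v_changed = [[1.0], [2.0], [100.0]]  # only the last timestep differs
    # With the causal mask, the first two outputs are unaffected by the change.
    print(masked_attention(q, k, v)[:2] == masked_attention(q, k, v_changed)[:2])
```

Calling `masked_attention(..., causal=False)` on the same inputs lets earlier positions see the last timestep, so their outputs change when it does.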

hbishop1 commented 1 year ago

Great, thanks for the quick and detailed response, that will really help!