Closed — leonmkim closed this issue 1 month ago
Thanks so much for reporting this. Oops... this was my mistake. It was not intentional. Now that you ask, do you believe there's a strong reason to switch back to AdamW? (other than that it's how the original does it?)
Other than reproducibility with the default hyperparameters, I personally don't know. Someone else, perhaps the authors, may have evidence for how sensitive diffusion policy is to the choice of optimizer / weight-decay parameter. Happy to have the issue closed, as I was more curious than anything.
Thanks for the response!
Thank you for this incredibly useful repo! I had a small question regarding the optimizer used for training diffusion policies: it seems Adam is used in this implementation, but glancing at the DP authors' codebases for both DP and UMI, they appear to use AdamW. As far as I know, PyTorch's Adam and AdamW handle weight decay differently (Adam folds it into the gradient as an L2 penalty, while AdamW applies it directly to the weights), so I was wondering if this was an intentional deviation from the original implementation.
Apologies if I misread anything and thanks again.
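To make the distinction concrete, here is a minimal pure-Python sketch (not the repo's code) of a single optimizer step under the two conventions: Adam-style coupled weight decay adds `wd * w` to the gradient before the moment estimates, so the decay is rescaled by the adaptive denominator, whereas AdamW-style decoupled decay subtracts `lr * wd * w` from the weight directly. The function name and scalar setup are illustrative only.

```python
import math

def step(w, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
         eps=1e-8, wd=0.01, t=1, decoupled=False):
    """One Adam/AdamW step on a scalar parameter (illustrative sketch)."""
    if not decoupled:
        # Coupled (Adam): decay enters the gradient, so it is later
        # divided by sqrt(v_hat) like any other gradient component.
        grad = grad + wd * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # Decoupled (AdamW): decay shrinks the weight directly,
        # independent of the gradient statistics.
        w = w - lr * wd * w
    return w, m, v

w0, grad = 2.0, 0.5
w_adam, _, _ = step(w0, grad, 0.0, 0.0, decoupled=False)
w_adamw, _, _ = step(w0, grad, 0.0, 0.0, decoupled=True)
print(w_adam, w_adamw)  # the two updates differ whenever wd > 0
```

Because Adam's decay term is divided by the adaptive denominator, its effective strength varies per parameter with the gradient magnitude, which is exactly the coupling AdamW was designed to remove; this is why the two optimizers are not interchangeable at the same `weight_decay` value.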