Closed — leonmkim closed this issue 1 month ago
Thanks so much for reporting this. Oops... this was my mistake. It was not intentional. Now that you ask, do you believe there's a strong reason to switch back to AdamW? (other than that it's how the original does it?)
Other than reproducibility with the default hyperparameters, I personally don't know. Someone else, perhaps the authors, may have evidence for how sensitive diffusion policy is to the choice of optimizer / weight-decay parameter. Happy to have the issue closed, as I was more curious than anything.
Thanks for the response!
Thank you for this incredibly useful repo! I had a small question regarding the optimizer used for training diffusion policies: it seems Adam is used in this implementation, but glancing at the DP authors' codebases for both DP and UMI, they appear to use AdamW. As far as I know, PyTorch's Adam and AdamW handle weight decay differently (Adam folds it into the gradient as an L2 penalty, while AdamW applies it directly to the weights), so I was wondering if this was an intentional deviation from the original implementation.
Apologies if I misread anything and thanks again.
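To make the distinction concrete, here is a minimal pure-Python sketch (not the repo's code) of a single optimizer step under the two conventions: Adam-style coupled weight decay adds `wd * w` to the gradient before the moment estimates, so the decay is rescaled by the adaptive denominator, whereas AdamW-style decoupled decay subtracts `lr * wd * w` from the weight directly. The function name and scalar setup are illustrative only.

```python
import math

def step(w, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
         eps=1e-8, wd=0.01, t=1, decoupled=False):
    """One Adam/AdamW step on a scalar parameter (illustrative sketch)."""
    if not decoupled:
        # Coupled (Adam): decay enters the gradient, so it is later
        # divided by sqrt(v_hat) like any other gradient component.
        grad = grad + wd * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # Decoupled (AdamW): decay shrinks the weight directly,
        # independent of the gradient statistics.
        w = w - lr * wd * w
    return w, m, v

w0, grad = 2.0, 0.5
w_adam, _, _ = step(w0, grad, 0.0, 0.0, decoupled=False)
w_adamw, _, _ = step(w0, grad, 0.0, 0.0, decoupled=True)
print(w_adam, w_adamw)  # the two updates differ whenever wd > 0
```

Because Adam's decay term is divided by the adaptive denominator, its effective strength varies per parameter with the gradient magnitude, which is exactly the coupling AdamW was designed to remove; this is why the two optimizers are not interchangeable at the same `weight_decay` value.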