Closed — junha-l closed this issue 3 months ago
Hi,

I'm currently using the custom `adam_onecycle` optimizer implemented in this codebase, with `true_wd=True` and `bn_wd=True`, along with its associated scheduler. However, I've noticed that PyTorch provides an `AdamW` optimizer and a `OneCycleLR` scheduler, which seem to offer similar functionality. I ran some experiments switching between the custom `adam_onecycle` and PyTorch's implementation, and I noticed significant performance differences.

Could you help me understand the main differences between:
- the custom `adam_onecycle` optimizer (with `true_wd=True, bn_wd=True`) + its scheduler
- PyTorch's `AdamW` + `OneCycleLR` scheduler

Specifically, I'm interested in understanding what might be causing the performance differences I observed.

Thanks!
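For reference, here is a minimal sketch of how the stock-PyTorch side of this comparison is typically wired up. All hyperparameter values (`pct_start`, momenta, weight decay, learning rate) are illustrative assumptions, not this repo's settings; the parameter-group split shows the usual way `bn_wd`-style behavior is controlled, since PyTorch's `AdamW` applies weight decay to every parameter by default (matching `bn_wd=True`), while excluding norm/bias parameters would correspond to `bn_wd=False`.

```python
# Hedged sketch: stock PyTorch AdamW + OneCycleLR setup for comparison
# with a custom adam_onecycle. Hyperparameters here are assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.Linear(16, 2))

# AdamW decays every parameter by default (bn_wd=True behavior).
# To mimic bn_wd=False, split params so norm scales/shifts and biases
# (1-D tensors) get zero weight decay:
decay, no_decay = [], []
for p in model.parameters():
    (no_decay if p.ndim == 1 else decay).append(p)

optimizer = torch.optim.AdamW(
    [{"params": decay, "weight_decay": 0.01},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=3e-3,
)

steps = 100  # total optimizer steps (batches, not epochs)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=3e-3,
    total_steps=steps,
    pct_start=0.4,           # warm-up fraction (assumed)
    anneal_strategy="cos",
    cycle_momentum=True,     # cycles AdamW's beta1 between the momenta below
    base_momentum=0.85,
    max_momentum=0.95,
)

x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
loss_fn = nn.CrossEntropyLoss()
for _ in range(steps):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    scheduler.step()         # OneCycleLR steps once per batch
```

Note that `OneCycleLR` must be stepped per batch; stepping it per epoch is a common source of large performance gaps when porting from a custom scheduler.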
Does AdamW perform better than Adam? How much better?
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.