sgugger / Adam-experiments

Experiments with Adam/AdamW/amsgrad
194 stars, 36 forks

AdamW vs Adam #7

Open: pgsrv opened this issue 3 years ago

pgsrv commented 3 years ago

Hi! According to the results in 'Appendix: Full results' (https://www.fast.ai/2018/07/02/adam-weight-decay/), is it correct that Adam is better for fine-tuning, while AdamW is better for training from scratch? Thanks!
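
For context on what is being compared: the two optimizers differ only in where weight decay enters the update. Below is a minimal sketch of a single scalar step, assuming the standard Adam formulation with bias correction. The function name `adam_step` and the `decoupled` flag are hypothetical, chosen for illustration; this is not the code from this repo or from any library.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, wd=0.0, decoupled=False):
    """One Adam update for a single scalar weight (illustrative sketch).

    t is the 1-based step count. With decoupled=False the decay is added
    to the gradient (Adam + L2 regularization); with decoupled=True it is
    applied directly to the weight (AdamW-style weight decay).
    """
    if not decoupled:
        # L2 regularization: the decay term goes through the moment
        # estimates and is rescaled by the adaptive denominator.
        grad = grad + wd * w

    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad

    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)

    if decoupled:
        # Decoupled weight decay: shrink the weight directly,
        # bypassing the adaptive scaling.
        w_new = w_new - lr * wd * w

    return w_new, m, v


# Same weight, zero loss gradient, same decay: only the decay path differs.
w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1, wd=0.1)
w_dec, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1, wd=0.1, decoupled=True)
print(w_l2, w_dec)
```

With a zero loss gradient, the L2 variant moves the weight by roughly `lr` on the first step because the decay term is normalized by its own magnitude, whereas the decoupled variant shrinks the weight by `lr * wd * w`. The sketch only shows the mechanical difference; it says nothing about which optimizer performs better for fine-tuning or for training from scratch.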