sgugger / Adam-experiments

Experiments with Adam/AdamW/amsgrad
194 stars 36 forks

How to choose wd? #6

Open MohitLamba94 opened 3 years ago

MohitLamba94 commented 3 years ago

Thank you for this wonderful benchmarking.

Several experiments use wd=1.2e-6. Could you please give some guidelines or a rule of thumb for choosing the weight decay hyperparameter?

twmht commented 2 years ago

@MohitLamba94

Any update?

MohitLamba94 commented 2 years ago

Sorry, I did not look into it any further.