juntang-zhuang / Adabelief-Optimizer

Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"
BSD 2-Clause "Simplified" License

Documentation (at least for TF) and weight_decouple is not an option #51

Open grofte opened 3 years ago

grofte commented 3 years ago

Hiya,

In the README you say that rectify is implemented as an option, but its default is True. I would update the README to reflect that.

You also make it sound like weight_decouple is an available option in the TF version. But it isn't:

AdaBeliefOptimizer(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-14, weight_decay=0.0, rectify=True, amsgrad=False, sma_threshold=5.0, total_steps=0, warmup_proportion=0.1, min_lr=0.0, name='AdaBeliefOptimizer', print_change_log=True, **kwargs)

I just get an error message when I try to set weight_decouple=True.
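
For anyone who wants to check which keyword arguments are actually named in the constructor, here is a minimal sketch using Python's `inspect` module. It runs against a stand-in function whose parameter list is copied from the signature above, so it does not need adabelief-tf installed; `accepted_kwargs` is a hypothetical helper name, not part of the library. Since the signature ends in `**kwargs`, the error presumably comes from wherever those extra arguments are forwarded, but that part is an assumption.

```python
import inspect


# Stand-in with the parameter list quoted in this issue. This is NOT the real
# class; with adabelief-tf installed you would inspect the imported class instead.
def AdaBeliefOptimizer(learning_rate=0.001, beta_1=0.9, beta_2=0.999,
                       epsilon=1e-14, weight_decay=0.0, rectify=True,
                       amsgrad=False, sma_threshold=5.0, total_steps=0,
                       warmup_proportion=0.1, min_lr=0.0,
                       name='AdaBeliefOptimizer', print_change_log=True,
                       **kwargs):
    return None


def accepted_kwargs(fn):
    """Return {name: default} for the explicitly named parameters of fn.

    Catch-all *args / **kwargs entries are skipped, because they say nothing
    about which names are really supported.
    """
    skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return {
        name: param.default
        for name, param in inspect.signature(fn).parameters.items()
        if param.kind not in skip
    }


named = accepted_kwargs(AdaBeliefOptimizer)
has_weight_decouple = "weight_decouple" in named  # is it a named option?
rectify_default = named["rectify"]                # default value of rectify
print(has_weight_decouple, rectify_default)
```

With the signature quoted above, `weight_decouple` is not among the named parameters, and `rectify` defaults to `True`.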

Great work otherwise!

juntang-zhuang commented 3 years ago

Thanks for the feedback! I'll update the README to clarify the configuration options for the TF version.

grofte commented 3 years ago

It's also in the log for version 0.2.0:

Please check your arguments if you have upgraded adabelief-tf from version 0.0.1.
Modifications to default arguments:
                           eps  weight_decouple    rectify
-----------------------  -----  -----------------  -------------
adabelief-tf=0.0.1       1e-08  Not supported      Not supported
>=0.1.0 (Current 0.2.0)  1e-14  supported          default: True
SGD better than Adam (e.g. CNN for Image Classification)    Adam better than SGD (e.g. Transformer, GAN)
----------------------------------------------------------  ----------------------------------------------
Recommended epsilon = 1e-7                                  Recommended epsilon = 1e-14
For a complete table of recommended hyperparameters, see
https://github.com/juntang-zhuang/Adabelief-Optimizer
You can disable the log message by setting "print_change_log = False", though it is recommended to keep as a reminder.


But was I supposed to use version 0.2.1? The README said 0.2.0 was the current version, so that's what I went with.