juntang-zhuang / Adabelief-Optimizer

Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"
BSD 2-Clause "Simplified" License

weight_decouple in adabelief tf #60

Closed YannPourcenoux closed 2 years ago

YannPourcenoux commented 2 years ago

Hi, I am a bit confused. It says that weight_decouple is supported, but it is not available as an option. Does that mean it is used by default? If not, how can I turn it on?

juntang-zhuang commented 2 years ago

Hi, it's turned on by default (same as AdamW) here and cannot be turned off. In general, decoupled weight decay is more stable than coupled decay, so I hard-coded it to follow the conventions in tensorflow-addons.
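
For readers unsure what the difference is, here is a minimal sketch of coupled versus decoupled weight decay on a single scalar weight. It is not the code from this repository: it uses a plain Adam-style second moment rather than AdaBelief's, the function name `adam_step` and its `decoupled` flag are hypothetical, and scaling the decoupled decay by `lr` is an assumption (implementations differ on this).

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=True):
    """One Adam-style update on a scalar weight (illustrative sketch only)."""
    if not decoupled:
        # Coupled (L2-style) decay: the penalty is folded into the gradient,
        # so it is later rescaled by the adaptive denominator.
        g = g + weight_decay * w

    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g

    # Bias correction for step t (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    if decoupled:
        # Decoupled (AdamW-style) decay: shrink the weight directly,
        # independent of the gradient statistics.
        # Assumption: the decay is scaled by lr here.
        w = w * (1 - lr * weight_decay)

    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# With a zero gradient, only the decay term moves the weight.
w_dec, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1,
                        weight_decay=0.1, decoupled=True)
w_cpl, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1,
                        weight_decay=0.1, decoupled=False)
print(w_dec, w_cpl)
```

In the coupled case the decay passes through the adaptive normalisation, so its effective size depends on the gradient history. In the decoupled case the shrinkage is the same regardless of the gradient statistics.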