juntang-zhuang / Adabelief-Optimizer

Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"

Loss becomes NaN when beta1=0 #67

Open yojeep opened 6 months ago

yojeep commented 6 months ago

Hello, when I use AdaBelief with beta1 = 0, beta2 = 0.999 (SAGAN, BigGAN, WGAN-GP), the loss becomes NaN, while Adam works well. I am wondering whether the hyperparameters need to be changed specifically when beta1 = 0. Many GANs require beta1 = 0 to stabilize training.
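
For reference, a minimal sketch of the failing setup (assuming the adabelief_pytorch package; the linear model here is just a placeholder for the GAN networks, and defaults such as eps vary across releases):

```python
import torch
from adabelief_pytorch import AdaBelief

# Placeholder model standing in for a GAN generator/discriminator.
model = torch.nn.Linear(10, 1)

# GAN-style betas: beta1 = 0, beta2 = 0.999.
optimizer = AdaBelief(model.parameters(), lr=1e-4, betas=(0.0, 0.999))

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()  # with beta1 = 0, the loss reportedly diverges to NaN on GANs
```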