I use an LR scheduler to configure a warmup (the LR increases linearly from a very small value to its real value, taken from args, over 500 iterations).
Will this confuse AdaBelief, or is it okay?
AdaBelief supports rectify=True, which applies the same warmup as RAdam. If you have your own LR schedule, you can set rectify=False and apply your own schedule.
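For illustration, here is a minimal sketch of that second option: a 500-iteration linear warmup driven by PyTorch's `LambdaLR`, with rectification turned off. The helper names `linear_warmup_factor` and `build_optimizer_and_scheduler` are hypothetical, and the sketch assumes the `adabelief_pytorch` package and a scheduler that is stepped once per iteration; adjust it to your own training loop.

```python
def linear_warmup_factor(step, warmup_steps=500):
    """Multiplier applied to the base LR at a given iteration.

    Ramps linearly from 1/warmup_steps up to 1.0, then stays at 1.0.
    """
    return min(1.0, (step + 1) / warmup_steps)


def build_optimizer_and_scheduler(params, base_lr, warmup_steps=500):
    # Imports are local so the warmup function above has no dependencies.
    # Assumes the adabelief_pytorch and torch packages are installed.
    from adabelief_pytorch import AdaBelief
    from torch.optim.lr_scheduler import LambdaLR

    # rectify=False disables the built-in RAdam-style warmup,
    # so the only warmup applied is the external schedule below.
    optimizer = AdaBelief(params, lr=base_lr, rectify=False)
    scheduler = LambdaLR(
        optimizer,
        lr_lambda=lambda step: linear_warmup_factor(step, warmup_steps),
    )
    return optimizer, scheduler
```

In the training loop, call `optimizer.step()` and then `scheduler.step()` once per iteration, so the LR reaches the value from args after 500 iterations.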