Added learning rate bias correction and changed the m_t update. The previous line was:
m_t = m.assign(tf.maximum(beta2_t * m + eps, tf.abs(grad)))
This line is wrong because if (beta2_t * m + eps) < 0 and grad == 0, then m_t is 0 and g_t = v_t / m_t is undefined. The correct line is:
m_t = m.assign(tf.maximum(beta2_t * m, tf.abs(grad) + eps))
With eps added to tf.abs(grad), m_t is always at least eps.
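For illustration, here is a minimal scalar sketch of the two placements of eps in plain Python, without TensorFlow. The helper names m_t_original and m_t_fixed and the input values are hypothetical; they are chosen only to trigger the condition described here and are not taken from the optimizer code.

```python
def m_t_original(m, grad, beta2_t, eps):
    # Scalar analogue of tf.maximum(beta2_t * m + eps, tf.abs(grad)).
    return max(beta2_t * m + eps, abs(grad))


def m_t_fixed(m, grad, beta2_t, eps):
    # Scalar analogue of tf.maximum(beta2_t * m, tf.abs(grad) + eps).
    return max(beta2_t * m, abs(grad) + eps)


# Hypothetical values: beta2_t * m + eps is negative and the gradient is zero.
beta2_t, eps = 0.999, 1e-8
m, grad = -1.0, 0.0

original = m_t_original(m, grad, beta2_t, eps)
fixed = m_t_fixed(m, grad, beta2_t, eps)

print(original)  # 0.0, so v_t / m_t would divide by zero
print(fixed)     # 1e-08, bounded below by eps
```

Because eps sits inside the second argument, the result of the fixed form cannot fall below eps for any value of m or grad.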