Adept clips the global gradient norm at 0.5 by default, which could hinder learning (or at least slow it down) if gradients are consistently large and the clipping fires at every training step. At minimum we should log the gradient norm to TensorBoard so the user can inspect it and decide for themselves.
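A minimal sketch of what that logging could look like in PyTorch. The model, optimizer, and log directory here are hypothetical stand-ins, not Adept's actual training loop; the point is that `clip_grad_norm_` returns the pre-clip total norm, so the same call yields both the clipped gradients and the value worth logging. The TensorBoard import is guarded so the snippet still runs where tensorboard is not installed.

```python
import torch

model = torch.nn.Linear(10, 1)  # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

try:
    from torch.utils.tensorboard import SummaryWriter
    writer = SummaryWriter(log_dir="runs/grad_norms")  # hypothetical log dir
except ImportError:
    writer = None  # tensorboard not installed; skip logging

norms = []
for step in range(20):
    optimizer.zero_grad()
    x = torch.randn(32, 10)
    loss = (model(x) ** 2).mean()
    loss.backward()

    # clip_grad_norm_ clips in place and returns the total norm
    # *before* clipping, so it doubles as the value to log.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
    norms.append(float(total_norm))

    if writer is not None:
        writer.add_scalar("grad/global_norm", float(total_norm), step)
        # 1.0 whenever clipping was actually active this step
        writer.add_scalar("grad/clip_active", float(total_norm > 0.5), step)

    optimizer.step()

if writer is not None:
    writer.close()
```

Plotting `grad/clip_active` alongside `grad/global_norm` makes it obvious whether the 0.5 threshold is an occasional safety net or is saturating every step.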