Hi, while training with my custom loss, I noticed that the model sometimes produces NaN and becomes invalid, which I never saw with other training methods. Is that because the learning rate of the cyclic schedule is too large and makes the loss diverge, as mentioned in the paper: "For each method, we individually tune λ to be as large as possible without causing the training loss to diverge"? Or is it a bug?
I ran the original code again with epochs=30 and hit the same issue.
You could try training with a smaller learning rate or clipping the (unscaled) gradients. Do you notice the same behavior with and without training with mixed precision?
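To make the clipping suggestion concrete, here is a minimal, framework-free sketch of clipping by global gradient norm and skipping any step whose gradients are already NaN/Inf. The names `clip_by_global_norm` and `sgd_step`, and the `max_norm` value, are made up for illustration and are not from the repo. In PyTorch the usual counterpart would be `torch.nn.utils.clip_grad_norm_`, called on gradients that have already been unscaled (for example after `scaler.unscale_(optimizer)` if you happen to use `torch.cuda.amp.GradScaler`).

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm.

    Returns (clipped_grads, total_norm). If the norm is NaN or Inf,
    clipped_grads is None so the caller can skip the update.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        return None, total_norm
    # Never scale up; the small epsilon avoids division by zero.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

def sgd_step(params, grads, lr, max_norm):
    """One plain SGD step with clipping; returns (new_params, step_applied)."""
    clipped, _ = clip_by_global_norm(grads, max_norm)
    if clipped is None:
        # Non-finite gradients: leave the parameters untouched.
        return params, False
    return [p - lr * g for p, g in zip(params, clipped)], True

# Hypothetical values, only to show the call pattern.
params, applied = sgd_step([1.0, 2.0], [3.0, 4.0], lr=0.1, max_norm=1.0)
```

If steps get skipped often, or the gradient norm sits far above `max_norm` most of the time, that would point toward the peak learning rate being too large rather than a bug.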