locuslab / fast_adversarial

[ICLR 2020] A repository for extremely fast adversarial training using FGSM

facing "nan" values during training the model #18

Closed: baogiadoan closed this 2 years ago

baogiadoan commented 3 years ago

Hi, while training with my custom objective loss, I noticed that the model sometimes produces "nan" values and becomes invalid, which I never encountered with other training methods. Is this because the maximum learning rate of the cyclic schedule is too large, causing the loss to diverge, as mentioned in the paper ("For each method, we individually tune λ to be as large as possible without causing the training loss to diverge")? Or is it a bug?

I ran the original code again with epochs=30 and hit the same issue: [screenshot of training output showing nan loss]
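For reference, the repository trains with a one-cycle schedule via PyTorch's `CyclicLR`; below is a minimal sketch of capping the maximum learning rate, which is the knob the paper tunes to keep the loss from diverging. The stand-in model and the concrete values here are placeholders, not the repo's defaults:

```python
import torch
import torch.nn as nn

# Stand-in network; the repo uses a PreAct ResNet-18 for CIFAR-10.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

epochs, steps_per_epoch = 30, 391  # e.g. CIFAR-10 with batch size 128
lr_steps = epochs * steps_per_epoch
scheduler = torch.optim.lr_scheduler.CyclicLR(
    opt,
    base_lr=0.0,
    max_lr=0.1,  # if the loss hits nan, lower this (e.g. halve it) until training is stable
    step_size_up=lr_steps // 2,
    step_size_down=lr_steps // 2,
)
# Call scheduler.step() once per batch, after opt.step().
```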

leslierice1 commented 2 years ago

You could try training with a smaller learning rate, or clipping the (unscaled) gradients. Do you notice the same behavior both with and without mixed-precision training?
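A minimal sketch of that suggestion under the repo's apex mixed-precision setup, using apex's documented pattern of clipping `amp.master_params` after the `scale_loss` context has unscaled the gradients. The stand-in model, dummy batch, and `max_norm` value are placeholders to try, not code from this repository:

```python
import torch
import torch.nn as nn
from apex import amp  # the repo trains with NVIDIA apex mixed precision

# Stand-in network and optimizer; substitute the real training setup.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model, opt = amp.initialize(model, opt, opt_level='O2', loss_scale='dynamic')
criterion = nn.CrossEntropyLoss()

X = torch.randn(8, 3, 32, 32).cuda()          # dummy batch
y = torch.randint(0, 10, (8,)).cuda()

loss = criterion(model(X), y)
opt.zero_grad()
with amp.scale_loss(loss, opt) as scaled_loss:
    scaled_loss.backward()
# amp unscales the master gradients when the scale_loss context exits,
# so clipping here operates on the true (unscaled) gradients.
torch.nn.utils.clip_grad_norm_(amp.master_params(opt), max_norm=1.0)  # max_norm is a guess; tune it
opt.step()
```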