pytorch / contrib

Implementations of ideas from recent papers

Loss becomes nan with Adam and AdamW as base optimizers #37

Open FunnyJingl opened 4 years ago

FunnyJingl commented 4 years ago

Loss becomes NaN after training for ~20 steps: the loss value steadily decreases and then turns into NaN when Adam or AdamW is used as the base optimizer. With plain SGD, training works fine.
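
A minimal sketch of the setup being described, assuming the torchcontrib SWA wrapper around an Adam base optimizer as in the repo README; the model, data, loss, and hyperparameters below are placeholders, not the reporter's actual training code:

```python
import torch
import torchcontrib

# Placeholder model and synthetic data; only the optimizer wiring
# reflects the scenario in the report.
model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()

# Base optimizer: Adam (or torch.optim.AdamW). Per the report, loss
# decreases for ~20 steps and then becomes NaN; replacing this with
# torch.optim.SGD reportedly trains without issue.
base_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt = torchcontrib.optim.SWA(base_opt, swa_start=10, swa_freq=5, swa_lr=0.05)

for step in range(100):
    x = torch.randn(32, 10)
    y = torch.randn(32, 1)
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    opt.step()
    print(step, loss.item())

# Swap the model weights for the SWA running average at the end of training.
opt.swap_swa_sgd()
```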