Closed DtYXs closed 3 years ago
Training a BERT-CRF model: BERT lr 2e-5, other parameters lr 2e-3, 10 epochs.
The total loss decreases over the first 5 epochs, down to about 20, but then suddenly jumps to about 10000.
With 'get_linear_schedule_with_warmup' the lr should keep getting smaller, yet the spike seems to disappear when I change the schedule. I don't understand why the loss increases.
Hello, I think the fundamental cause of this phenomenon is that the learning rate of 2e-3 for the other parameters is too high; you could apply the same schedule to it as to the 2e-5 rate. As for why the spike still appears even though the scheduled lr keeps shrinking: I suspect the local gradient also increases suddenly, so a large lr multiplied by a spiking gradient still produces a huge update.
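To make the point concrete, here is a minimal pure-Python sketch of the linear warmup-then-decay rule that 'get_linear_schedule_with_warmup' implements (the function and variable names below are my own, chosen for illustration). It shows that even halfway through decay, the 2e-3 group's lr is still far larger than the BERT group's peak lr, so a sudden gradient spike in the CRF layer can blow the loss up despite the decaying schedule.

```python
def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """LR at a given step: linear ramp up during warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Hypothetical schedule: 1000 training steps, 100 warmup steps.
total_steps, warmup_steps = 1000, 100
for step in (100, 500, 900):
    bert_lr = linear_warmup_lr(step, 2e-5, warmup_steps, total_steps)
    other_lr = linear_warmup_lr(step, 2e-3, warmup_steps, total_steps)
    print(step, bert_lr, other_lr)
```

At step 500 the "other" group's lr is still roughly 1e-3, about 50x the BERT group's peak of 2e-5, which is why scheduling both groups down from a smaller base (or clipping gradients) tames the spike.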