Closed DtYXs closed 3 years ago
Training a BERT-CRF model: BERT lr 2e-5, other parameters lr 2e-3, 10 epochs.
The total loss decreases over the first 5 epochs, down to about 20, but then suddenly jumps to about 10000.
With 'get_linear_schedule_with_warmup' the lr should keep getting smaller, yet the spike seems to disappear when I change the schedule. I don't understand why the loss increases.
Hello, I think the fundamental cause of this phenomenon is that the learning rate of 2e-3 for the other parameters is too high; you could apply the same schedule to it as to the 2e-5 rate. As for why the spike still appears even though the scheduled lr keeps shrinking: I suspect the local gradient also increases suddenly, so a large lr multiplied by a spiking gradient still produces a huge update.
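To make the point concrete, here is a minimal pure-Python sketch of the linear warmup-then-decay rule that 'get_linear_schedule_with_warmup' implements (the function and variable names below are my own, chosen for illustration). It shows that even halfway through decay, the 2e-3 group's lr is still far larger than the BERT group's peak lr, so a sudden gradient spike in the CRF layer can blow the loss up despite the decaying schedule.

```python
def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """LR at a given step: linear ramp up during warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Hypothetical schedule: 1000 training steps, 100 warmup steps.
total_steps, warmup_steps = 1000, 100
for step in (100, 500, 900):
    bert_lr = linear_warmup_lr(step, 2e-5, warmup_steps, total_steps)
    other_lr = linear_warmup_lr(step, 2e-3, warmup_steps, total_steps)
    print(step, bert_lr, other_lr)
```

At step 500 the "other" group's lr is still roughly 1e-3, about 50x the BERT group's peak of 2e-5, which is why scheduling both groups down from a smaller base (or clipping gradients) tames the spike.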