Fei-Wang closed this issue 3 years ago.
The training loss should always decrease if you set the learning rate small enough. Have you tried a smaller learning rate? Also, can you check whether you can overfit a small portion of the training set?
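The overfitting check suggested above can be sketched as follows, assuming PyTorch. The model is a hypothetical tiny token classifier (embedding, dropout, linear) standing in for BERT plus a linear layer. The data is one synthetic batch where each token's tag is simply `token_id % num_tags`; both are illustrative assumptions, so swap in a few real training sentences and your own model. If the loss on such a tiny set does not get close to zero, the cause is more likely a bug in the training loop or data than a lack of model capacity.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the real model: embedding -> dropout -> linear.
num_tokens, num_tags, seq_len, batch = 50, 5, 8, 4
model = nn.Sequential(
    nn.Embedding(num_tokens, 32),
    nn.Dropout(0.1),
    nn.Linear(32, num_tags),
)

# One small synthetic batch; the tag is a learnable function of the token id.
x = torch.randint(0, num_tokens, (batch, seq_len))
y = x % num_tags

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def overfit_small_batch(steps=300):
    """Train repeatedly on the same batch and record the loss at each step."""
    model.train()  # dropout active during optimization
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x)  # shape: (batch, seq_len, num_tags)
        loss = loss_fn(logits.reshape(-1, num_tags), y.reshape(-1))
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses

losses = overfit_small_batch()
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")

# Measure accuracy on the same batch with dropout disabled.
model.eval()
with torch.no_grad():
    acc = (model(x).argmax(-1) == y).float().mean().item()
print(f"accuracy on the small batch (eval mode): {acc:.2f}")
```

A steadily falling loss and near-perfect accuracy on the memorized batch indicate that the model, loss, and optimizer are wired correctly.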
Thank you for your reply, I will figure it out.
It is a NER task.
I found that the cause is the dropout I use between BERT and the linear layer. When I call `model.eval()`, the model behaves correctly, but with `model.train()` the metric is low. As far as I know, dropout is used to avoid overfitting, so it shouldn't make such a large difference between `model.eval()` and `model.train()`.
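To see why the metric depends on the mode, the sketch below compares a dropout layer in both modes, assuming PyTorch's `nn.Dropout`; the rate `p=0.1` is an illustrative assumption, not a value from this thread. In training mode, dropout randomly zeroes roughly a fraction `p` of the activations and scales the survivors by `1/(1-p)`, so every forward pass runs a different, noisier sub-network. In eval mode it is the identity. For this reason, metrics are normally computed under `model.eval()`, and `model.train()` is used only for the optimization steps.

```python
import torch
from torch import nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.1)  # assumed rate; use your model's actual value
x = torch.ones(1000)

dropout.train()  # the mode model.train() sets on every submodule
y_train = dropout(x)

dropout.eval()   # the mode model.eval() sets on every submodule
y_eval = dropout(x)

# Train mode: about p of the entries are zeroed, the rest scaled by 1/(1-p).
zeroed = (y_train == 0).float().mean().item()
kept = y_train[y_train != 0]
print(f"fraction zeroed in train mode: {zeroed:.3f}")
print(f"values of kept entries: {kept.unique().tolist()}")

# Eval mode: dropout passes the input through unchanged.
print(f"eval output equals input: {torch.equal(y_eval, x)}")
```

Because `model.train()` and `model.eval()` recursively set the `training` flag on every submodule, a dropout layer placed between BERT and the linear head follows the mode of the enclosing model.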