kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.
https://pytorch-crf.readthedocs.io
MIT License
935 stars, 151 forks

error increase, f1 decrease #76

Closed: Fei-Wang closed this issue 3 years ago

Fei-Wang commented 3 years ago

This is for an NER task.

  1. With BERT + softmax + cross-entropy, the train loss decreases and F1 increases, and the validation data behaves the same way as the training data.
  2. With BERT + CRF, the train loss first decreases and then increases, and F1 first increases and then decreases. The validation data, however, behaves normally.

I found that this happens because I use dropout between BERT and the linear layer. When I call model.eval(), the model behaves correctly, but when I call model.train(), the metric is low. As far as I know, dropout is used to avoid overfitting, so it shouldn't cause such a large difference between model.eval() and model.train().
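
For reference, the train/eval difference described above can be reproduced in isolation. The snippet below is only a minimal sketch: the tensor of ones is a hypothetical stand-in for the BERT outputs, and `p=0.5` is an assumed dropout rate, not a value taken from this issue.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for BERT outputs: 2 tokens, 8 hidden units each.
x = torch.ones(2, 8)

# Assumed dropout rate; the rate used in the original model is not stated.
dropout = nn.Dropout(p=0.5)

# Training mode: each element is zeroed with probability p, and the
# surviving elements are scaled by 1 / (1 - p), here 2.0.
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is a no-op and the input passes through unchanged.
dropout.eval()
eval_out = dropout(x)

print("train mode:", train_out)
print("eval mode: ", eval_out)
```

Predictions made in training mode therefore see randomly masked features, so metrics computed under model.train() are expected to be noisier than those computed under model.eval().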

kmkurn commented 3 years ago

Train loss should always decrease if you set the learning rate small enough. Have you tried a smaller learning rate? Also, can you check whether you can overfit a small portion of the train set?
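
The overfitting check suggested above could look roughly like the following. This is a sketch under stated assumptions, not the setup from this issue: the random features, the `nn.Linear` tagger, the learning rate, and the step count are all hypothetical stand-ins. With the real model, the loss would instead be the negative log-likelihood returned by the CRF layer.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical tiny subset: 8 tokens, 16 features each, 3 tag classes.
features = torch.randn(8, 16)
tags = torch.randint(0, 3, (8,))

# Stand-in for the real BERT + CRF tagger (assumption for illustration).
model = nn.Linear(16, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

losses = []
model.train()
for step in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(features), tags)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Measure accuracy on the same tiny subset in evaluation mode.
model.eval()
with torch.no_grad():
    predictions = model(features).argmax(dim=-1)
    accuracy = (predictions == tags).float().mean().item()

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print(f"accuracy on the tiny subset: {accuracy:.2f}")
```

If the loss on such a small subset does not fall steadily, the problem is more likely in the training code or the learning rate than in the model's capacity.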

Fei-Wang commented 3 years ago

Thank you for your reply, I will figure it out.