error increase, f1 decrease

Fei-Wang commented 3 years ago

This is for an NER task.

With BERT + softmax + cross-entropy, the train loss decreases and F1 increases, and the validation data behaves the same way as the training data.
With BERT + CRF, the train loss first decreases and then increases, and F1 first increases and then decreases. The validation metrics, however, behave normally.

I found that this is because I use dropout between BERT and the linear layer. When I call model.eval(), it behaves correctly, but when I call model.train(), the metric is low. As far as I know, dropout is used to avoid overfitting, so it shouldn't cause such a large difference between model.eval() and model.train().
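For readers following along, here is a minimal sketch of why the mode matters. It assumes a hypothetical `head` module (a dropout layer followed by a linear layer) standing in for the part of the model after BERT; the sizes and names are made up for illustration and are not taken from the original code. In train mode, dropout zeroes a random subset of the features on every forward pass, so the same input gives different outputs; in eval mode, dropout is a no-op and the outputs are deterministic.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the "BERT -> dropout -> linear" head.
head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(8, 3))

# Pretend encoder output: 4 tokens with hidden size 8 (assumed sizes).
features = torch.randn(4, 8)

# Train mode: a fresh random dropout mask is sampled on each forward pass.
head.train()
with torch.no_grad():
    train_a = head(features)
    train_b = head(features)

# Eval mode: dropout is disabled, so repeated passes give identical outputs.
head.eval()
with torch.no_grad():
    eval_a = head(features)
    eval_b = head(features)

train_outputs_differ = not torch.equal(train_a, train_b)
eval_outputs_match = torch.equal(eval_a, eval_b)
print("train-mode outputs differ:", train_outputs_differ)
print("eval-mode outputs match:", eval_outputs_match)
```

Metrics computed while the model is in train mode are therefore measured on noisy predictions, which may explain part of the gap; whether it explains all of it in this case is not something the sketch can show.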

kmkurn commented 3 years ago

Train loss should always decrease if you set the learning rate small enough. Have you tried a smaller learning rate? Also, can you check whether you can overfit a small portion of the train set?
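One way to run the overfitting check suggested above is sketched below. It uses a hypothetical toy classifier and random data in place of the real BERT + CRF model and NER dataset, so the shapes, learning rate, and step count are assumptions for illustration only. The idea is to train on a handful of examples with no dropout and confirm that the loss goes down and the model fits those examples; if it cannot, the problem is more likely in the model or training loop than in regularization.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data standing in for a small slice of the train set:
# 16 examples, 10 features, 3 classes.
inputs = torch.randn(16, 10)
labels = torch.randint(0, 3, (16,))

# Small model without dropout, so nothing prevents memorizing the subset.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # assumed learning rate
loss_fn = nn.CrossEntropyLoss()

model.train()
losses = []
for step in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Measure accuracy on the same subset, in eval mode.
model.eval()
with torch.no_grad():
    accuracy = (model(inputs).argmax(dim=-1) == labels).float().mean().item()

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print(f"accuracy on the small subset: {accuracy:.2f}")
```

With the real model, the same loop would use a few training sentences and the CRF's negative log-likelihood as the loss, and the learning rate can be lowered if the loss starts to climb again.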

Fei-Wang commented 3 years ago

Thank you for your reply, I will figure it out.

kmkurn / pytorch-crf, issue #76