mandarjoshi90 / coref

BERT for Coreference Resolution
Apache License 2.0

Possibly low F1 when finetuning BERT base #44

Closed: dpfried closed this issue 4 years ago

dpfried commented 4 years ago

Hi Mandar,

When I finetune BERT base, I get an OntoNotes dev F1 of 73.69. I was wondering if this is within the variance that you saw for BERT base, or could there be some problem with my setup?

I'm using the package versions from requirements.txt (except with MarkupSafe changed to 1.1.1, per https://github.com/mandarjoshi90/coref/pull/40, and psycopg2 changed to psycopg2-binary), and I'm training on a 32GB V100 with these commands:

python train.py train_bert_base
python evaluate.py train_bert_base
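
In case it's useful, here is a minimal sketch of the dependency tweak described above (it assumes each package is pinned on its own line in requirements.txt):

# Pin MarkupSafe to 1.1.1 and swap psycopg2 for psycopg2-binary, keeping its pinned version.
sed -i 's/^MarkupSafe==.*/MarkupSafe==1.1.1/' requirements.txt
sed -i 's/^psycopg2==\(.*\)$/psycopg2-binary==\1/' requirements.txt
pip install -r requirements.txt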

When evaluating your finetuned BERT base model on dev (python evaluate.py bert_base), I get an F1 of 74.05. This is closer to the 74.3 dev F1 from Table 4, but should it match exactly? I'm wondering whether there's some difference in my setup that affects evaluation slightly but gets magnified during training.

Thanks, Daniel

mandarjoshi90 commented 4 years ago

Hi Daniel. This is likely within variance, although it does seem that you're losing a tiny bit to some (possibly setup-related?) issue as well. I wouldn't worry about it too much, given that a spread of around 0.5 F1 is fairly common with these large models.
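
If you want to sanity-check that empirically, averaging dev F1 over a few runs with different random seeds gives a feel for the spread. A minimal sketch (the numbers below are hypothetical placeholders, not actual runs):

# Mean and standard deviation of dev F1 across a few seed runs (placeholder values).
printf '%s\n' 73.7 74.1 74.4 | \
  awk '{ s += $1; ss += $1*$1; n++ }
       END { m = s/n; printf "mean=%.2f  sd=%.2f\n", m, sqrt(ss/n - m*m) }'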

dpfried commented 4 years ago

Thanks, and thanks for the quick response!