paulthemagno opened 4 years ago
I have fine-tuned a BERT-NER model, and in eval_result.txt I got these values: P=0.608764 R=0.588080 F=0.594982
In my understanding, these results come from the dev (validation) dataset, while on the test set I got:
processed 40982 tokens with 4577 phrases; found: 4645 phrases; correct: 4158.
accuracy:  98.22%; precision:  89.52%; recall:  90.85%; FB1:  90.18
              LOC: precision:  92.54%; recall:  92.54%; FB1:  92.54  1394
             MISC: precision:  81.21%; recall:  82.31%; FB1:  81.76  676
              ORG: precision:  84.54%; recall:  88.56%; FB1:  86.51  1255
              PER: precision:  95.30%; recall:  95.45%; FB1:  95.38  1320
I'd like to understand the mismatch with respect to the standard CoNLL evaluation script.
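For context on what the CoNLL script measures: conlleval scores at the entity (phrase) level, not the token level, so a prediction only counts as correct if the span boundaries and the type both match exactly. Below is a minimal sketch of that counting logic, assuming BIO tags; the function names are hypothetical and this is not the actual conlleval.pl or BERT-NER evaluation code.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence."""
    entities = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel flushes the last span
        # Close the open span on B-, O, or an I- whose type changes.
        if tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and etype != tag[2:]
        ):
            if start is not None:
                entities.append((start, i - 1, etype))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # conlleval-style leniency: a stray I- opens a new span.
            start, etype = i, tag[2:]
    return set(entities)

def entity_prf(gold_tags, pred_tags):
    """Entity-level precision, recall, F1 over exact span+type matches."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5): one of two entities matched
```

Note that the token-level accuracy in this example would be 75%, which is why token accuracy (98.22% above) can sit far above the phrase-level F1; if eval_result.txt is produced by a different scorer (e.g. token-level, or with a different label scheme), the numbers will not be comparable.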
@paulthemagno Is the accuracy problem solved?