
LUKE -- Language Understanding with Knowledge-based Embeddings
Apache License 2.0

Strange F1 scores for NER #160

Closed luffycodes closed 2 years ago

luffycodes commented 2 years ago

Hi, I just created a new conda environment on my new machine.

Finetuning NER using the instructions from the link gives me a 0 F1 score.

```
span_accuracy: 1.0000, batch_loss: 0.0000, loss: 0.0001 ||: 100%|##########| 2889/2889 [01:08<00:00, 42.46it/s]
2022-09-21 22:33:49,594 - INFO - allennlp.training.callbacks.console_logger -                   Training | Validation
2022-09-21 22:33:49,594 - INFO - allennlp.training.callbacks.console_logger - f1              |    0.000 |      0.000
2022-09-21 22:33:49,594 - INFO - allennlp.training.callbacks.console_logger - gpu_0_memory_MB |    0.000 |        N/A
2022-09-21 22:33:49,594 - INFO - allennlp.training.callbacks.console_logger - loss            |    0.014 |      0.000
2022-09-21 22:33:49,594 - INFO - allennlp.training.callbacks.console_logger - precision       |    0.000 |      0.000
2022-09-21 22:33:49,594 - INFO - allennlp.training.callbacks.console_logger - recall          |    0.000 |      0.000
2022-09-21 22:33:49,594 - INFO - allennlp.training.callbacks.console_logger - span_accuracy   |    0.996 |      1.000
```

It used to work two months back; has there been a change in some package that is causing this?
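One way to narrow this down is to compare the installed versions of the training stack between the old and new environments. A minimal sketch (the package list is my guess at the relevant dependencies):

```python
# Dump versions of the packages most likely to affect AllenNLP training,
# so the output can be diffed between the working and broken environments.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["allennlp", "allennlp-models", "transformers", "torch", "datasets"]:
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```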

ryokan0123 commented 2 years ago

The code concerning NER has not been changed in the last several months, so I do not have any clues about why this happens... I see that the model performs almost perfectly in terms of span accuracy but gets zero precision/recall. I suspect the data is causing this: for example, if there are no real NER tags in the data, the model predicts "not-entity" for every span, which gives near-perfect span accuracy but zero precision and recall.
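One way to test this hypothesis is to count how many gold entity tags the training file actually contains. A minimal sketch for CoNLL-2003-style data, where the tag is the last column of each token line (the file path is a placeholder):

```python
from collections import Counter

# Count gold NER tags in a CoNLL-2003-style file. If nearly every tag
# is "O", perfect span accuracy with zero precision/recall is expected.
tags = Counter()
with open("data/eng.train") as f:  # placeholder path
    for line in f:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            continue
        tags[line.split()[-1]] += 1

print(tags.most_common())
entity = sum(n for t, n in tags.items() if t != "O")
print(f"entity tokens: {entity} / {sum(tags.values())}")
```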

luffycodes commented 2 years ago

I am using the CoNLL-2003 text dataset used in this notebook: https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_conll_2003.ipynb#scrollTo=L3U75A-27yTj.
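Since that notebook goes through the Hugging Face datasets loader, a quick spot check is to print one example with its gold tags and confirm the labels are non-trivial (assuming the text file was produced from this loader):

```python
from datasets import load_dataset

# Print the first validation example with its gold NER tags to confirm
# the labels survived loading; all-"O" output would explain the metrics.
ds = load_dataset("conll2003", split="validation")
label_names = ds.features["ner_tags"].feature.names

ex = ds[0]
for token, tag_id in zip(ex["tokens"], ex["ner_tags"]):
    print(token, label_names[tag_id])
```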