hipe-eval / HIPE-scorer

A Python module for evaluating the performance of NERC and NEL systems as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer).
https://hipe-eval.github.io
MIT License

There are no tags in the system response for the column... #8

Closed creat89 closed 4 years ago

creat89 commented 4 years ago

Hello,

Currently I'm having an issue running some internal evaluations when my system only predicts one type of NE tag. The scorer stops when I do not provide labels for all the annotation columns. In my opinion, if the user does not provide labels for a specific column, the scorer should return a score of zero for that column rather than stopping.

aflueckiger commented 4 years ago

Hello @creat89,

Thanks for your report. However, I cannot reproduce the error. What do you mean by "stopping"? Do you get an error? The following test seems to work as expected:

I replaced the NE-FINE-LIT column with X using:

awk -vOFS='\t'  '{$4 = "X"; print}' data/release/v1.2/de/HIPE-data-v1.2-test-de.tsv > issue_8.tsv
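For anyone who prefers Python, here is a rough equivalent of the awk step. It is a hypothetical helper (`replace_column` is our own name, not part of the scorer) and differs from the one-liner in one respect: awk also rewrites field 4 of comment lines and blank lines, whereas this sketch leaves lines starting with `#` and empty lines untouched.

```python
# Hypothetical helper: overwrite one tab-separated column (index 3 is
# NE-FINE-LIT in the HIPE layout) with a fixed value. Comment lines ("#...")
# and blank lines are passed through unchanged.


def replace_column(lines, index=3, value="X"):
    """Return a copy of `lines` with field `index` (0-based) set to `value`."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            out.append(line)  # keep metadata and document separators as-is
            continue
        fields = line.split("\t")
        if len(fields) > index:
            fields[index] = value  # header and data rows alike
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    sample = [
        "TOKEN\tNE-COARSE-LIT\tNE-COARSE-METO\tNE-FINE-LIT\tNE-FINE-METO",
        "# language = de",
        "Rußland\tB-loc\tO\tO\tO",
        "",
    ]
    for row in replace_column(sample):
        print(repr(row))
```

Because the header row is rewritten too, the column name NE-FINE-LIT disappears from the response, which is what triggers the scorer's complaint below.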

Then I ran the scorer with:

python ../CLEF-HIPE-2020-scorer/clef_evaluation.py --ref data/release/v1.2/de/HIPE-data-v1.2-test-de.tsv --pred issue_8.tsv --task nerc_fine --outdir data/system-evaluations --log issue_8.log

The scorer correctly complains about the missing column in this case:

The provided annotation columns ['NE-FINE-LIT'] are not available in both the gold standard and the system response 'issue_8.tsv'.

However, it runs through when you just provide an empty column with all the required fieldnames (e.g. 'NE-FINE-LIT').
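Conceptually, the check behind that message is a comparison of header field names. The sketch below is hypothetical (`unavailable_columns` is our own name, not the scorer's function) and only illustrates why a renamed column fails while a column that keeps the required field name passes this step, regardless of its contents.

```python
def unavailable_columns(requested, gold_header, pred_header):
    """Return the requested columns missing from the gold or the system header."""
    gold, pred = set(gold_header), set(pred_header)
    return [col for col in requested if col not in gold or col not in pred]


gold = ["TOKEN", "NE-COARSE-LIT", "NE-COARSE-METO", "NE-FINE-LIT"]
pred = ["TOKEN", "NE-COARSE-LIT", "NE-COARSE-METO", "X"]  # renamed, as in the awk test
print(unavailable_columns(["NE-FINE-LIT"], gold, pred))  # ['NE-FINE-LIT']
```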

Please provide more information if you think something is wrong.

creat89 commented 4 years ago

For instance, I have the following file:

TOKEN   NE-COARSE-LIT   NE-COARSE-METO  NE-FINE-LIT NE-FINE-METO    NE-FINE-COMP    NE-NESTED   NEL-LIT NEL-METO    MISC
# language = de
# newspaper = NZZ
# date = 1798-01-17
# document_id = NZZ-1798-01-17-a-p0002
# segment_iiif_link = _
Rußland B-loc   O   O   O   O   O   _   _   _
.   O   O   O   O   O   O   _   _   _
Petersburg  B-loc   O   O   O   O   O   _   _   _

Here, my system only predicted labels for the NE-COARSE-LIT column and not for NE-COARSE-METO; in other words, for NE-COARSE-METO I just printed O. If I run the script:

python clef_evaluation.py --ref /home/HIPE-data-v1.0-dev-de.tsv --pred /home/Predictions_dev.tsv --skip_check --task nerc_coarse

The script tells me:

There are no tags in the system response file '/home/adrian/Programs/NER_BERT_News/Server_Models3/NER_models_europeana_german_fixed_nofixtags_weightedlosslogAlpha/Impresso_de_v1,0_8_5e-05//Predictions_test.tsv' for the column: ['NE-COARSE-METO']

It then stops without producing any output. In this case, I would expect it to report a score of zero for NE-COARSE-METO instead of stopping the script.
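To make the "score of zero" expectation concrete, here is a simplified sketch of per-column, entity-level scoring with strict span matching. It is an assumption for illustration only, not the HIPE scorer's actual metric code: spans are read from IOB tags, and every division by zero is mapped to 0.0, so a column with no predicted entities scores zero instead of raising an error.

```python
def spans(tags):
    """Collect (start, end, type) entity spans from a sequence of IOB tags."""
    found, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a final entity
        prefix, _, label = tag.partition("-")
        if prefix == "I" and start is not None and label == etype:
            continue  # the open entity continues
        if start is not None:
            found.add((start, i, etype))  # close the open entity
            start, etype = None, None
        if prefix in ("B", "I"):  # a stray I- tag is treated as an entity start
            start, etype = i, label
    return found


def column_scores(gold_tags, pred_tags):
    """Strict entity-level P/R/F1 for one column; empty cases yield 0.0."""
    gold, pred = spans(gold_tags), spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"P": precision, "R": recall, "F1": f1}


# Hypothetical gold tags for a METO column versus an all-O prediction.
print(column_scores(["B-org", "O", "O"], ["O", "O", "O"]))
# {'P': 0.0, 'R': 0.0, 'F1': 0.0}
```

Mapping an empty prediction set to a precision of 0.0 is a design choice; some evaluators treat it as undefined instead. Note that this sketch also returns zeros when both gold and prediction are empty.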

> However, it runs through when you just provide an empty column with all the required fieldnames (e.g. 'NE-FINE-LIT').

So does this mean that, instead of putting O in the NE-COARSE-METO column, I just need to leave it empty?

aflueckiger commented 4 years ago

Thanks for clarifying. With https://github.com/impresso/CLEF-HIPE-2020-scorer/commit/b8bdbf83eb7797963da740c41d13459680796419, the behavior has changed: the scorer now only logs missing tags and no longer quits the evaluation.
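The "log and continue" pattern described here can be sketched with Python's standard `logging` module. This is a hypothetical illustration, not the code from that commit: `columns_without_tags` and the log wording are our own, and the rows are assumed to be dicts such as those produced by `csv.DictReader(..., delimiter="\t")`.

```python
import logging

logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
log = logging.getLogger("scorer-sketch")  # hypothetical logger name


def columns_without_tags(rows, columns):
    """Return the columns in which no row carries a B- or I- tag, logging each."""
    empty = []
    for col in columns:
        values = ((row.get(col) or "") for row in rows)
        if not any(v.startswith(("B-", "I-")) for v in values):
            log.warning("no entity tags found in column %s; continuing", col)
            empty.append(col)
    return empty  # the caller keeps evaluating; these columns simply score zero


rows = [
    {"TOKEN": "Rußland", "NE-COARSE-LIT": "B-loc", "NE-COARSE-METO": "O"},
    {"TOKEN": ".", "NE-COARSE-LIT": "O", "NE-COARSE-METO": "O"},
    {"TOKEN": "Petersburg", "NE-COARSE-LIT": "B-loc", "NE-COARSE-METO": "O"},
]
print(columns_without_tags(rows, ["NE-COARSE-LIT", "NE-COARSE-METO"]))
# ['NE-COARSE-METO']
```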